Spark Scala: select column names from another DataFrame

NS Saravanan

There are two JSON files; the first always has more columns and is a superset of the second.

val df1 = spark.read.json(sqoopJson)   // superset: 10 columns
val df2 = spark.read.json(kafkaJson)   // subset: 8 columns

Except operation:

I'd like to apply except on df1 and df2, but df1 has 10 columns and df2 has only 8. If I manually drop the 2 extra columns from df1, except works. However, I have 50+ tables/JSON files and need to run except on all 50 pairs.
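A minimal sketch of the manual workaround just described, assuming the two extra columns are called "extraCol1" and "extraCol2" (hypothetical names; the real ones differ per table):

```scala
// Drop the 2 columns present only in df1 ("extraCol1"/"extraCol2" are
// hypothetical names), so the two schemas match and except() can run.
val trimmed = df1.drop("extraCol1", "extraCol2")

// Rows in df1 (restricted to df2's columns) that are absent from df2.
val diff = trimmed.except(df2)
```

This is exactly what does not scale to 50+ tables, since the column names to drop vary from pair to pair.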

Question:

How can I select from df1 only the 8 columns that exist in df2 and create a new df3? df3 would then hold df1's data restricted to those columns, matching df2's schema.

Tags: scala, hadoop, apache-spark, apache-spark-sql, spark-dataframe

Answers

answered 8 months ago Shankar Koirala #1

For the question "How to select only the columns available in df2 (8 columns) from df1 and create a new df3?":

import org.apache.spark.sql.functions.col

// Get the 8 column names from df2 as Column objects
val columns = df2.schema.fieldNames.map(col(_))

// Select only those columns from df1
val df3 = df1.select(columns :_*)

Hope this helps!
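Since the asker needs this for 50+ table pairs, the same idea can be wrapped in a small helper (a sketch; `alignedExcept` is a hypothetical name, not a Spark API):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Align the superset DataFrame to the subset's columns, then take the
// set difference. Assumes subset's columns are all present in superset.
def alignedExcept(superset: DataFrame, subset: DataFrame): DataFrame =
  superset.select(subset.columns.map(col): _*).except(subset)

// Usage for one pair; loop over all table pairs the same way:
val diff = alignedExcept(df1, df2)
```

Because the column list is read from the subset at runtime, no per-table column names need to be hard-coded.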
