There are two JSONs; the first JSON always has more columns and is a superset of the second.
val df1 = spark.read.json(sqoopJson)
val df2 = spark.read.json(kafkaJson)
Except Operation :
I'd like to apply the except operation on df1 and df2, but df1 has 10 columns while df2 has only 8. If I manually drop the 2 extra columns from df1, except works. However, I have 50+ tables/JSONs and need to run EXCEPT on all 50 sets, so dropping columns by hand is not practical.
How do I select from df1 only the (8) columns available in df2 and create a new df3? df3 would then hold df1's data restricted to those columns, so its schema matches df2's.
For the question "How do I select from df1 only the (8) columns available in df2 and create a new df3?":
import org.apache.spark.sql.functions.col

// Get the 8 column names from df2
val columns = df2.schema.fieldNames.map(col(_))
// Select only those columns from df1
val df3 = df1.select(columns :_*)
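Since df2's columns are always a subset of df1's, the same selection can be repeated for all 50 table pairs by reusing each df2's field names. The name-alignment logic itself is plain Scala; here is a minimal, Spark-free sketch (the column names are hypothetical, for illustration only):

```scala
object AlignColumns {
  // Keep only the superset's columns that also appear in the subset,
  // preserving the subset's column order (mirrors selecting df2's
  // fieldNames out of df1 before calling except).
  def commonColumns(superset: Seq[String], subset: Seq[String]): Seq[String] =
    subset.filter(superset.contains)

  def main(args: Array[String]): Unit = {
    val df1Cols = Seq("id", "name", "ts", "src", "op", "extra1", "extra2")
    val df2Cols = Seq("id", "name", "ts", "src", "op")
    println(commonColumns(df1Cols, df2Cols).mkString(","))
  }
}
```

With real DataFrames, `df1.select(df2.schema.fieldNames.map(col(_)) :_*).except(df2)` would then apply per table pair inside a loop over the 50 sets.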
Hope this helps!