
scala - Automatically and Elegantly flatten DataFrame in Spark SQL

All,

Is there an elegant and accepted way to flatten a Spark SQL table (Parquet) with columns that are of nested StructType?

For example, if my schema is:

foo
 |_bar
 |_baz
x
y
z

How do I select it into a flattened tabular form without resorting to manually running:

df.select("foo.bar","foo.baz","x","y","z")

In other words, how do I obtain the result of the above code programmatically given just a StructType and a DataFrame?


1 Answer


The short answer is, there's no "accepted" way to do this, but you can do it very elegantly with a recursive function that generates your select(...) statement by walking through the DataFrame.schema.

The recursive function should return an Array[Column]. Every time the function hits a StructType, it calls itself on the nested schema and appends the returned Array[Column] to its own Array[Column].

Something like:

import org.apache.spark.sql.Column
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.functions.col

def flattenSchema(schema: StructType, prefix: String = null): Array[Column] = {
  schema.fields.flatMap(f => {
    // Build the fully qualified name of this field, e.g. "foo.bar"
    val colName = if (prefix == null) f.name else (prefix + "." + f.name)

    f.dataType match {
      // Nested struct: recurse, passing the qualified name down as the prefix
      case st: StructType => flattenSchema(st, colName)
      // Leaf field: select it by its fully qualified name
      case _ => Array(col(colName))
    }
  })
}

You would then use it like this:

df.select(flattenSchema(df.schema):_*)
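One caveat: col("foo.bar") produces a column named just bar, so if two different structs contain a leaf field with the same name, the flattened result will have duplicate column names. Below is a minimal sketch of a variant that aliases each flattened column with its full path, dots replaced by underscores; the helper name flattenSchemaRenamed and the underscore convention are illustrative choices, not part of the answer above:

import org.apache.spark.sql.Column
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.functions.col

// Hypothetical variant: same recursion as flattenSchema, but each leaf
// column is aliased with its dotted path rewritten using underscores
// ("foo.bar" -> "foo_bar") so flattened leaf names cannot collide.
def flattenSchemaRenamed(schema: StructType, prefix: String = null): Array[Column] = {
  schema.fields.flatMap(f => {
    val colName = if (prefix == null) f.name else (prefix + "." + f.name)

    f.dataType match {
      case st: StructType => flattenSchemaRenamed(st, colName)
      case _ => Array(col(colName).as(colName.replace(".", "_")))
    }
  })
}

With the schema from the question, df.select(flattenSchemaRenamed(df.schema):_*) would yield the columns foo_bar, foo_baz, x, y, z.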
