My approach to moving a Spark DataFrame's columns into a nested column within the same DataFrame is something like this:
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

val spark = SparkSession.builder()
  .appName("SparkByExamples.com")
  .master("local")
  .getOrCreate()

import spark.sqlContext.implicits._

val data = Seq(("Adam", "111", "50000"),
  ("Abe", "222", "60000"),
  ("Sam", "333", "40000"))

var df = data.toDF("Name", "EmpId__c", "Salary__c")
df.show(false)

// collect every custom ("__c") column as a Column
val cstColsSeq = df.columns.filter(c => c.endsWith("__c")).map(f => col(f)).toSeq

// does not compile: struct(...) expects Column*, not Seq[Column]
var cstMapCol: Column = org.apache.spark.sql.functions.struct(cstColsSeq)
df = df.withColumn("cstMap", cstMapCol)
The issue is that I can't pass a Seq[Column] to the org.apache.spark.sql.functions.struct(...) call; it only accepts a Column* (varargs) parameter.
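I suspect Scala's standard varargs expansion (: _*) might be the piece I'm missing; a minimal, untested sketch of what I mean:

// hypothetical fix: expand the Seq[Column] into the Column* varargs that struct expects
df = df.withColumn("cstMap", struct(cstColsSeq: _*))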
A follow-through I did try was to add the columns one at a time:
for (i <- cstColsSeq) {
  cstMapCol = org.apache.spark.sql.functions.struct(i)
  df = df.withColumn("cstMap", cstMapCol)
}
However, each iteration overwrites cstMap, so the final column holds a struct containing only the last "__c" column.
Any thoughts on whether the : _* sketch above is the right way to supply cstColsSeq to struct? I'm also open to solutions that take a different approach to adding nested columns to an existing, populated DataFrame.
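For concreteness, the end state I'm after looks roughly like this (again untested, and assuming the varargs expansion above compiles):

// hypothetical: nest all "__c" columns under cstMap and drop the flattened originals
val cstNames = df.columns.filter(_.endsWith("__c"))
val nested = df
  .withColumn("cstMap", struct(cstNames.map(col): _*))
  .drop(cstNames: _*)
nested.printSchema()
// root
//  |-- Name: string (nullable = true)
//  |-- cstMap: struct (nullable = false)
//  |    |-- EmpId__c: string (nullable = true)
//  |    |-- Salary__c: string (nullable = true)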
Thanks!
question from:
https://stackoverflow.com/questions/66066220/how-do-i-move-a-spark-dataframes-columns-to-a-nested-column-in-the-same-datafra