My approach to moving a Spark DataFrame's columns into a nested column within the same DataFrame is something like this:
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

val spark = SparkSession.builder()
  .appName("SparkByExamples.com")
  .master("local")
  .getOrCreate()

import spark.sqlContext.implicits._

val data = Seq(("Adam", "111", "50000"),
  ("Abe", "222", "60000"),
  ("Sam", "333", "40000"))

var df = data.toDF("Name", "EmpId__c", "Salary__c")
df.show(false)

// collect every custom ("__c") column as a Column
val cstColsSeq = df.columns.filter(c => c.endsWith("__c")).map(f => col(f)).toSeq

// does not compile: struct(...) expects Column*, not Seq[Column]
var cstMapCol: Column = org.apache.spark.sql.functions.struct(cstColsSeq)
df = df.withColumn("cstMap", cstMapCol)
The issue is that I can't pass a Seq[Column] to the org.apache.spark.sql.functions.struct(...) call; it only accepts a Column* (varargs) parameter.
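I suspect Scala's standard varargs expansion (: _*) might be the piece I'm missing; a minimal, untested sketch of what I mean:

// hypothetical fix: expand the Seq[Column] into the Column* varargs that struct expects
df = df.withColumn("cstMap", struct(cstColsSeq: _*))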
A follow-through I did try was to add the columns one at a time:
for (i <- cstColsSeq) {
  cstMapCol = org.apache.spark.sql.functions.struct(i)
  df = df.withColumn("cstMap", cstMapCol)
}
However, each iteration overwrites cstMap, so the final column holds a struct containing only the last "__c" column.
Any thoughts on whether the : _* sketch above is the right way to supply cstColsSeq to struct? I'm also open to solutions that take a different approach to adding nested columns to an existing, populated DataFrame.
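For concreteness, the end state I'm after looks roughly like this (again untested, and assuming the varargs expansion above compiles):

// hypothetical: nest all "__c" columns under cstMap and drop the flattened originals
val cstNames = df.columns.filter(_.endsWith("__c"))
val nested = df
  .withColumn("cstMap", struct(cstNames.map(col): _*))
  .drop(cstNames: _*)
nested.printSchema()
// root
//  |-- Name: string (nullable = true)
//  |-- cstMap: struct (nullable = false)
//  |    |-- EmpId__c: string (nullable = true)
//  |    |-- Salary__c: string (nullable = true)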
Thanks!
question from:
https://stackoverflow.com/questions/66066220/how-do-i-move-a-spark-dataframes-columns-to-a-nested-column-in-the-same-datafra