Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

scala - How to convert a DataFrame column type from string to array and struct in Spark

I have a DataFrame with the following schema. The 'name' column is a string, and its value is a complex JSON document containing arrays and structs.

While the column is a string, I can't parse the data and write it out as rows. So I am trying to convert the column to the proper data type and then apply explode to parse the data.

Current schema:
root
 |-- id: string (nullable = true)
 |-- partitionNo: string (nullable = true)
 |-- name: string (nullable = true)

Expected schema after conversion:
root
 |-- id: string (nullable = true)
 |-- partitionNo: string (nullable = true)
 |-- name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- extension: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- url: string (nullable = true)
 |    |    |    |    |-- valueMetadata: struct (nullable = true)
 |    |    |    |    |-- modifiedDateTime: string (nullable = true)
 |    |    |    |    |-- code: string (nullable = true)
 |    |    |-- lastName: string (nullable = true)
 |    |    |-- firstName: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
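For reference, the expected type of 'name' can also be written out by hand and passed to from_json(Column, DataType). This avoids inferring the schema from the data. The sketch below is an illustration under some assumptions. It assumes that modifiedDateTime and code are fields nested inside valueMetadata, because the tree above is ambiguous on that point. The sample JSON row and the local SparkSession settings are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

// Local session for the sketch; in spark-shell, getOrCreate returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("name-schema").getOrCreate()
import spark.implicits._

// Hypothetical sample row shaped like the JSON described in the question.
val json = """[{"extension":[{"url":"http://example.com/ext","valueMetadata":{"modifiedDateTime":"2021-01-08T00:00:00","code":"A1"}}],"lastName":"Doe","firstName":["John","Q"]}]"""
val df = Seq(("1", "0", json)).toDF("id", "partitionNo", "name")

// Assumption: modifiedDateTime and code live inside valueMetadata.
val valueMetadataType = StructType(Seq(
  StructField("modifiedDateTime", StringType),
  StructField("code", StringType)))

val extensionType = ArrayType(StructType(Seq(
  StructField("url", StringType),
  StructField("valueMetadata", valueMetadataType))))

// array<struct<extension, lastName, firstName>>, matching the expected tree.
val nameSchema = ArrayType(StructType(Seq(
  StructField("extension", extensionType),
  StructField("lastName", StringType),
  StructField("firstName", ArrayType(StringType)))))

// from_json matches JSON keys to struct fields by name, not by position.
val parsed = df.withColumn("name", from_json(col("name"), nameSchema))
parsed.printSchema()
```

Compared with inferring the schema at runtime, the hand-written type does not depend on which row happens to come first. The trade-off is that it must be kept in sync with the JSON by hand.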
Question from: https://stackoverflow.com/questions/65623181/how-to-convert-the-dataframe-column-type-from-string-to-array-and-struct-in-sp

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Answer


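You can use from_json, but you need to provide a schema. The schema can be inferred automatically with schema_of_json. The resulting code is somewhat convoluted, because from_json expects the schema as a literal value (wrapped in lit). Note that the schema is inferred from the first row only, so fields that appear only in other rows will not be parsed: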

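import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}
import spark.implicits._ // enables the $"colName" syntax

val df2 = df.withColumn(
    "name",
    from_json(
        $"name",
        // the lines below generate the schema:
        // schema_of_json infers a DDL string from the first row's JSON value,
        // and lit wraps that string so from_json can use it as the schema
        lit(
            df.select(
                schema_of_json(
                    lit(
                        df.select($"name").head()(0)
                    )
                )
            ).head()(0)
        )
        // end of schema generation
    )
)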
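Once 'name' is an array of structs, the explode step mentioned in the question turns each array element into its own row. The sketch below is an illustration under stated assumptions. It uses hypothetical sample data with only lastName and firstName, and it infers the schema the same way as the answer above. The SparkSession settings are also placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, from_json, lit, schema_of_json}

// Local session for the sketch; in spark-shell, getOrCreate returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("explode-name").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one input row whose `name` array holds two elements.
val df = Seq(
  ("1", "0", """[{"lastName":"Doe","firstName":["John"]},{"lastName":"Roe","firstName":["Jane","R"]}]""")
).toDF("id", "partitionNo", "name")

// Infer a DDL schema string from the first row's JSON, then parse every row with it.
val ddl = df.select(schema_of_json(lit(df.select($"name").head().getString(0)))).head().getString(0)
val df2 = df.withColumn("name", from_json($"name", lit(ddl)))

// explode emits one output row per array element; struct fields are then selected by name.
val rows = df2
  .select($"id", $"partitionNo", explode($"name").as("n"))
  .select($"id", $"partitionNo", $"n.lastName", $"n.firstName")

rows.show(false)
```

Each exploded row keeps id and partitionNo. Nested arrays such as extension could be flattened the same way, with a second explode on the selected field.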


...