apache spark - How to force inferSchema for CSV to consider integers as dates (with "dateFormat" option)?

Question



asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)


I use Spark 2.2.0

I am reading a csv file as follows:

val dataFrame = spark.read.option("inferSchema", "true")
                          .option("header", true)
                          .option("dateFormat", "yyyyMMdd")
                          .csv(pathToCSVFile)

There is one date column in this file, and every record has the value 20171001 in this column.

The issue is that Spark infers the type of this column as integer rather than date. When I remove the "inferSchema" option, the column's type is string.

There are no null values and no malformed lines in this file.

What is the reason for this behavior, and how can I fix it?


1 Answer

深蓝 · Answer 1 · 2021-10-17T02:58:06+0000

If my understanding is correct, the CSV schema inference code (CSVInferSchema in the Spark source) tries the following types in order for each field, checking the first type before moving on to the next:

NullType
IntegerType
LongType
DecimalType
DoubleType
TimestampType
BooleanType
StringType

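With that, I think the issue is that 20171001 matches IntegerType before TimestampType is even considered. Note also that TimestampType is checked using the timestampFormat option, not dateFormat, and that DateType does not appear in the list at all, so the dateFormat option has no effect on inference.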
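To make the ordering concrete, here is a minimal, simplified model of it in plain Scala (no Spark dependency). The object `InferenceOrderSketch` and its `inferField` method are hypothetical: this is not Spark's CSVInferSchema code, it ignores how Spark merges per-row results into one column type, and it uses `java.text.SimpleDateFormat` as a stand-in for Spark's own timestamp parsing. It only illustrates that, with this order, a field like 20171001 is classified as IntegerType even when the timestamp pattern is set to yyyyMMdd.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import scala.util.Try

// Hypothetical, simplified model of the per-field inference order listed above.
// NOT Spark's CSVInferSchema: no merging of types across rows, and
// java.text.SimpleDateFormat stands in for Spark's timestamp parsing.
object InferenceOrderSketch {

  // True only if the whole string parses with the given pattern (strict parsing).
  private def parsesAsTimestamp(field: String, timestampFormat: String): Boolean = {
    val fmt = new SimpleDateFormat(timestampFormat)
    fmt.setLenient(false)
    val pos = new ParsePosition(0)
    fmt.parse(field, pos) != null && pos.getIndex == field.length
  }

  // Try each candidate type in order; the first one that accepts the field wins.
  def inferField(field: String, timestampFormat: String): String =
    if (field == null || field.isEmpty) "NullType"
    else if (Try(field.toInt).isSuccess) "IntegerType"
    else if (Try(field.toLong).isSuccess) "LongType"
    else if (Try(BigDecimal(field)).toOption.exists(_.scale <= 0)) "DecimalType"
    else if (Try(field.toDouble).isSuccess) "DoubleType"
    else if (parsesAsTimestamp(field, timestampFormat)) "TimestampType"
    else if (Try(field.toBoolean).isSuccess) "BooleanType"
    else "StringType"

  def main(args: Array[String]): Unit = {
    // Even with a yyyyMMdd timestamp pattern, 20171001 stops at IntegerType.
    for (f <- Seq("20171001", "12345678901", "1.5", "true", "abc"))
      println(s"$f -> ${inferField(f, "yyyyMMdd")}")
  }
}
```

The timestamp check is never reached for 20171001 because the integer check succeeds first, which is the behavior described in the question.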

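One solution is to define the schema yourself, declaring the column as DateType, and pass it to the schema operator of DataFrameReader; in that case the dateFormat option is used to parse the column. The other is to let Spark SQL infer the schema and convert the column afterwards. Spark's cast operator does not convert integers to dates directly, so cast the column to string first and then parse it with to_date and the yyyyMMdd pattern.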

I'd choose the former if the number of fields is small.
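Both approaches can be sketched as follows, assuming a `SparkSession` is available. The column name `event_date` and the object `CsvDateSketch` are hypothetical (the question does not name the column), and for brevity the schema in the first method describes only that one column; a real schema must list every column of the file, in order. `to_date(column, format)` is available from Spark 2.2.0.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_date}
import org.apache.spark.sql.types.{DateType, StructField, StructType}

// Hypothetical helper names; "event_date" is an assumed column name.
object CsvDateSketch {

  // Option 1: supply the schema up front. Because the column is declared as
  // DateType, the "dateFormat" option is applied when parsing it.
  def readWithSchema(spark: SparkSession, path: String): DataFrame = {
    val schema = StructType(Seq(StructField("event_date", DateType, nullable = true)))
    spark.read
      .option("header", true)
      .option("dateFormat", "yyyyMMdd")
      .schema(schema)
      .csv(path)
  }

  // Option 2: let inferSchema produce IntegerType, then convert the column.
  // Integers cannot be cast to dates directly, so cast to string first and
  // parse the result with to_date and an explicit pattern.
  def readAndConvert(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", true)
      .option("inferSchema", true)
      .csv(path)
      .withColumn("event_date", to_date(col("event_date").cast("string"), "yyyyMMdd"))
}
```

For example, `CsvDateSketch.readWithSchema(spark, pathToCSVFile).printSchema()` should report `event_date` as `date`. The first option also avoids the extra pass over the data that inferSchema needs, which is one reason to prefer it when writing out the schema is practical.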
