Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.5k views
in Technique[技术] by (71.8m points)

pyspark - to_date fails to parse date in Spark 3.0

I am trying to parse date using to_date() but I get the following exception.

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '12/1/2010 8:26' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

The exception suggests I should use a legacy Time Parser, for starter I don't know how to set it to Legacy.

Here is my implementation

dfWithDate = df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy"))

my date is in following format

+--------------+
|   InvoiceDate|
+--------------+
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy")).show()


+--------------+----------+
|   InvoiceDate|      date|
+--------------+----------+
|12/1/2010 8:26|2010-12-01|
+--------------+----------+

# in above code spark refers SparkSession

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...