Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
129 views
in Technique[技术] by (71.8m points)

python - Not able Convert String to Date in Spark.Sql, getting error

I am not able to convert string to date format in spark.sql. When I am passing the raw string , it is converted successfully but when I am trying to store that value in a variable and passing that argument , getting type mismatch error. Tried many different techniques but still getting the same error. Can someone please help me with this thing :

>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>> spark.sql("""select cast('2020-10-23' as date)""").show()
+------------------------+                                                      
|CAST(2020-10-23 AS DATE)|
+------------------------+
|              2020-10-23|
+------------------------+

>>> spark.sql("""select cast("""+s+""" as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;
'Project [unresolvedalias(cast(((2020 - 10) - 23) as date), None)]
+- OneRowRelation
"


>>> spark.sql("""select cast("""+str(s)+""" as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;
'Project [unresolvedalias(cast(((2020 - 10) - 23) as date), None)]
+- OneRowRelation
"
>>> 
>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>> 
>>> spark.sql("""select cast(date_format("""+s+""",'yyyy-MM-dd') as date)""").show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'date_format(((2020 - 10) - 23), 'yyyy-MM-dd')' due to data type mismatch: argument 1 requires timestamp type, however, '((2020 - 10) - 23)' is of int type.; line 1 pos 12;
'Project [unresolvedalias(cast(date_format(((2020 - 10) - 23), yyyy-MM-dd, Some(America/New_York)) as date), None)]
+- OneRowRelation
"
>>> 
question from:https://stackoverflow.com/questions/65867764/not-able-convert-string-to-date-in-spark-sql-getting-error

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You're missing single quotes that surround s:

spark.sql("""select cast('"""+s+"""' as date)""")

To convince yourself that the single quotes are needed, you can print your query

print("""select cast("""+s+""" as date)""")

And you will see

select cast(2020-10-23 as date)

where the single quotes are missing.


Alternatively, you can define s as a string containing single quotes:

s = "'2020-10-23'"

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...