According to this article from Snowflake, Load Data in Spark with Overwrite mode without Changing Table Structure, you can set the two options usestagingtable to OFF and truncate_table to ON when writing.
[...] if you write the data into this table using the
Snowflake Spark Connector with OVERWRITE mode, then the table gets
re-created with the default lengths of the data types; for example,
VARCHAR(32) becomes VARCHAR(16777216).
[...] you can set the parameters TRUNCATE_TABLE=ON and USESTAGINGTABLE=OFF in the database connection string of your Spark code and run the Spark write job in "OVERWRITE" mode.
By default, the parameter USESTAGINGTABLE is set to ON: the connector first writes the data frame into a temporary staging table, and only if that write succeeds is the target table replaced by the new one.
So your code should be like this:
options = {
    "sfUrl": "xxxxxxxxxxxx",
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": "xxxx",
    "sfSchema": "xxx",
    "sfWarehouse": "xxxx",
    "truncate_table": "ON",     # truncate instead of dropping, so the table structure is kept
    "usestagingtable": "OFF"    # write directly to the target instead of via a staging table
}
df.write.format("net.snowflake.spark.snowflake") \
    .mode("overwrite") \
    .options(**options) \
    .option("dbtable", "xxx") \
    .save()
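
If you want to double-check that the overwrite kept the original column definitions, one way (a sketch: it assumes an active SparkSession named spark, reuses the same options dict, and 'XXX' is a placeholder for your table name as stored by Snowflake) is to read INFORMATION_SCHEMA back through the connector:

# Optional sanity check (sketch): query INFORMATION_SCHEMA via the connector's
# "query" option and confirm that e.g. VARCHAR(32) columns kept their length.
check_df = spark.read.format("net.snowflake.spark.snowflake") \
    .options(**options) \
    .option("query",
            "SELECT column_name, data_type, character_maximum_length "
            "FROM information_schema.columns "
            "WHERE table_name = 'XXX'") \
    .load()

check_df.show()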