python - Spark 1.4 increase maxResultSize memory

Question

Welcome To Ask or Share your Answers For Others

python - Spark 1.4 increase maxResultSize memory

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Spark 1.4 increase maxResultSize memory

I am using Spark 1.4 for my research and struggling with the memory settings. My machine has 16GB of memory so no problem there since the size of my file is only 300MB. Although, when I try to convert Spark RDD to panda dataframe using toPandas() function I receive the following error:

serialized results of 9 tasks (1096.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

I tried to fix this changing the spark-config file and still getting the same error. I've heard that this is a problem with spark 1.4 and wondering if you know how to solve this. Any help is much appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:42:31+0000

You can set spark.driver.maxResultSize parameter in the SparkConf object:

from pyspark import SparkConf, SparkContext

# In Jupyter you have to stop the current context first
sc.stop()

# Create new config
conf = (SparkConf()
    .set("spark.driver.maxResultSize", "2g"))

# Create new context
sc = SparkContext(conf=conf)

You should probably create a new SQLContext as well:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

Categories

python - Spark 1.4 increase maxResultSize memory

python - Spark 1.4 increase maxResultSize memory

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags