Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - Pyspark: Serialized task exceeds max allowed. Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values

I'm running calculations on a cluster, and at the end, when I request summary statistics on my Spark dataframe with df.describe().show(), I get an error:

Serialized task 15:0 was 137500581 bytes, which exceeds max allowed: spark.rpc.message.maxSize (134217728 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values

In my Spark configuration I already tried to increase the aforementioned parameter:

from pyspark.sql import SparkSession

spark = (SparkSession
         .builder
         .appName("TV segmentation - dataprep for scoring")
         .config("spark.executor.memory", "25G")
         .config("spark.driver.memory", "40G")
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.maxExecutors", "12")
         .config("spark.driver.maxResultSize", "3g")
         .config("spark.kryoserializer.buffer.max.mb", "2047mb")
         .config("spark.rpc.message.maxSize", "1000mb")
         .getOrCreate())

I also tried to repartition my dataframe using:

dfscoring = dfscoring.repartition(100)

but I still get the same error.

My environment: Python 3.5, Anaconda 5.0, Spark 2

How can I avoid this error?

1 Answer

I had the same problem and solved it. The cause is spark.rpc.message.maxSize, which defaults to 128 (the value is a plain number of MiB, without a unit suffix). You can change it when launching a Spark client. I work in pyspark and set the value to 1024, so I launch it like this:

pyspark --master yarn --conf spark.rpc.message.maxSize=1024

That solved it.
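If you prefer to set this from Python rather than on the command line, here is a rough sketch of the same idea. The helper names (required_max_size_mib, build_session) and the 1.5x headroom factor are my own assumptions, not part of Spark. It works out how many MiB the task in the error message needs and passes that as a plain number string to the builder. As far as I know, Spark rejects values above 2047 for this setting, and the setting only takes effect if it is applied before the Spark context starts, so a session that already exists would keep the old limit.

```python
import math

MIB = 1024 * 1024
SPARK_LIMIT_MIB = 2047  # assumed upper bound Spark accepts for this setting


def required_max_size_mib(task_bytes, headroom=1.5):
    """Return a spark.rpc.message.maxSize value (in MiB) that fits the task.

    task_bytes is the size reported in the error message; headroom is an
    arbitrary safety factor chosen for this sketch.
    """
    needed = math.ceil(task_bytes * headroom / MIB)
    if needed > SPARK_LIMIT_MIB:
        raise ValueError(
            "task needs %d MiB, above the %d MiB limit; "
            "consider broadcast variables instead" % (needed, SPARK_LIMIT_MIB)
        )
    return needed


def build_session(max_size_mib, app_name="rpc-max-size-demo"):
    """Create a session with the larger limit (hypothetical helper).

    pyspark is imported lazily so the size calculation above can be run
    without a Spark installation.
    """
    from pyspark.sql import SparkSession

    return (SparkSession
            .builder
            .appName(app_name)
            # Plain number of MiB as a string: "197", not "197mb".
            .config("spark.rpc.message.maxSize", str(max_size_mib))
            .getOrCreate())


if __name__ == "__main__":
    # 137500581 is the serialized task size from the error in the question.
    size = required_max_size_mib(137500581)
    print("--conf spark.rpc.message.maxSize=%d" % size)
```

Run as a script, this prints the --conf flag to pass to pyspark or spark-submit; calling build_session(size) instead creates the session directly, provided no session is already running in that process.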

