Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


broadcast - When does Spark evict a broadcast DataFrame from executors?

I have a question about what happens when we broadcast a DataFrame.

Copies of the broadcast DataFrame are sent to each executor.

So, when does Spark evict these copies from the executors?


Fight the dragon too long and you become the dragon; gaze into the abyss too long and the abyss gazes back…

1 Answer


I find this topic easy to understand functionally, but the manuals are harder to follow on the technical details, and improvements are always in the offing.

My take:

  • A ContextCleaner runs on the driver for every Spark application.
  • It is created and started as soon as the SparkContext starts.
  • It is not specific to broadcasts; it handles cleanup for several kinds of Spark objects.
  • The ContextCleaner thread cleans up RDD, shuffle, and broadcast state, as well as accumulators, in its keepCleaning method, which runs for the lifetime of the cleaner. Objects are registered for cleanup through methods such as registerShuffleForCleanup. A check is then made to see whether any live root objects on the driver still point to a given object; if none do, that object is eligible for clean-up (eviction) and is placed on a queue that keepCleaning processes.
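  • The context-cleaner-periodic-gc thread asynchronously requests a run of the standard JVM garbage collector at a fixed interval (spark.cleaner.periodicGC.interval, 30 minutes by default). These periodic runs start when the ContextCleaner starts and stop when it terminates.
  • Spark relies on the standard JVM garbage collector to detect unreferenced objects; it has no separate collector of its own.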
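
To make the flow in the list above concrete, here is a minimal sketch in plain Python, not Spark code. It only imitates the pattern: a handle is registered, the cleaner learns that the handle has been garbage collected, and only then does the cleanup run. The names Broadcast, ToyContextCleaner, register_broadcast_for_cleanup and keep_cleaning_once are my own hypothetical stand-ins, and Python's weakref module plays the part that weak references and a reference queue play in the JVM.

```python
import gc
import queue
import weakref


class Broadcast:
    """Hypothetical stand-in for a driver-side broadcast handle (not Spark's class)."""

    def __init__(self, broadcast_id):
        self.id = broadcast_id


class ToyContextCleaner:
    """Toy model of the flow: register, detect unreachability, clean up."""

    def __init__(self):
        self._pending = queue.SimpleQueue()  # ids whose handles were collected
        self._refs = {}                      # keeps the weak references alive
        self.cleaned = []                    # ids whose copies were "evicted"

    def register_broadcast_for_cleanup(self, bc):
        bid = bc.id
        # The callback fires once the handle itself has been garbage collected.
        self._refs[bid] = weakref.ref(
            bc, lambda _ref, bid=bid: self._pending.put(bid)
        )

    def keep_cleaning_once(self):
        """Drain the queue once; a real cleaner would loop in a background thread."""
        while True:
            try:
                bid = self._pending.get_nowait()
            except queue.Empty:
                break
            self._refs.pop(bid, None)
            # Assumption: this is where executors would be told to drop their copies.
            self.cleaned.append(bid)


cleaner = ToyContextCleaner()
bc = Broadcast(0)
cleaner.register_broadcast_for_cleanup(bc)

cleaner.keep_cleaning_once()
still_referenced = list(cleaner.cleaned)  # handle still reachable: nothing evicted

del bc        # drop the last strong reference on the "driver"
gc.collect()  # loosely analogous to the periodic GC request
cleaner.keep_cleaning_once()
after_release = list(cleaner.cleaned)     # now the broadcast id has been cleaned
```

The point of the sketch is that nothing is evicted while the driver still holds a reference; the copies become eligible only after the handle is unreachable and a garbage collection has noticed it.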

A good reference to read alongside the official Spark docs is https://mallikarjuna_g.gitbooks.io/spark/content/spark-service-contextcleaner.html.

