apache spark - What are broadcast variables? What problems do they solve?

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I am going through the Spark Programming Guide, which says:

Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Considering the above, what are the use cases of broadcast variables? What problems do broadcast variables solve?

When we create a broadcast variable like the one below, is the variable reference (here, broadcastVar) available on all the nodes in the cluster?

val broadcastVar = sc.broadcast(Array(1, 2, 3))

How long do these variables stay in the memory of the nodes?

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:04:07+0000

If you have a huge array that is accessed from Spark closures (for example, some reference data), this array will be shipped to each Spark node with the closure. For example, if you have a 10-node cluster with 100 partitions (10 partitions per node), this array will be distributed at least 100 times (10 times to each node).

If you use a broadcast variable instead, the array will be distributed only once per node, using an efficient peer-to-peer protocol.
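
The arithmetic behind the two paragraphs above can be sketched as a small Scala program. The cluster shape (10 nodes, 100 partitions) comes from the answer; the array size of 1,000,000 Ints is a hypothetical value, and the byte count covers only the raw 4-byte Int payload, ignoring serialization overhead. It also assumes one executor per node, since Spark caches a broadcast value once per executor.

```scala
// Rough count of how many copies of a captured array reach the cluster.
// Cluster shape is from the answer; the array size is a hypothetical example.
object ShippingEstimate {
  val nodes = 10        // one executor per node assumed
  val partitions = 100  // one task per partition
  val approxBytes: Long = 1000000L * 4 // hypothetical 1,000,000 Ints, raw payload only

  // Captured directly in the closure: every task carries its own copy.
  val copiesWithClosure: Int = partitions
  // Broadcast: each node fetches one copy and shares it among its tasks.
  val copiesWithBroadcast: Int = nodes

  // Total data moved for a given number of copies, in (decimal) megabytes.
  def megabytes(copies: Int): Long = copies * approxBytes / 1000000L

  def main(args: Array[String]): Unit = {
    println(s"closure:   $copiesWithClosure copies, ~${megabytes(copiesWithClosure)} MB")
    println(s"broadcast: $copiesWithBroadcast copies, ~${megabytes(copiesWithBroadcast)} MB")
  }
}
```

Running it reports about 400 MB for the closure case versus about 40 MB for the broadcast case; the ratio is simply partitions per node, so the saving grows as each node runs more tasks.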

val array: Array[Int] = (1 to 1000000).toArray // some huge array (placeholder data)
val broadcasted = sc.broadcast(array) // sc is the SparkContext, e.g. in spark-shell

And some RDD:

import org.apache.spark.rdd.RDD
val rdd: RDD[Int] = sc.parallelize(1 to 100, 100) // 100 partitions

In this case, the array will be shipped with the closure for every task:

rdd.map(i => array.contains(i))

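whereas with the broadcast variable you'll get a big performance benefit, because each node receives the array only once and its tasks read the cached copy: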

rdd.map(i => broadcasted.value.contains(i))
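
To make "cached on each machine rather than shipped with tasks" concrete, here is a toy, single-JVM model in plain Scala. The names ToyBroadcast, ToyNode and simulate are hypothetical; this is not Spark's API and it does not model Spark's peer-to-peer block transfer. It only shows the caching idea: every task goes through its node's `value` (standing in for `broadcasted.value`), and the payload is fetched once per node, not once per task.

```scala
import scala.collection.mutable

// Toy model of per-node caching of a broadcast value.
// Hypothetical names; NOT Spark's implementation.
object ToyBroadcastModel {
  // A handle that tasks can carry around cheaply; the payload is what is "big".
  final case class ToyBroadcast[T](id: Long, payload: T)

  final class ToyNode {
    private val cache = mutable.Map.empty[Long, Any]
    var fetches = 0 // times this node pulled the payload over the "network"

    // Plays the role of broadcasted.value: fetch on first use, then reuse.
    def value[T](b: ToyBroadcast[T]): T =
      cache.getOrElseUpdate(b.id, { fetches += 1; b.payload }).asInstanceOf[T]
  }

  // Runs numTasks tasks round-robin over numNodes nodes; each task checks
  // whether its index is in the broadcast array, like rdd.map(i => ...contains(i)).
  // Returns (total fetches across all nodes, number of tasks that found a match).
  def simulate(numNodes: Int, numTasks: Int): (Int, Int) = {
    val lookup = ToyBroadcast(id = 0L, payload = Array(1, 2, 3))
    val nodes = Vector.fill(numNodes)(new ToyNode)
    val hits = (0 until numTasks).count { task =>
      nodes(task % numNodes).value(lookup).contains(task)
    }
    (nodes.map(_.fetches).sum, hits)
  }

  def main(args: Array[String]): Unit = {
    val (fetches, hits) = simulate(numNodes = 10, numTasks = 100)
    println(s"fetches=$fetches hits=$hits")
  }
}
```

With 10 nodes and 100 tasks it prints `fetches=10 hits=3`: one fetch per node instead of one per task.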
