java - In Apache spark, what is the difference between using mapPartitions and combine use of broadcast variable and map

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

In Spark, we use a broadcast variable to give each machine a read-only copy of a variable. We usually create the broadcast variable outside the closure (for example, a lookup table needed by the closure) to improve performance.
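Here is a plain-Java sketch of the broadcast-plus-map pattern described above, runnable without a cluster. It is a model, not Spark code: the names `BroadcastLookupSketch`, `countryTable`, `enrich` and `enrichAll` are hypothetical, and the unmodifiable `Map` only stands in for the read-only value a broadcast variable exposes. The table is built once, outside the per-element function, and the function only reads it.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BroadcastLookupSketch {

    // Built once on the "driver", outside the per-element function.
    // Map.of returns an unmodifiable map, standing in (by assumption) for
    // the read-only value a Spark broadcast variable exposes.
    static final Map<String, String> countryTable =
            Map.of("DE", "Germany", "FR", "France", "JP", "Japan");

    // The per-element function: like a closure passed to rdd.map(...),
    // it only reads the shared table and never modifies it.
    static final Function<String, String> enrich =
            code -> code + "=" + countryTable.getOrDefault(code, "unknown");

    // Applies the function element by element, as map does.
    public static List<String> enrichAll(List<String> codes) {
        return codes.stream().map(enrich).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(enrichAll(List.of("DE", "JP", "XX")));
    }
}
```

In actual Spark Java code, the table would typically be wrapped with `JavaSparkContext.broadcast(...)` on the driver and read inside the `map` function through `Broadcast.value()`.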

We also have a Spark transformation called mapPartitions, which tries to achieve the same thing (using a shared variable to improve performance). For example, in mapPartitions we can share one database connection per partition.
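To see why a per-partition connection helps, here is a plain-Java model (again, not Spark code) that counts how many times a hypothetical `FakeConnection` is opened. It compares running the setup per element, as it would inside `map`, with running it once per partition, as inside `mapPartitions`. The partitions are plain nested lists, an assumption made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PerPartitionConnectionSketch {

    // Hypothetical stand-in for an expensive resource such as a JDBC connection.
    static final class FakeConnection {
        static final AtomicInteger opened = new AtomicInteger();
        FakeConnection() { opened.incrementAndGet(); }
        String lookup(int id) { return "row-" + id; }
    }

    // map-style: the function runs once per element, so a connection
    // created inside it is opened once per element.
    public static List<String> mapStyle(List<List<Integer>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            for (int id : partition) {
                FakeConnection conn = new FakeConnection();
                out.add(conn.lookup(id));
            }
        }
        return out;
    }

    // mapPartitions-style: the function runs once per partition and walks
    // an iterator over it, so one connection serves the whole partition.
    public static List<String> mapPartitionsStyle(List<List<Integer>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            FakeConnection conn = new FakeConnection();
            Iterator<Integer> it = partition.iterator();
            while (it.hasNext()) {
                out.add(conn.lookup(it.next()));
            }
        }
        return out;
    }

    // Resets the counter, runs one style, and reports how many connections were opened.
    public static int connectionsOpened(boolean perPartition, List<List<Integer>> partitions) {
        FakeConnection.opened.set(0);
        if (perPartition) {
            mapPartitionsStyle(partitions);
        } else {
            mapStyle(partitions);
        }
        return FakeConnection.opened.get();
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(4, 5), List.of(6));
        System.out.println("map:           " + connectionsOpened(false, partitions)); // 6
        System.out.println("mapPartitions: " + connectionsOpened(true, partitions));  // 3
    }
}
```

Both styles produce the same rows; only the number of connections differs (six elements versus three partitions).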

So what's the difference between these two? Can we use them interchangeably just for sharing variables?

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:24:52+0000

A broadcast variable is used to ship a read-only object to every worker node once. The object is then shared by all tasks (i.e. all partitions) processed on that node, and its value is the same on every node in the cluster. The goal of broadcasting is to save network cost when the same data is used by many different tasks/partitions on a worker node.
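A deliberately simplified model of this network-cost argument follows. It counts how many copies of a large read-only table are shipped when the table is captured in each task's closure, compared with broadcasting it so that each executor fetches it once and reuses it. The executor names and task assignment are made up, and the model ignores how Spark actually moves the bytes; it only captures the once-per-executor caching.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BroadcastShippingSketch {

    // Closure capture: every task carries its own serialized copy of the table,
    // so the number of copies equals the number of tasks.
    public static int copiesWithClosure(List<String> taskExecutors) {
        return taskExecutors.size();
    }

    // Broadcast: the first task on an executor triggers one fetch, the copy is
    // cached, and later tasks on the same executor reuse it.
    public static int copiesWithBroadcast(List<String> taskExecutors) {
        Set<String> executorsHoldingCopy = new HashSet<>();
        int shipped = 0;
        for (String executor : taskExecutors) {
            if (executorsHoldingCopy.add(executor)) {
                shipped++; // first task seen on this executor
            }
        }
        return shipped;
    }

    public static void main(String[] args) {
        // 8 tasks (one per partition) scheduled on 2 executors.
        List<String> taskExecutors = List.of("exec-1", "exec-2", "exec-1", "exec-2",
                "exec-1", "exec-2", "exec-1", "exec-2");
        System.out.println(copiesWithClosure(taskExecutors));   // 8
        System.out.println(copiesWithBroadcast(taskExecutors)); // 2
    }
}
```

With 8 tasks on 2 executors the closure approach ships 8 copies and the broadcast ships 2; the gap grows with the number of tasks per executor.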

mapPartitions, in contrast, is a transformation available on RDDs. It works like map, except that your function is called once per partition (receiving an iterator over that partition's elements) instead of once per element. Yes, you can create new objects inside it, such as a JDBC connection, and each one will be unique to its partition. However, you can't share such an object among different partitions, much less among different nodes.
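The per-partition scope can be illustrated with a local stand-in for `mapPartitions` that, like the real transformation, calls the user function once per partition with an `Iterator` over that partition and expects an `Iterator` back (in Spark's Java API the function is a `FlatMapFunction<Iterator<T>, U>`). `MapPartitionsScopeSketch`, `Resource` and the local `mapPartitions` helper are hypothetical. Each call creates its own `Resource`, so three partitions yield three distinct objects, none of them visible to the other partitions.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MapPartitionsScopeSketch {

    // Hypothetical per-partition resource (think: a JDBC connection).
    static final class Resource {
        String tag(String value) { return value.toUpperCase(); }
    }

    // Records every Resource instance by identity so distinct ones can be counted.
    static final Map<Resource, Boolean> created = new IdentityHashMap<>();

    // Hypothetical local stand-in for RDD.mapPartitions: the function is
    // invoked once per partition with an iterator over that partition only.
    static <T, R> List<List<R>> mapPartitions(List<List<T>> partitions,
                                              Function<Iterator<T>, Iterator<R>> f) {
        List<List<R>> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            List<R> out = new ArrayList<>();
            f.apply(partition.iterator()).forEachRemaining(out::add);
            result.add(out);
        }
        return result;
    }

    public static List<List<String>> run(List<List<String>> partitions) {
        created.clear();
        return mapPartitions(partitions, (Iterator<String> it) -> {
            // Created inside the call, so it exists only for this partition.
            Resource r = new Resource();
            created.put(r, Boolean.TRUE);
            List<String> out = new ArrayList<>();
            while (it.hasNext()) {
                out.add(r.tag(it.next()));
            }
            return out.iterator();
        });
    }

    public static int distinctResources() { return created.size(); }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        System.out.println(run(parts));          // [[A, B], [C], [D, E]]
        System.out.println(distinctResources()); // 3
    }
}
```

The sketch consumes the iterator before returning, so the resource is no longer needed afterwards; with a real JDBC connection you would also close it at that point.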
