To use dynamic allocation in Spark, spark.dynamicAllocation.enabled needs to be set to true; it is false by default.
This in turn requires spark.shuffle.service.enabled to be set to true, since the Spark application runs on YARN and relies on the external shuffle service. The Spark on YARN documentation describes how to start the shuffle service on each NodeManager in YARN; a sketch follows.
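As a rough illustration (a sketch based on the Spark on YARN documentation; verify the details against your Hadoop and Spark versions), the YARN shuffle service is registered as an auxiliary service in yarn-site.xml on every NodeManager, with the spark-<version>-yarn-shuffle.jar placed on the NodeManager classpath:

<!-- yarn-site.xml on every NodeManager (sketch) -->
<property>
  <!-- add spark_shuffle to the existing auxiliary services -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <!-- class implementing the Spark shuffle service -->
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Restart all NodeManagers after making these changes.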
The following configurations are also relevant:

spark.dynamicAllocation.minExecutors - lower bound on the number of executors,
spark.dynamicAllocation.maxExecutors - upper bound on the number of executors, and
spark.dynamicAllocation.initialExecutors - number of executors to start with (defaults to minExecutors).
These options can be configured for a Spark application in three ways:
1. From spark-submit with --conf <prop_name>=<prop_value>

spark-submit --master yarn --deploy-mode cluster \
  --driver-cores 2 \
  --driver-memory 2G \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 2G \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=5 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.initialExecutors=10 \
  --class com.spark.sql.jdbc.SparkDFtoOracle2 \
  Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

With dynamic allocation enabled, spark.dynamicAllocation.initialExecutors=10 has the same effect as --num-executors 10.
2. Inside the Spark program with SparkConf

Set the properties on a SparkConf, then create the SparkSession (or SparkContext) with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
  .set("spark.dynamicAllocation.minExecutors", "5")
  .set("spark.dynamicAllocation.maxExecutors", "30")
  .set("spark.dynamicAllocation.initialExecutors", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()
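As a quick sanity check (a sketch that assumes the SparkSession built above is named spark), the effective values can be read back from the runtime config. Note that dynamic allocation properties must be in place before the SparkContext starts; changing them afterwards has no effect on allocation behavior.

// Print the effective settings from the running session
println(spark.conf.get("spark.dynamicAllocation.minExecutors"))  // "5"
println(spark.conf.get("spark.dynamicAllocation.maxExecutors"))  // "30"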
3. In spark-defaults.conf

This file is usually located in $SPARK_HOME/conf/. Place the same configurations there to apply them to all Spark applications whenever no value is passed on the command line or set in code.
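For illustration, the same settings used in the examples above would look like this in spark-defaults.conf:

spark.dynamicAllocation.enabled            true
spark.shuffle.service.enabled              true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10

Keep the precedence order in mind: properties set directly on SparkConf in code take the highest priority, then flags passed to spark-submit, then values in spark-defaults.conf.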