To use dynamic allocation in Spark, spark.dynamicAllocation.enabled needs to be set to true; it is false by default.
This in turn requires spark.shuffle.service.enabled to be set to true, since the Spark application runs on YARN and relies on the external shuffle service. The Spark on YARN documentation describes how to start the shuffle service on each NodeManager in YARN; a sketch follows.
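As a rough illustration (a sketch based on the Spark on YARN documentation; verify the details against your Hadoop and Spark versions), the YARN shuffle service is registered as an auxiliary service in yarn-site.xml on every NodeManager, with the spark-<version>-yarn-shuffle.jar placed on the NodeManager classpath:

<!-- yarn-site.xml on every NodeManager (sketch) -->
<property>
  <!-- add spark_shuffle to the existing auxiliary services -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <!-- class implementing the Spark shuffle service -->
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Restart all NodeManagers after making these changes.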
The following configurations are also relevant:

spark.dynamicAllocation.minExecutors - lower bound on the number of executors,
spark.dynamicAllocation.maxExecutors - upper bound on the number of executors, and
spark.dynamicAllocation.initialExecutors - number of executors to start with (defaults to minExecutors).
These options can be configured for a Spark application in three ways:
1. From spark-submit with --conf <prop_name>=<prop_value>

spark-submit --master yarn --deploy-mode cluster \
  --driver-cores 2 \
  --driver-memory 2G \
  --num-executors 10 \
  --executor-cores 5 \
  --executor-memory 2G \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=5 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.initialExecutors=10 \
  --class com.spark.sql.jdbc.SparkDFtoOracle2 \
  Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

With dynamic allocation enabled, spark.dynamicAllocation.initialExecutors=10 has the same effect as --num-executors 10.
2. Inside the Spark program with SparkConf

Set the properties on a SparkConf, then create the SparkSession (or SparkContext) with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
  .set("spark.dynamicAllocation.minExecutors", "5")
  .set("spark.dynamicAllocation.maxExecutors", "30")
  .set("spark.dynamicAllocation.initialExecutors", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()
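As a quick sanity check (a sketch that assumes the SparkSession built above is named spark), the effective values can be read back from the runtime config. Note that dynamic allocation properties must be in place before the SparkContext starts; changing them afterwards has no effect on allocation behavior.

// Print the effective settings from the running session
println(spark.conf.get("spark.dynamicAllocation.minExecutors"))  // "5"
println(spark.conf.get("spark.dynamicAllocation.maxExecutors"))  // "30"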
3. In spark-defaults.conf

This file is usually located in $SPARK_HOME/conf/. Place the same configurations there to apply them to all Spark applications whenever no value is passed on the command line or set in code.
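For illustration, the same settings used in the examples above would look like this in spark-defaults.conf:

spark.dynamicAllocation.enabled            true
spark.shuffle.service.enabled              true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10

Keep the precedence order in mind: properties set directly on SparkConf in code take the highest priority, then flags passed to spark-submit, then values in spark-defaults.conf.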