Welcome To Ask or Share your Answers For Others

hadoop - Behavior of the parameter "mapred.min.split.size" in HDFS

1 Answer

answered Oct 17, 2021 by 深蓝 (71.8m points)

The split size is calculated by the formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))

In your case (values in MB) it will be:

split size = max(128, min(Long.MAX_VALUE (default), 64)) = max(128, 64) = 128
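
The arithmetic can be checked with a short Python sketch. The function name `compute_split_size` and the constants `MB` and `LONG_MAX` are illustrative and not part of Hadoop; the function only mirrors the formula, with sizes in bytes and Java's Long.MAX_VALUE written as 2**63 - 1.

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # stands in for Java's Long.MAX_VALUE (assumed default of mapred.max.split.size)

def compute_split_size(min_split_size, max_split_size, block_size):
    """Mirror of max(minSize, min(maxSize, blockSize)); all values in bytes."""
    return max(min_split_size, min(max_split_size, block_size))

# mapred.min.split.size = 128 MB, dfs.block.size = 64 MB
split_size = compute_split_size(128 * MB, LONG_MAX, 64 * MB)
print(split_size // MB)  # 128
```

With the minimum left below the block size, the same function returns the block size (64 MB), so each split would match one block.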

So, for the inferences above:

Each map will process 2 HDFS blocks (assuming each block is 64 MB): True
The input file (already stored in HDFS) will be re-divided to occupy 128 MB blocks in HDFS: False

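However, making the minimum split size greater than the block size increases the split size at the cost of data locality: a split then spans more than one block, and those blocks are not guaranteed to reside on the same node, so part of the input may have to be read over the network.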
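
To make the locality point concrete, the sketch below lists which HDFS blocks each split would overlap. It is a simplified model, not Hadoop's actual split code: the 256 MB file size and the helper name `blocks_per_split` are assumptions made for illustration, and it ignores details such as the slack Hadoop allows on the last split.

```python
MB = 1024 * 1024

def blocks_per_split(file_size, split_size, block_size):
    """Return, for each split, the indices of the blocks it overlaps."""
    result = []
    start = 0
    while start < file_size:
        end = min(start + split_size, file_size)
        first_block = start // block_size
        last_block = (end - 1) // block_size  # block holding the split's last byte
        result.append(list(range(first_block, last_block + 1)))
        start = end
    return result

# Hypothetical 256 MB file, 128 MB splits, 64 MB blocks
for i, blocks in enumerate(blocks_per_split(256 * MB, 128 * MB, 64 * MB)):
    print(f"split {i}: blocks {blocks}")
# split 0: blocks [0, 1]
# split 1: blocks [2, 3]
```

Each split covers two blocks, and a map task is not guaranteed to run on a node that holds both of them.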
