I can see in my MapReduce jobs that the output of each reducer is sorted by key.
So if I set the number of reducers to 10, the output directory contains 10 files, and each of those files holds sorted data.
The reason I am asking is that although the data within each file is sorted, the files themselves are not ordered relative to one another.
For example, there are cases where every part-000* file starts at 0 and ends at zzzz, assuming I am using Text as the key.
I was assuming that the data would also be sorted across the files, i.e. the first file should hold the entries starting with a and the last file, part-00009, should hold the entries with zzzz, or at least entries > a,
assuming my keys are uniformly distributed over the alphabet.
Could someone shed some light on why it behaves this way?
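To make the behavior concrete, here is a minimal simulation of what I am seeing. It is only a sketch and rests on one assumption: that keys are routed to reducers by a hash of the key modulo the number of reducers (as far as I understand, that is what the default HashPartitioner does). The `partition` and `simulate` functions are hypothetical helpers, and CRC32 is a stand-in hash, not Hadoop's actual `Text.hashCode`.

```python
import zlib

NUM_REDUCERS = 10  # same reducer count as in my job


def partition(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key by hash modulo reducer count.

    Assumption: CRC32 stands in for the real key hash; only the
    "hash, then modulo" shape is meant to match Hadoop.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def simulate(keys, num_reducers=NUM_REDUCERS):
    """Return one sorted list of keys per reducer (one per output file)."""
    parts = [[] for _ in range(num_reducers)]
    for key in keys:
        parts[partition(key, num_reducers)].append(key)
    # Each reducer sees its own keys in sorted order, nothing more.
    return [sorted(p) for p in parts]


if __name__ == "__main__":
    letters = "abcdefghijklmnopqrstuvwxyz"
    keys = [a + b for a in letters for b in letters]  # aa .. zz
    for i, part in enumerate(simulate(keys)):
        if part:
            print(f"part-{i:05d}: first={part[0]} last={part[-1]} count={len(part)}")
```

Under that assumption, each simulated file is sorted on its own, but nothing ties a key's position in the alphabet to the file it lands in, which looks like the output I described.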