We all know that the block size in HDFS is quite large (64 MB or 128 MB) compared to the block size in traditional file systems. This is done to keep seek time small relative to transfer time: transfer rates have improved far more than disk seek times, so a file system design goal is to minimize the number of seeks per unit of data transferred. But this seems to come with the disadvantage of internal fragmentation, which is exactly why traditional file system blocks are kept small, typically only a few KB (generally 4 KB or 8 KB).
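To put rough numbers on the seek-versus-transfer argument (these figures are only illustrative, not from the original post): with a seek time of about 10 ms and a transfer rate of about 100 MB/s, a block needs to be on the order of 100 MB for the seek to cost only around 1% of the transfer time, whereas with 4 KB blocks the same scan would be completely dominated by seeks.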
I was going through the book Hadoop: The Definitive Guide and found it written that a file smaller than the HDFS block size does not occupy a full block and does not account for a full block's worth of storage, but I couldn't understand how. Can somebody please throw some light on this?
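For context, here is a minimal sketch of how one might observe this with the Hadoop `FileSystem` API (assuming a configured Hadoop client on the classpath and a hypothetical path `/tmp/small-file.txt` that is much smaller than the block size). The point it illustrates is that the block size reported for a file is per-file metadata, while the file's actual length and stored data stay small:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileBlockCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path: a file much smaller than the HDFS block size.
        Path p = new Path("/tmp/small-file.txt");

        FileStatus status = fs.getFileStatus(p);
        // getLen() is the file's real length in bytes; getBlockSize() is the
        // block size recorded for the file (e.g. 128 MB), not the space used.
        System.out.println("File length : " + status.getLen() + " bytes");
        System.out.println("Block size  : " + status.getBlockSize() + " bytes");
        System.out.println("Blocks used : "
                + fs.getFileBlockLocations(status, 0, status.getLen()).length);
    }
}
```

For a small file this should report one block, a large block size, but only the file's actual length; checking the same file with `hadoop fsck <path> -files -blocks` shows the same thing from the command line.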