What does the distribute cache actually mean? Having a file in distributed cache means that is it available in every datanode and hence there will be no internode communication for that data, or does it mean that the file is in memory in every node?
If not, by what means can I have a file in memory for the entire job? Can this be done both for map-reduce, as well as for a UDF..
(In particular there is some configuration data, comparatively small that I would like to keep in memory as a UDF applies on hive query...? )
Thanks and regards,
Dhruv Kapur.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…