TextIO.read() and AvroIO.read() (as well as some other Beam IOs) by default don't perform very well in current Apache Beam runners when reading a filepattern that expands into a very large number of files, for example 1M files.
How can I read such a large number of files efficiently?