To keep everything on the grid, use Hadoop streaming with a single reducer and cat as both the mapper and the reducer (effectively a no-op). Compression can be added with MapReduce flags.
hadoop jar \
    $HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming.jar \
    -Dmapred.reduce.tasks=1 \
    -Dmapred.job.queue.name=$QUEUE \
    -input "$INPUT" \
    -output "$OUTPUT" \
    -mapper cat \
    -reducer cat
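If you want compression, add the following next to the other -D options, before -input (generic options must come before the streaming options):
    -Dmapred.output.compress=true
    -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec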
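To make the option ordering harder to get wrong, the command can be assembled in a small bash function. This is only a sketch: build_merge_cmd and its argument order are hypothetical names made up for this example, not part of Hadoop. It prints the command one argument per line so it can be inspected before anything is submitted to the grid.

```shell
#!/usr/bin/env bash
# Sketch only: build_merge_cmd is a hypothetical helper, not a Hadoop tool.
# It prints the streaming command from the answer, one argument per line.
# Usage: build_merge_cmd INPUT OUTPUT QUEUE [true|false]
build_merge_cmd() {
  local input=$1 output=$2 queue=$3 compress=${4:-false}

  # Generic -D options go first, right after the streaming jar.
  local cmd=(
    hadoop jar "$HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming.jar"
    -Dmapred.reduce.tasks=1
    -Dmapred.job.queue.name="$queue"
  )

  # Optional gzip compression of the job output, still among the -D options.
  if [ "$compress" = true ]; then
    cmd+=(
      -Dmapred.output.compress=true
      -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    )
  fi

  # Streaming options follow; cat as mapper and reducer passes lines through.
  cmd+=(-input "$input" -output "$output" -mapper cat -reducer cat)

  printf '%s\n' "${cmd[@]}"
}
```

Assuming the paths contain no newlines, the printed command could then be run with something like: mapfile -t cmd < <(build_merge_cmd "$INPUT" "$OUTPUT" "$QUEUE" true) && "${cmd[@]}". On Hadoop 2 and later the mapred.* property names used here are deprecated aliases; the current names are mapreduce.job.reduces, mapreduce.job.queuename, mapreduce.output.fileoutputformat.compress and mapreduce.output.fileoutputformat.compress.codec.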