hadoop - Manually splitting and compressing input for Amazon EMR instead of using hadoop-lzo
Instead of using hadoop-lzo to index the LZO input file, I decided to split the input into chunks myself and compress each chunk with LZO so that every compressed file comes out close to 128 MB (since that is the default block size on the Amazon distribution [1]).
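For concreteness, here is a minimal sketch of what that pre-splitting step could look like. It assumes the input is newline-delimited text and the lzop command-line tool is available; ESTIMATED_RATIO, split_and_compress, and the file paths are all illustrative guesses of mine, not something from the post, and the ratio should be measured on a sample of the real data first.

    #!/usr/bin/env python3
    # Sketch of the manual pre-splitting step described above.
    # Assumptions (mine): newline-delimited input, `lzop` installed,
    # ESTIMATED_RATIO is a guess at LZO's compressed/uncompressed
    # ratio for this data. Paths and names are hypothetical.
    import os
    import subprocess

    BLOCK_SIZE = 128 * 1024 * 1024        # target compressed size (~1 HDFS block)
    ESTIMATED_RATIO = 0.5                 # assumed compression ratio for the data
    CHUNK_BYTES = int(BLOCK_SIZE / ESTIMATED_RATIO)  # uncompressed bytes per chunk

    def compress(path):
        # lzop writes <path>.lzo and keeps the original, so drop the plain chunk.
        subprocess.run(["lzop", path], check=True)
        os.remove(path)

    def split_and_compress(src, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        part, written = 0, 0
        out = open(os.path.join(out_dir, "part-%05d" % part), "wb")
        with open(src, "rb") as f:
            for line in f:                # split only on record boundaries
                if written > 0 and written + len(line) > CHUNK_BYTES:
                    out.close()
                    compress(out.name)
                    part, written = part + 1, 0
                    out = open(os.path.join(out_dir, "part-%05d" % part), "wb")
                out.write(line)
                written += len(line)
        out.close()
        compress(out.name)

    if __name__ == "__main__":
        split_and_compress("input.txt", "lzo_chunks")

Note that without a hadoop-lzo index each whole .lzo file is non-splittable and is consumed by a single mapper, which is why sizing the compressed files near one HDFS block matters here.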
Is there anything wrong, from a cluster performance perspective, with providing the input pre-split into compressed files whose size is close to the default HDFS block size?