Hadoop MapReduce program to convert Avro data files to Parquet format.
git clone https://github.com/laserson/avro2parquet.git
cd avro2parquet
mvn clean package
This will generate the jar files in the target/ directory.
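As a convenience, the built jar can be located from a script. The following is a minimal sketch, not part of the project: the function name `find_avro2parquet_jar` is hypothetical, and it assumes the build produces a file matching `*-jar-with-dependencies.jar`, which is the naming used in the example invocation in this README.

```shell
# Hypothetical helper: print the path of the fat jar produced by `mvn clean package`.
# Assumption: the assembly is named *-jar-with-dependencies.jar.
find_avro2parquet_jar() {
    target_dir="${1:-target}"
    for jar in "$target_dir"/*-jar-with-dependencies.jar; do
        # If the glob matched nothing, the pattern stays literal and -f fails.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no *-jar-with-dependencies.jar found in $target_dir" >&2
    return 1
}

# Usage (from the repository root, after the build):
#   JAR=$(find_avro2parquet_jar target)
```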
This tool works on Avro container files (which I believe is just the
standard Avro data file format). Each record is read as an Avro
GenericRecord key with a NullWritable value.
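If it is unclear whether a given file is an Avro container file, one quick check is the four-byte magic at the start of the file: the ASCII characters `Obj` followed by the byte `0x01`. The sketch below tests a local file; the function name `is_avro_container` is hypothetical, and it assumes `head -c` and `od` are available.

```shell
# Hypothetical helper: succeed if the local file starts with the
# Avro container magic bytes 'O' 'b' 'j' 0x01 (hex 4f 62 6a 01).
is_avro_container() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$magic" = "4f626a01" ]
}

# Usage on a local copy:
#   is_avro_container part-00000.avro && echo "looks like an Avro container file"
# For a file on HDFS, the same bytes can be inspected by piping
# `hdfs dfs -cat <path>` into `head -c 4 | od -An -tx1`.
```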
The tool is currently hardcoded to output Snappy-compressed Parquet. It is
simply a MapReduce job using the Tool interface.
The command looks like this:
hadoop jar <avro2parquet jar file> \
com.cloudera.science.avro2parquet.Avro2Parquet \
<any generic options to the JVM> \
hdfs:///path/to/avro/schema.avsc \
hdfs:///path/to/avro/data \
hdfs:///output/path
For example:
hadoop jar avro2parquet-0.1.0-jar-with-dependencies.jar \
com.cloudera.science.avro2parquet.Avro2Parquet \
-D mapred.child.java.opts=-Xmx1024M \
hdfs:///user/lasersou/schemas/data.avsc \
hdfs:///user/lasersou/data \
hdfs:///user/lasersou/output
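For repeated runs, the invocation can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name `avro2parquet`, its argument order, and the `DRY_RUN` variable are all hypothetical and not part of the project. It places any generic options before the three paths, following the order shown in the command above.

```shell
# Hypothetical wrapper: validate arguments, then print or run the hadoop command.
# Arguments: <jar> <schema.avsc> <input> <output> [generic options...]
# DRY_RUN=1 prints the command instead of executing it.
avro2parquet() {
    if [ "$#" -lt 4 ]; then
        echo "usage: avro2parquet <jar> <schema.avsc> <input> <output> [generic options...]" >&2
        return 2
    fi
    jar=$1
    schema=$2
    input=$3
    output=$4
    shift 4
    # Remaining arguments (e.g. -D key=value) go before the positional paths.
    set -- hadoop jar "$jar" com.cloudera.science.avro2parquet.Avro2Parquet \
        "$@" "$schema" "$input" "$output"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage:
#   DRY_RUN=1 avro2parquet avro2parquet-0.1.0-jar-with-dependencies.jar \
#       hdfs:///path/to/avro/schema.avsc hdfs:///path/to/avro/data hdfs:///output/path \
#       -D mapred.child.java.opts=-Xmx1024M
```

Printing the command first makes it easy to confirm the argument order before submitting a job to the cluster.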