Walk on the web flow

The goal is to keep some links that I find interesting (and also to share them).

17 April 2018

Zoom on Apache Parquet

by snonov

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.
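To make "columnar" concrete, here is a minimal sketch in plain Python of the difference between a row layout and a column layout. It does not use Parquet itself, and the records, field names and the `to_columns` helper are hypothetical, chosen only for illustration. The real format adds much more on top of this idea (row groups, pages, encodings, compression).

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"user": "alice", "country": "FR", "clicks": 3},
    {"user": "bob", "country": "US", "clicks": 7},
    {"user": "carol", "country": "FR", "clicks": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to sum one field.
total_from_rows = sum(record["clicks"] for record in rows)

# Column layout: only the "clicks" column is touched.
total_from_columns = sum(columns["clicks"])
```

Because the values of one column sit next to each other, a query that needs a single field can skip the others, and similar values stored together tend to compress well.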

Parquet inspiration

Built through a collaboration between Twitter and Cloudera, the first version, Apache Parquet 1.0, was released in July 2013.

Inspired by Google's Dremel paper.

Main reference URLs

Cloudera supports some but not all of the object models from the upstream Parquet-MR project

The Impala and Hive object models are built into those components and are not available in external libraries (built-in support).

Ecosystem integration

Data serialization libraries

Parquet also integrates with the Hadoop ecosystem (MapReduce, Pig, Hive and Impala).

Resources

Projects

Articles

tags: Zoomon - Hadoop - Apache - Parquet - Avro - Impala - Twitter - Cloudera