Set up the Matatika platform to deliver and process your data in Parquet in minutes.
Parquet is a columnar storage format for the Hadoop ecosystem.
Parquet stores data column by column, enabling efficient, optimized processing of large datasets. It works with a variety of data processing frameworks, including Apache Spark, Apache Hive, and Apache Impala, and supports a wide range of data types and compression codecs. Because analytical queries typically read only a subset of columns, Parquet is particularly well suited to data analytics and business intelligence workloads: it enables fast, efficient querying of large datasets while minimizing storage and processing costs.
A setting to disable the collection of anonymous usage statistics during the Parquet API connection.
A setting to specify the logging level (verbosity) for the Parquet API connection.
The destination path where the Parquet files will be written.
The compression method (codec) applied to the Parquet files, such as gzip or snappy.
A setting to specify whether each stream should be written to its own folder.
The maximum size of each Parquet file.
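Taken together, the settings above might look like the following configuration fragment. The key names here are assumptions modelled on a typical Parquet target connector; check your plugin's documentation for the exact keys your version expects:

```yaml
# Illustrative Parquet target configuration -- key names are assumptions.
config:
  destination_path: output/          # where Parquet files are written
  compression_method: gzip           # codec, e.g. none, snappy, gzip
  streams_in_separate_folder: true   # one folder per stream
  file_size: 10000                   # maximum size per Parquet file (see plugin docs for units)
  disable_collection: true           # opt out of usage statistics
  logging_level: INFO                # verbosity of connector logs
```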
Collect and process data from 100s of sources and tools with Parquet.