support different encodings in `parquet` encode processor #38

ryanworl · 2024-06-09T20:38:18Z

Relevant docs: https://warpstreamlabs.github.io/bento/docs/components/processors/parquet_encode/

Today Bento only supports PLAIN and DELTA_LENGTH_BYTE_ARRAY encodings, and you can only use a single encoding for all data types.

We should instead support configuring a different encoding for each field.

For example, if you are transforming a Kafka topic into a set of Parquet files, you would want to use RLE_DICTIONARY for the topic names and partitions, since those columns contain few distinct values. You'd probably also want to use DELTA_BINARY_PACKED for the offsets and record timestamps, which are integers that mostly increase in small steps.
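
To make the request concrete, here is a minimal sketch of the resolution logic this could imply: each schema field may carry its own encoding, and fields without one fall back to the processor-wide default. This is only an illustration in Python, not Bento code. The `Field` class, its `encoding` attribute, and `resolve_encodings` are hypothetical names, and the column names and types in the example schema are assumptions about what a Kafka-derived schema might look like.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Encoding names taken from this issue: the first two are what Bento is said
# to support today, the last two are the ones requested for the Kafka use case.
KNOWN_ENCODINGS = {
    "PLAIN",
    "DELTA_LENGTH_BYTE_ARRAY",
    "RLE_DICTIONARY",
    "DELTA_BINARY_PACKED",
}


@dataclass
class Field:
    """One schema column. `encoding` is a hypothetical per-field override."""
    name: str
    type: str
    encoding: Optional[str] = None


def resolve_encodings(fields: List[Field], default_encoding: str = "PLAIN") -> Dict[str, str]:
    """Map each field name to its own encoding, or to the default if unset."""
    resolved = {}
    for field in fields:
        encoding = field.encoding or default_encoding
        if encoding not in KNOWN_ENCODINGS:
            raise ValueError(f"field {field.name!r}: unknown encoding {encoding!r}")
        resolved[field.name] = encoding
    return resolved


# The Kafka-to-Parquet example from above (illustrative column names).
schema = [
    Field("topic", "UTF8", "RLE_DICTIONARY"),
    Field("partition", "INT32", "RLE_DICTIONARY"),
    Field("offset", "INT64", "DELTA_BINARY_PACKED"),
    Field("timestamp", "INT64", "DELTA_BINARY_PACKED"),
    Field("value", "BYTE_ARRAY"),  # no override, so the default applies
]

encodings = resolve_encodings(schema, default_encoding="DELTA_LENGTH_BYTE_ARRAY")
for name, encoding in encodings.items():
    print(f"{name}: {encoding}")
```

Keeping the existing default as the fallback would mean current configs keep working unchanged, and only fields that opt in get a different encoding. A real implementation would probably also need to reject encodings that do not apply to a field's type (for instance, DELTA_BINARY_PACKED is defined for integer columns), which this sketch leaves out.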
