@ballard26
Contributor

This PR fixes an issue I ran into while testing the Iceberg implementation against large messages/fields. During the test, the field size exceeded the target page size, so most pages held only a single field. Because the full min/max bounds are also encoded per page, the same field was encoded 3 times per Parquet file, which made the file ~3x larger than expected. The fix is to truncate the bounds, as Parquet allows.
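
As a rough illustration of the truncation idea (not the PR's actual code), the sketch below shortens a lower and an upper bound on raw bytes. The names `truncate_min` and `truncate_max` are hypothetical, and it assumes values are ordered lexicographically by unsigned byte. A prefix is always a valid lower bound; an upper bound needs its last byte incremented so that it still compares greater than every value sharing the prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper names; not the PR's actual API.
// Assumption: values are ordered lexicographically by unsigned byte.

// Lower bound: any prefix of a value compares <= the value itself.
std::string truncate_min(const std::string& value, std::size_t max_bytes) {
    return value.substr(0, max_bytes);
}

// Upper bound: take the prefix, then increment its last byte so the result
// compares > every value that starts with that prefix. Trailing 0xFF bytes
// cannot be incremented, so they are dropped first.
std::optional<std::string>
truncate_max(const std::string& value, std::size_t max_bytes) {
    if (value.size() <= max_bytes) {
        return value; // already short enough, keep the exact bound
    }
    std::string prefix = value.substr(0, max_bytes);
    while (!prefix.empty()
           && static_cast<std::uint8_t>(prefix.back()) == 0xFF) {
        prefix.pop_back();
    }
    if (prefix.empty()) {
        return std::nullopt; // no shorter upper bound exists
    }
    prefix.back() = static_cast<char>(
      static_cast<std::uint8_t>(prefix.back()) + 1);
    return prefix;
}
```

With a 3-byte limit, `"abcdef"` would yield a lower bound of `"abc"` and an upper bound of `"abd"`. A caller has to decide what to do when `truncate_max` returns no value, for example keeping the full bound or omitting it.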

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Improvements

  • Min and max bounds in parquet statistical metadata will now be truncated to 64 bytes by default.

Contributor

Copilot AI left a comment

Pull request overview

This PR implements bound truncation in the parquet writer to prevent excessive file size growth when field sizes exceed target page sizes. The implementation truncates min/max statistical bounds to 64 bytes by default, which addresses an issue where large fields caused files to become ~3x larger than expected due to full bounds being encoded per page.

Key Changes

  • Added UTF-8-aware bound truncation logic for binary types (strings, byte arrays)
  • Implemented reverse byte iteration support in iobuf to enable efficient bound truncation
  • Added comprehensive UTF-8 utilities for code point manipulation and validation
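
To make "UTF-8-aware" concrete, here is a minimal sketch of cutting a string at a byte limit without splitting a code point. The helper names `is_utf8_continuation` and `utf8_prefix` are hypothetical and are not taken from `src/v/strings/utf8.h`; the sketch also assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// True for bytes of the form 0b10xxxxxx, which continue a code point.
constexpr bool is_utf8_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: longest prefix of `s` that is at most `max_bytes`
// long and does not split a code point. Assumes `s` is valid UTF-8.
constexpr std::string_view
utf8_prefix(std::string_view s, std::size_t max_bytes) {
    if (s.size() <= max_bytes) {
        return s;
    }
    std::size_t end = max_bytes;
    // If the byte just past the cut is a continuation byte, the cut lands
    // inside a code point; back up to that code point's first byte.
    while (end > 0
           && is_utf8_continuation(static_cast<unsigned char>(s[end]))) {
        --end;
    }
    return s.substr(0, end);
}
```

This only covers the lower bound. An upper bound would additionally need the last code point of the prefix to be incremented to the next valid code point, which is not shown here.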

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/v/test_utils/go/parquet_verifier/main.go | Updated verifier to validate truncated bounds and handle bound comparisons for byte arrays |
| src/v/strings/utf8.h | Added UTF-8 validation, code point manipulation, and reverse iteration utilities |
| src/v/strings/tests/utf8_test.cc | Added comprehensive tests for new UTF-8 utilities |
| src/v/strings/tests/BUILD | Added new test file and bytes dependency |
| src/v/strings/BUILD | Added boost::container dependency |
| src/v/serde/parquet/writer.h | Added max_bound_size configuration option with 64-byte default |
| src/v/serde/parquet/writer.cc | Passed max_bound_size to column writer |
| src/v/serde/parquet/tests/generate_file.cc | Updated test data generation to include larger byte arrays |
| src/v/serde/parquet/tests/column_stats_collector_test.cc | Added tests for bound truncation logic |
| src/v/serde/parquet/tests/BUILD | Added strings:utf8 dependency |
| src/v/serde/parquet/encoding.h | Changed encode_for_stats parameters to non-const references |
| src/v/serde/parquet/encoding.cc | Changed to use share() instead of copy() for efficiency |
| src/v/serde/parquet/column_writer.h | Added max_bound_size_bytes option |
| src/v/serde/parquet/column_writer.cc | Implemented bound truncation with UTF-8 awareness for string/binary types |
| src/v/serde/parquet/column_stats_collector.h | Added truncator template parameter and bound truncation logic |
| src/v/serde/parquet/column_stats_collector.cc | Implemented binary_bound_truncator with UTF-8 support |
| src/v/serde/parquet/BUILD | Added strings:utf8 dependency |
| src/v/bytes/tests/iobuf_tests.cc | Added tests for reverse byte iteration |
| src/v/bytes/iobuf.h | Added reverse iterator support |
| src/v/bytes/iobuf.cc | Fixed comparison bug when right-hand side becomes empty |
| src/v/bytes/details/io_placeholder.h | Made write() method generic for uint8_t and char |
| src/v/bytes/details/io_fragment.h | Added const_reverse_iterator support |
| src/v/bytes/details/io_byte_iterator.h | Implemented reverse byte iteration as io_byte_iterator_base |

if (rhs.empty()) {
rhs = other_next_view();
if (o_it == o.cend()) {
if (rhs.empty()) {
Contributor Author

I'll open up a separate PR to backport this fix to older RP versions.

@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 7, 2026

Retry command for Build#78626

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"rest_jdbc","cloud_storage_type":1,"query_engine":"spark"}
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"rest_hadoop","cloud_storage_type":1,"query_engine":"spark"}
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"nessie","cloud_storage_type":1,"query_engine":"spark"}
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"nessie","cloud_storage_type":1,"query_engine":"trino"}
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"rest_jdbc","cloud_storage_type":1,"query_engine":"trino"}
tests/rptest/tests/datalake/delayed_translation_test.py::DatalakeDelayedTranslationTest.test_basic@{"catalog_type":"rest_hadoop","cloud_storage_type":1,"query_engine":"trino"}

@vbotbuildovich
Collaborator

CI test results

test results on build#78626
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "nessie", "cloud_storage_type": 1, "query_engine": "spark"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21ee-41c5-afdc-7b80528dca11 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "rest_hadoop", "cloud_storage_type": 1, "query_engine": "spark"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21ef-45d5-b7da-097cc933837f | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "rest_jdbc", "cloud_storage_type": 1, "query_engine": "spark"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21e6-4ed6-bf00-3f8e3f36f896 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "nessie", "cloud_storage_type": 1, "query_engine": "trino"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21e7-4b22-a37f-fe211106e502 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "rest_hadoop", "cloud_storage_type": 1, "query_engine": "trino"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21e8-41c1-b635-4943b3450158 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| DatalakeDelayedTranslationTest | test_basic | {"catalog_type": "rest_jdbc", "cloud_storage_type": 1, "query_engine": "trino"} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b96fd-21e9-4e41-b4a7-a14a55126b26 | FAIL | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDelayedTranslationTest&test_method=test_basic |
| MountUnmountIcebergTest | test_simple_remount | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b9702-820b-4d74-ad70-3a25831b2ce4 | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.2159, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.5179, p1=0.0007, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount |
| PartitionBalancerTest | test_rack_awareness | null | integration | https://buildkite.com/redpanda/redpanda/builds/78626#019b9702-820f-4a92-8f01-f2239a9d7fa1 | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0130, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionBalancerTest&test_method=test_rack_awareness |

[[gnu::always_inline]] void write(const char* src, size_t len) {
template<typename T>
[[gnu::always_inline]] void write(const T* src, size_t len)
requires(sizeof(T) == 1)
Member

This function needs a doc now that it's not entirely obvious what it is doing.

It will do an element-wise copy from the src array to this placeholder, if T can be assigned to a char, right?
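
One possible shape for such a doc comment is sketched below on a hypothetical stand-in class, `byte_placeholder`, which is not the real placeholder type. The sketch assumes the intended semantics are a verbatim byte copy, which is for the author to confirm. It uses `static_assert` in place of the `requires` clause from the diff so that it also compiles before C++20, and the bounds-checked `bool` return is an addition made only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the placeholder type; only the documented
// write() is the point of this sketch.
class byte_placeholder {
public:
    explicit byte_placeholder(std::size_t capacity)
      : _buf(capacity) {}

    /// Copies `len` elements from `src` into the reserved region and
    /// advances the write position by `len` bytes.
    ///
    /// T may be any trivially copyable single-byte type (char,
    /// unsigned char, uint8_t, std::byte). Bytes are copied verbatim with
    /// memcpy; no per-element conversion takes place.
    ///
    /// Returns false, writing nothing, if fewer than `len` bytes remain.
    template<typename T>
    bool write(const T* src, std::size_t len) {
        static_assert(sizeof(T) == 1, "write() only accepts byte-sized types");
        static_assert(std::is_trivially_copyable_v<T>);
        if (len > remaining()) {
            return false;
        }
        if (len == 0) {
            return true; // avoid calling memcpy with possibly-null pointers
        }
        std::memcpy(_buf.data() + _pos, src, len);
        _pos += len;
        return true;
    }

    std::size_t remaining() const { return _buf.size() - _pos; }
    const std::vector<char>& bytes() const { return _buf; }

private:
    std::vector<char> _buf;
    std::size_t _pos = 0;
};
```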

@travisdowns
Member

Except for really trivial commits, commit messages should have a body as well, e.g., why this is being added, etc.

// handle an empty fragment
if (_frag_index == _frag_index_end) {
continue;
if constexpr (Forward) {
Member

This block of logic should be DRY with the identical block of logic in the ctor, probably called "maybe_next_fragment", called in the ctor and after every increment/decrement type operation.
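
A minimal sketch of the suggested refactor, using a simplified forward-only iterator over a `std::vector<std::string>` rather than the real `io_byte_iterator`. Apart from the `maybe_next_fragment` name proposed above, every name here is hypothetical. The point is only that the "skip exhausted or empty fragments" step lives in one helper that both the constructor and the increment call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified forward byte iterator over a list of fragments.
class fragment_byte_iterator {
public:
    explicit fragment_byte_iterator(const std::vector<std::string>& frags)
      : _frags(&frags) {
        maybe_next_fragment(); // skip leading empty fragments
    }

    bool at_end() const { return _frag == _frags->size(); }
    char operator*() const { return (*_frags)[_frag][_pos]; }

    fragment_byte_iterator& operator++() {
        ++_pos;
        maybe_next_fragment(); // same helper after every position change
        return *this;
    }

private:
    // Advance past exhausted or empty fragments until the position refers
    // to a readable byte or the end of the buffer.
    void maybe_next_fragment() {
        while (_frag < _frags->size() && _pos == (*_frags)[_frag].size()) {
            ++_frag;
            _pos = 0;
        }
    }

    const std::vector<std::string>* _frags;
    std::size_t _frag = 0;
    std::size_t _pos = 0;
};
```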

}
} else {
// NOLINTNEXTLINE(cppcoreguidelines-pro-bounds-pointer-arithmetic)
if (_frag_index-- == _frag_index_end) {
Member

This will form a "one before the start" pointer (when the condition is true), which is not allowed (yes, it's quite annoying).
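
For context, C++ only defines pointer arithmetic that stays within an array or lands one past its end, so merely computing `begin - 1` is undefined behavior even if the result is never dereferenced. One common way to walk a range backwards without doing that is to compare first and decrement second. The sketch below uses a hypothetical helper, `reversed_bytes`, on a plain byte range rather than the iterator from the PR.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: visit the bytes in [begin, end) back to front.
// The cursor always stays inside [begin, end]; it is compared against
// `begin` before being decremented, so `begin - 1` is never formed.
std::string reversed_bytes(const char* begin, const char* end) {
    std::string out;
    out.reserve(static_cast<std::size_t>(end - begin));
    const char* cur = end; // one-past-the-end is a valid pointer value
    while (cur != begin) {
        --cur; // safe: cur > begin at this point
        out.push_back(*cur);
    }
    return out;
}
```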


#pragma once

#include "bytes/details/io_fragment.h"
Member

iobuf_fuzz should probably be augmented with reverse iterator cases

if ok != !(colMin.IsNull() && colMax.IsNull()) {
log.Fatalf(
"❌ : missing column bounds for row group %d column %d (%d, %d)\n",
i, j, col.NumValues(), col.NullCount(),
Member

This is allowed in the parquet format? I.e., it is not required that the min/max actually exist in the page, just that they are bounds?
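
If the statistics are read as bounds rather than as values that must occur in the data (which is how the PR description's "as allowed by parquet" reads, and which appears to match the comments on `min_value`/`max_value` in the Parquet format definition), then the property a verifier can check is simply that every value lies between them. The sketch below uses a hypothetical helper, `bounds_cover`, and assumes unsigned lexicographic byte ordering; it is in C++ for consistency with the rest of the PR, not a translation of the Go verifier.

```cpp
#include <string>
#include <vector>

// Hypothetical check: do `min` and `max` bound every value in the page?
// std::string compares bytes as unsigned values, lexicographically.
bool bounds_cover(
  const std::vector<std::string>& values,
  const std::string& min,
  const std::string& max) {
    for (const auto& v : values) {
        if (v < min || v > max) {
            return false;
        }
    }
    return true;
}
```

Under this check, a page holding `"Blart Versenwald III"` and `"Bob"` is covered by the bounds `"B"` and `"C"`, even though neither bound is a value in the page.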

@dotnwat dotnwat requested a review from andrwng January 8, 2026 01:21
@ballard26
Contributor Author

After discussing this PR with @travisdowns I've decided to just copy the truncated bounds to a std::string and then do all UTF-8 manipulations on that. This should be fine as it's only 64 bytes, and it means that all iobuf changes can be dropped.
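
A minimal sketch of that approach, with a `std::vector<std::string>` standing in for the fragments of a chunked buffer such as iobuf. The function name `copy_prefix` is hypothetical and none of this uses the real iobuf API. Only the first `max_bytes` bytes are copied out, so the contiguous copy stays small regardless of how large the original value is.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: `fragments` stands in for a chunked buffer such as iobuf.
// Copies at most `max_bytes` leading bytes into one contiguous string.
std::string copy_prefix(
  const std::vector<std::string>& fragments, std::size_t max_bytes) {
    std::string out;
    out.reserve(max_bytes);
    for (const auto& frag : fragments) {
        if (out.size() == max_bytes) {
            break; // limit reached, ignore the remaining fragments
        }
        const std::size_t n = std::min(frag.size(), max_bytes - out.size());
        out.append(frag, 0, n);
    }
    return out;
}
```

Any UTF-8 handling can then operate on the returned string with ordinary forward and backward indexing, with no reverse iteration over fragments.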

These changes should be up tomorrow. Switching the PR back to draft until they are up.

@ballard26 ballard26 marked this pull request as draft January 8, 2026 05:23
3 participants