iceberg: fix spec inconsistency in manifest list files_count #24602

andrwng · 2024-12-18T00:08:20Z

The schema we are using was pulled some time ago and appears to be outdated. The Apache Iceberg Java implementation has since renamed added_data_files_count and friends to added_files_count, to match the documented spec.

This meant that after an external, non-Redpanda writer updated the table, Redpanda couldn't download the current manifest list when appending and would get stuck, complaining about an EOF (presumably the Avro C++ library throws this when it encounters an unknown field).

I suspect that this may have been the cause of an EOF seen when trying to read a manifest list with BigQuery:

Error while reading data, error message: The Apache Avro failed to read data with the following error: EOF reached File: [...]/metadata/snap-[...]-0.avro

The old names are added as aliases to ensure Redpanda can still download Iceberg manifest lists written by 24.3. This PR also adds an upgrade test to validate that the aliases are actually required and that simply renaming the fields would not be enough. Without the aliases, the upgrade test fails.
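As a rough illustration of why the alias matters, here is a minimal sketch of name-based field resolution with reader-side aliases, in the spirit of Avro schema resolution. It is not Redpanda's or the Avro C++ library's code: the helper `resolve_record`, the `READER_FIELDS` list, and the sample records are hypothetical, and only the names added_files_count and added_data_files_count come from this PR.

```python
# Sketch only: models matching writer fields to reader fields by name,
# falling back to the reader's aliases. All helper names are hypothetical.

READER_FIELDS = [
    {"name": "manifest_path"},
    # New spec name, with the old 24.3 name kept as an alias.
    {"name": "added_files_count", "aliases": ["added_data_files_count"]},
]


def resolve_record(writer_record, reader_fields):
    """Map a record keyed by writer field names onto reader field names."""
    resolved = {}
    for field in reader_fields:
        # Try the reader's own name first, then each alias in order.
        candidates = [field["name"], *field.get("aliases", [])]
        match = next((n for n in candidates if n in writer_record), None)
        if match is None:
            raise KeyError(
                f"no writer field matches reader field {field['name']!r}")
        resolved[field["name"]] = writer_record[match]
    return resolved


# A record using the old 24.3 field name, and one using the spec name.
old_file = {"manifest_path": "m0.avro", "added_data_files_count": 3}
new_file = {"manifest_path": "m1.avro", "added_files_count": 5}

print(resolve_record(old_file, READER_FIELDS))
print(resolve_record(new_file, READER_FIELDS))
```

With the alias in place both records resolve to the new field name. Dropping the `aliases` entry makes the first call raise, which is roughly the situation the upgrade test is meant to catch.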

Backports Required

Release Notes

Bug Fixes

Fixes a bug in Redpanda's Iceberg manifest list Avro definition that resulted in an end-of-file (EOF) error when reading manifest list Avro files written by other engines. This could block Redpanda from appending Iceberg data, and could also prevent certain query engines from successfully reading Iceberg data written by Redpanda.

vbotbuildovich · 2024-12-18T05:41:15Z

Retry command for Build#59894

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/datalake/partition_movement_test.py::PartitionMovementTest.test_cross_core_movements@{"cloud_storage_type":1}
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":20}

vbotbuildovich · 2024-12-18T05:59:37Z

CI test results

test results on build#59894

test_id	test_kind	job_url	test_status	passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.ABS	ducktape	https://buildkite.com/redpanda/redpanda/builds/59894#0193d7dc-498b-46b5-9e78-c05a95af3147	FAIL	0/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/59894#0193d7dc-4989-42dd-ad69-30935788b832	FAIL	0/6

ztlpn · 2024-12-18T12:56:01Z

Hmm I wonder if we should backport this. Technically this is a breaking change that will prevent downgrades to earlier 24.3 versions. Or is backward compat also working thanks to aliases?

andrwng · 2024-12-18T17:22:45Z

> Hmm I wonder if we should backport this. Technically this is a breaking change that will prevent downgrades to earlier 24.3 versions. Or is backward compat also working thanks to aliases?

Yeah good thought. You're right, this will prevent Iceberg from working following a rollback, even with the alias, because 24.3.prev won't be able to read the manifest lists with the new Avro schema.

That said, I lean towards it being justifiable -- it's a correctness fix, especially for a beta feature.
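The asymmetry discussed here can be spelled out as a small compatibility matrix. This is a simplified model, not a description of the actual reader code: it assumes a reader can load a manifest list only if it recognizes the field name the writer used, and the version labels, `READERS`, `WRITERS`, and `can_read` are hypothetical names for illustration.

```python
# Simplified model: a reader accepts a file only if it knows the field name
# the writer used. Version labels and helper names are hypothetical.

READERS = {
    # Earlier 24.3 builds only know the old field name.
    "24.3.prev": {"added_data_files_count"},
    # Builds with this fix know the spec name plus the old name as an alias.
    "fixed": {"added_files_count", "added_data_files_count"},
}

WRITERS = {
    "24.3.prev": "added_data_files_count",
    "fixed": "added_files_count",
}


def can_read(reader, writer):
    """Return True if `reader` recognizes the field name `writer` emits."""
    return WRITERS[writer] in READERS[reader]


for reader in READERS:
    for writer in WRITERS:
        print(f"{reader} reads {writer}: {can_read(reader, writer)}")
```

Under this model the alias covers the upgrade direction (a fixed build reading older files), while the rollback direction (an earlier 24.3 build reading files written after the fix) is the one case that fails.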

ztlpn · 2024-12-23T13:04:04Z

tests/rptest/tests/datalake/datalake_e2e_test.py

@@ -134,6 +134,31 @@ def test_avro_schema(self, cloud_storage_type, query_engine):
                assert spark_describe_out == spark_expected_out, str(
                    spark_describe_out)

+    @cluster(num_nodes=4)
+    @matrix(cloud_storage_type=[CloudStorageType.S3])


nit: will the test fail or be skipped when running on e.g. azure?

github-actions bot added the area/redpanda label Dec 18, 2024

andrwng force-pushed the datalake-manifest-list-fix branch 2 times, most recently from b89b3ed to 68397b0 Compare December 18, 2024 01:55

andrwng added 2 commits December 17, 2024 17:57

rptest: datalake upgrade test

fe6155c

Adds a simple upgrade test to go from 24.3 to HEAD, ensuring progress can be made. A recent change to our Iceberg manifest list schema changed the name of one field, adding the old name as an alias. Without that aliasing, this new test would fail.

andrwng force-pushed the datalake-manifest-list-fix branch from 68397b0 to fe6155c Compare December 18, 2024 01:58

andrwng changed the title ~~iceberg: fix inconsistency in manifest list files_count~~ iceberg: fix spec inconsistency in manifest list files_count Dec 18, 2024

andrwng added this to the v24.3.x-next milestone Dec 18, 2024

andrwng requested review from ztlpn, bharathv and dotnwat December 18, 2024 02:23

ztlpn approved these changes Dec 23, 2024
