@nvartolomei nvartolomei commented Jan 3, 2026

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

@nvartolomei nvartolomei force-pushed the nv/iceberg-json-schema branch 2 times, most recently from 6c57ad1 to 708ad06 Compare January 3, 2026 15:59
@nvartolomei nvartolomei force-pushed the nv/iceberg-json-schema branch from 708ad06 to 668387c Compare January 13, 2026 18:29
@nvartolomei nvartolomei changed the title iceberg/conversion: json schema improvements iceberg/conversion: refactor JSON schema to constraint-based deduction Jan 13, 2026
@nvartolomei nvartolomei force-pushed the nv/iceberg-json-schema branch 2 times, most recently from c15f9f1 to 98ea1dc Compare January 13, 2026 18:32
@nvartolomei nvartolomei marked this pull request as ready for review January 13, 2026 18:32
@nvartolomei nvartolomei requested review from andrwng, Copilot, oleiman, rockwotj and wdberkeley and removed request for oleiman January 13, 2026 18:32

Copilot AI left a comment

Pull request overview

This PR refactors the JSON Schema to Iceberg type conversion from a direct conversion approach to a constraint-based deduction model. The new approach first collects constraints from the JSON Schema (allowed types, formats, properties, items) and then resolves them to Iceberg types, which enables better handling of type unions and nullable types.

Changes:

  • Refactored the core conversion logic from direct type mapping to a two-phase collect-then-resolve model
  • Added support for nullable type unions (e.g., ["null", "integer"]) that previously failed
  • Updated error messages to be more descriptive when type constraints are ambiguous

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/rptest/tests/datalake/datalake_e2e_test.py | Added test case for array items with nullable types |
| src/v/iceberg/conversion/tests/iceberg_json_tests.cc | Updated error messages and added tests for type unions, nullable unions, and format handling |
| src/v/iceberg/conversion/json_schema/ir.h | Added explicit enum values and validation for json_value_type to support bitset operations |
| src/v/iceberg/conversion/ir_json.cc | Completely refactored from direct conversion to constraint-based deduction model |
| src/v/iceberg/conversion/BUILD | Added dependency on base module for formatting utilities |
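
The ir.h row mentions explicit enum values and validation so that json_value_type can back bitset operations. A minimal sketch of that idea follows. The enumerator list, the numeric values, and the helper names (`is_valid`, `allow`, `allows`, `type_set`) are assumptions made for illustration and are not taken from the PR diff.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for json_value_type. The real enumerators and their
// values in ir.h may differ; here each value doubles as a bit position.
enum class json_value_type : std::uint8_t {
    null = 0,
    boolean = 1,
    integer = 2,
    number = 3,
    string = 4,
    array = 5,
    object = 6,
};

constexpr std::size_t json_value_type_count = 7;

// Validation: only values in [0, count) may be used as a bit position.
constexpr bool is_valid(json_value_type t) {
    return static_cast<std::size_t>(t) < json_value_type_count;
}

// One bit per JSON value type.
using type_set = std::bitset<json_value_type_count>;

// Mark a type as allowed in the set.
inline void allow(type_set& s, json_value_type t) {
    assert(is_valid(t));
    s.set(static_cast<std::size_t>(t));
}

// Check whether a type is allowed; invalid values are never allowed.
inline bool allows(const type_set& s, json_value_type t) {
    return is_valid(t) && s.test(static_cast<std::size_t>(t));
}
```

With explicit values, a union such as ["null", "integer"] becomes two set bits, and an out-of-range value is caught before it is used as an index.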

Refactor the JSON Schema to Iceberg type conversion to use a two-phase
constraint solver architecture instead of the previous mixed
traversal/conversion approach.

Phase 1 (collect): Traverses the JSON Schema and builds a constraint
structure with a bitfield tracking possible types, format annotations,
and nested constraints for objects/arrays.

Phase 2 (resolve): Resolves constraints to Iceberg types, rejecting
schemas where the type cannot be unambiguously determined.

This design provides a cleaner separation of concerns and is more
extensible for future JSON Schema keywords (allOf, anyOf, oneOf, $ref).

No functional changes intended.
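
The two phases described above can be sketched roughly as follows. This is not the code in ir_json.cc: `schema_node`, `constraint`, `bit_for`, `collect`, and `resolve` are hypothetical names, format annotations are left out, the input is a hand-built tree rather than parsed JSON, and the type mapping (integer to long, number to double) and the string rendering of the result are assumptions chosen to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed JSON Schema node.
struct schema_node {
    std::string name;                     // property name when used as a child
    std::vector<std::string> types;       // values of the "type" keyword
    std::vector<schema_node> properties;  // "properties" of an object
    std::vector<schema_node> items;       // "items" of an array (0 or 1 entry)
};

// One bit per JSON value type (assumed layout).
constexpr std::uint8_t t_null = 1, t_boolean = 2, t_integer = 4, t_number = 8,
                       t_string = 16, t_array = 32, t_object = 64;

// Constraints gathered in phase 1.
struct constraint {
    std::string name;
    std::uint8_t types = 0;  // bitfield of possible types
    std::vector<constraint> properties;
    std::vector<constraint> items;
};

std::uint8_t bit_for(const std::string& t) {
    static const std::map<std::string, std::uint8_t> bits = {
      {"null", t_null},     {"boolean", t_boolean}, {"integer", t_integer},
      {"number", t_number}, {"string", t_string},   {"array", t_array},
      {"object", t_object}};
    auto it = bits.find(t);
    if (it == bits.end()) throw std::invalid_argument("unknown type: " + t);
    return it->second;
}

// Phase 1 (collect): walk the schema and record constraints only.
constraint collect(const schema_node& n) {
    assert(n.items.size() <= 1);  // this sketch supports a single items schema
    constraint c;
    c.name = n.name;
    for (const auto& t : n.types) c.types |= bit_for(t);
    for (const auto& p : n.properties) c.properties.push_back(collect(p));
    for (const auto& i : n.items) c.items.push_back(collect(i));
    return c;
}

// Phase 2 (resolve): turn constraints into a type, rejecting ambiguity.
std::string resolve(const constraint& c) {
    const bool nullable = (c.types & t_null) != 0;
    const auto rest = static_cast<std::uint8_t>(c.types & ~t_null);
    // Exactly one non-null bit must remain.
    if (rest == 0 || (rest & (rest - 1)) != 0) {
        throw std::invalid_argument("type cannot be determined unambiguously");
    }
    std::string out;
    switch (rest) {
    case t_boolean: out = "boolean"; break;
    case t_integer: out = "long"; break;    // assumed mapping
    case t_number: out = "double"; break;   // assumed mapping
    case t_string: out = "string"; break;
    case t_array:
        if (c.items.size() != 1) throw std::invalid_argument("array without items");
        out = "list<" + resolve(c.items[0]) + ">";
        break;
    case t_object:
        out = "struct<";
        for (std::size_t i = 0; i < c.properties.size(); ++i) {
            if (i != 0) out += ", ";
            out += c.properties[i].name + ": " + resolve(c.properties[i]);
        }
        out += ">";
        break;
    default: throw std::invalid_argument("unsupported type bit");
    }
    return (nullable ? "optional " : "required ") + out;
}
```

Because `collect` never decides anything, a union such as ["null", "string"] is just two bits in the bitfield; `resolve` then treats the null bit as nullability and rejects anything that leaves more than one candidate type, for example ["integer", "string"].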
@nvartolomei nvartolomei force-pushed the nv/iceberg-json-schema branch from 98ea1dc to 5d90791 Compare January 13, 2026 18:42
@vbotbuildovich

CI test results

test results on build#78967
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NodesDecommissioningTest | test_decommission_status | null | integration | https://buildkite.com/redpanda/redpanda/builds/78967#019bb8b6-60b7-4f1e-8b82-8972c3d1ca7e | FLAKY | 28/31 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0351, p0=0.2844, reject_threshold=0.0100. adj_baseline=0.1017, p1=0.3991, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommission_status |

@rockwotj rockwotj left a comment

This is really good and clean. I really like the constraint-based approach!

@nvartolomei nvartolomei merged commit 69d1db4 into redpanda-data:dev Jan 14, 2026
21 checks passed