iceberg/conversion: refactor JSON schema to constraint-based deduction #29139
Pull request overview
This PR refactors the JSON schema to Iceberg type conversion system from a direct conversion approach to a constraint-based deduction model. The new approach collects constraints from JSON schema (allowed types, formats, properties, items) and then resolves them to Iceberg types, enabling better handling of type unions and nullable types.
Changes:
- Refactored the core conversion logic from direct type mapping to a two-phase collect-then-resolve model
- Added support for nullable type unions (e.g., `["null", "integer"]`) that previously failed
- Updated error messages to be more descriptive when type constraints are ambiguous
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/rptest/tests/datalake/datalake_e2e_test.py` | Added test case for array items with nullable types |
| `src/v/iceberg/conversion/tests/iceberg_json_tests.cc` | Updated error messages and added tests for type unions, nullable unions, and format handling |
| `src/v/iceberg/conversion/json_schema/ir.h` | Added explicit enum values and validation for `json_value_type` to support bitset operations |
| `src/v/iceberg/conversion/ir_json.cc` | Completely refactored from direct conversion to constraint-based deduction model |
| `src/v/iceberg/conversion/BUILD` | Added dependency on base module for formatting utilities |
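For readers unfamiliar with the pattern, the `ir.h` change (explicit enum values plus validation, so the enum can drive bitset operations) could look roughly like the sketch below. This is not the code from the PR: `value_type`, `type_set`, `is_valid`, `add`, and `contains` are hypothetical names, and the choice of seven enumerators used as bit positions in a `std::bitset` is an assumption made for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for json_value_type. The explicit values double as
// bit positions, so they must stay dense and stable.
enum class value_type : uint8_t {
    null = 0,
    boolean = 1,
    integer = 2,
    number = 3,
    string = 4,
    object = 5,
    array = 6,
};

constexpr std::size_t value_type_count = 7;
using type_set = std::bitset<value_type_count>;

// Validation: reject raw values that do not name an enumerator before they
// are used as a bit index.
constexpr bool is_valid(uint8_t raw) { return raw < value_type_count; }

// Mark a type as allowed in the set.
inline type_set& add(type_set& s, value_type t) {
    s.set(static_cast<std::size_t>(t));
    return s;
}

// Check whether a type is allowed in the set.
inline bool contains(const type_set& s, value_type t) {
    return s.test(static_cast<std::size_t>(t));
}
```

Pinning the values explicitly means reordering the enumerators cannot silently change which bit a type occupies.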
Refactor the JSON Schema to Iceberg type conversion to use a two-phase constraint solver architecture instead of the previous mixed traversal/conversion approach.

Phase 1 (collect): Traverses the JSON Schema and builds a constraint structure with a bitfield tracking possible types, format annotations, and nested constraints for objects/arrays.

Phase 2 (resolve): Resolves constraints to Iceberg types, rejecting schemas where type cannot be unambiguously determined.

This design provides a cleaner separation of concerns and is more extensible for future JSON Schema keywords (allOf, anyOf, oneOf, $ref).

No functional changes intended.
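As a rough illustration of the collect-then-resolve split described above, here is a minimal sketch restricted to scalar types. It is not the implementation in `ir_json.cc`: `json_type`, `constraint`, `resolved`, `collect`, and `resolve` are hypothetical names, the JSON-to-Iceberg mapping (`integer` to `long`, `number` to `double`) is an assumption, and format annotations and nested object/array constraints are left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical type flags; each enumerator occupies one bit of the bitfield.
enum class json_type : uint8_t {
    null_ = 1u << 0,
    boolean = 1u << 1,
    integer = 1u << 2,
    number = 1u << 3,
    string = 1u << 4,
};

// Phase 1 output: the set of types the schema node still allows.
struct constraint {
    uint8_t allowed = 0;
};

// Phase 2 output: an Iceberg type name plus whether the field is required.
struct resolved {
    std::string iceberg_type;
    bool required;
};

// Phase 1 (collect): fold the schema's "type" keyword into a bitfield.
constraint collect(const std::vector<json_type>& types) {
    constraint c;
    for (auto t : types) {
        c.allowed |= static_cast<uint8_t>(t);
    }
    return c;
}

// Phase 2 (resolve): succeed only if exactly one non-null type remains.
std::optional<resolved> resolve(const constraint& c) {
    const uint8_t null_bit = static_cast<uint8_t>(json_type::null_);
    const bool nullable = (c.allowed & null_bit) != 0;
    const uint8_t rest = static_cast<uint8_t>(c.allowed & ~null_bit);
    // Zero bits left (no concrete type) or more than one bit (ambiguous).
    if (rest == 0 || (rest & (rest - 1)) != 0) {
        return std::nullopt;
    }
    std::string name;
    switch (static_cast<json_type>(rest)) {
    case json_type::boolean: name = "boolean"; break;
    case json_type::integer: name = "long"; break;   // assumed mapping
    case json_type::number:  name = "double"; break; // assumed mapping
    case json_type::string:  name = "string"; break;
    default: return std::nullopt;
    }
    return resolved{name, !nullable};
}
```

In this sketch, `["null", "integer"]` resolves to an optional `long`, while `["integer", "string"]` is rejected as ambiguous. Keeping ambiguity detection in `resolve` is what lets `collect` stay a pure accumulation step.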
rockwotj left a comment
This is really good and clean. I really like the constraint based approach!