-
Notifications
You must be signed in to change notification settings - Fork 2.4k
feat(schema): phase 5 - Perform Java Client Core Migration #14340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
|
c35119a to
a47d4ed
Compare
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java
Outdated
Show resolved
Hide resolved
3d9e761 to
e987dbd
Compare
a8998d9 to
1dad9ee
Compare
hudi-common/src/main/java/org/apache/hudi/common/engine/RecordContext.java
Outdated
Show resolved
Hide resolved
8ff89c2 to
6d27465
Compare
|
Okay, should be ready for review now. |
the-other-tim-brown
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed some of the files but the scope of this PR is much larger than the ticket calls for. Is that intentional or is this combined with another branch?
...nt/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseWriteHelper.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteMergeHandle.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
Outdated
Show resolved
Hide resolved
…between HoodieSchema and Avro.Schema
758cfbc to
8e39e79
Compare
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieReaderContext.java
Outdated
Show resolved
Hide resolved
| HoodieSchema longSchema = HoodieSchema.create(HoodieSchemaType.LONG); | ||
|
|
||
| List<HoodieSchemaField> fields = Arrays.asList( | ||
| HoodieSchemaField.of("precombine", HoodieSchema.createUnion(longSchema, nullSchema), null, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the createUnion we can use the HoodieSchema.createNullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped them out, will search for other references that i modified too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to call this out: Avro has a strict validation rule, where the default value for a union field must match the type of the FIRST element in the union type array.
- ["null", "int"] with default null (Valid)
- ["null", "int"] with default 0 (Invalid)
- ["int", "null"] with default 0 (Valid)
- ["int", "null"] with default null (Invalid)
If we use HoodieSchema.createNullable for int fields with default 0, the following error will be thrown for the affected tests:
Caused by: org.apache.avro.AvroTypeException: Invalid default for field age: 0 not a ["null","int"]
at org.apache.avro.Schema.validateDefault(Schema.java:1635)
at org.apache.avro.Schema.access$500(Schema.java:94)
at org.apache.avro.Schema$Field.<init>(Schema.java:561)
at org.apache.avro.Schema$Field.<init>(Schema.java:607)
at org.apache.hudi.avro.HoodieAvroUtils.createNewSchemaField(HoodieAvroUtils.java:381)
at org.apache.hudi.common.schema.HoodieSchemaField.of(HoodieSchemaField.java:113)
at org.apache.hudi.common.schema.HoodieSchemaField.of(HoodieSchemaField.java:92)
at org.apache.hudi.functional.TestBufferedRecordMerger.getSchema1(TestBufferedRecordMerger.java:815)
hudi-hadoop-common/src/test/java/org/apache/hudi/common/model/TestHoodieAvroIndexedRecord.java
Outdated
Show resolved
Hide resolved
...rce/hudi-flink/src/test/java/org/apache/hudi/table/format/TestFlinkRowDataReaderContext.java
Outdated
Show resolved
Hide resolved
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestUpdateProcessor.java
Outdated
Show resolved
Hide resolved
| // Schema can be null in test scenarios where schemas are not registered in the RecordContext (e.g. in tests) | ||
| if (schema != null) { | ||
| record = recordContext.seal(recordContext.toBinaryRow(schema, record)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want to seal in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the original code. AM just adding an additional check to ensure that schema is not null.
org.apache.hudi.common.engine.RecordContext#getSchemaFromBufferRecord calls
org.apache.hudi.common.engine.RecordContext#decodeAvroSchema, which is a nullable method.
hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieReaderContext.java
Outdated
Show resolved
Hide resolved
...di-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
Outdated
Show resolved
Hide resolved
...di-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
Outdated
Show resolved
Hide resolved
fb1da7d to
881d8e1
Compare
Nope, not combined with another branch. It's large because of how many other classes are using them. It cascades very quickly in a way where: Let's change the variable to a The changes mainly affect the I am trying not to touch the Most of the changes are in the |
881d8e1 to
e964274
Compare
CI report:
Bot commands@hudi-bot supports the following commands:
|
Describe the issue this Pull Request addresses
This Pull Request addresses the ongoing effort to refactor the Hudi codebase to use a unified
HoodieSchematype system for schema handling, migrating away from direct usage ofAvro's Schemain client-side code where possible.This specifically completes Phase 5: Java Client Core Migration as described in #14270.
NOTE: All changes made here avoids
FileGroup*classes.Summary and Changelog
This PR migrates key Hudi Java client components to use
HoodieSchemafor internal schema handling, enhancing maintainability without altering the Avro on-disk format.DeleteContext,RecordContextandHoodieReaderContextand their engine-specific implementations are updated to useHoodieSchema(instead of Avro Schema) for internal operations likegetValue,toBinaryRow, andgetOrderingValue.HoodieSchemaUtilsgains new client-side utilities, includinggetFieldPosition,projectSchema, andisReadCompatible.HoodieSchemachanges from the previous Column Statistics Migration phase into core utility classes (FileFormatUtils,HoodieTableMetadataUtil, etc.).This ensures records and metadata are consistently processed using the
HoodieSchemaabstraction in memory.Impact
Low
Risk Level
Low, risk is low due to refactoring maintaining Avro serialization compatibility via
HoodieSchema.toAvroSchema().Documentation Update
none
Contributor's checklist