
@the-other-tim-brown the-other-tim-brown commented Dec 3, 2025

Describe the issue this Pull Request addresses

Currently, developers must call two separate methods to ensure that a StructType is properly converted to an Avro schema. This is error-prone and can be simplified by unifying the code.

Summary and Changelog

  • Set the doc field when first constructing the Avro schema from the StructType
  • Set a default value of null when a field is nullable
  • Set the doc string in the StructType for round-trip conversion
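
The three changes above can be illustrated with a minimal, library-free sketch. It is not the Hudi implementation: `Field`, `struct_to_avro`, and `avro_to_struct` are hypothetical names, the StructType is modelled as a plain list of fields with primitive types only, and the Avro schema is built as a JSON-compatible dict. It only shows the idea of attaching the doc string and the null default in the same pass that builds the schema, and of reading the doc string back on the reverse conversion.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a Spark StructField; only primitive types are modelled.
@dataclass
class Field:
    name: str
    data_type: str  # Avro primitive name, e.g. "string", "int", "long", "double"
    nullable: bool = True
    comment: Optional[str] = None


def struct_to_avro(record_name, fields):
    """Build an Avro record schema (as a dict) in a single pass."""
    avro_fields = []
    for f in fields:
        entry = {"name": f.name}
        if f.nullable:
            # A union's default must match its first branch, so "null" comes first.
            entry["type"] = ["null", f.data_type]
            entry["default"] = None
        else:
            entry["type"] = f.data_type
        if f.comment:  # skip doc strings that are None or empty
            entry["doc"] = f.comment
        avro_fields.append(entry)
    return {"type": "record", "name": record_name, "fields": avro_fields}


def avro_to_struct(schema):
    """Reverse conversion that carries the doc string back into the field."""
    fields = []
    for entry in schema["fields"]:
        avro_type = entry["type"]
        nullable = isinstance(avro_type, list) and "null" in avro_type
        if nullable:
            data_type = next(branch for branch in avro_type if branch != "null")
        else:
            data_type = avro_type
        fields.append(Field(entry["name"], data_type, nullable, entry.get("doc")))
    return fields


fields = [
    Field("id", "int", nullable=False, comment="primary key"),
    Field("name", "string"),
]
schema = struct_to_avro("example_record", fields)
print(json.dumps(schema, indent=2))
```

Because the doc string and the default are set where the field is created, there is no second call for a caller to forget, and `avro_to_struct(schema)` returns fields equal to the original input.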

Impact

Simplifies the code and eliminates the risk of developers forgetting to call the method that adds doc strings or null default values to the schema.

Risk Level

None

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Dec 3, 2025

@yihua yihua left a comment

LGTM

yihua commented Dec 4, 2025

With Spark 3.5 and Scala 2.13, the test failed because the expected value is an ArrayBuffer while the actual value is an Array, even though the fields themselves match. This is likely due to a difference in collection types between Scala 2.12 and 2.13.

- Test Create External Hoodie Table *** FAILED ***
  Expected ArrayBuffer(StructField(_hoodie_commit_time,StringType,true), StructField(_hoodie_commit_seqno,StringType,true), StructField(_hoodie_record_key,StringType,true), StructField(_hoodie_partition_path,StringType,true), StructField(_hoodie_file_name,StringType,true), StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(price,DoubleType,true), StructField(ts,LongType,true), StructField(dt,StringType,true)), but got Array(StructField(_hoodie_commit_time,StringType,true), StructField(_hoodie_commit_seqno,StringType,true), StructField(_hoodie_record_key,StringType,true), StructField(_hoodie_partition_path,StringType,true), StructField(_hoodie_file_name,StringType,true), StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(price,DoubleType,true), StructField(ts,LongType,true), StructField(dt,StringType,true)) (TestCreateTable.scala:232)

hudi-bot commented Dec 4, 2025

CI report:

@the-other-tim-brown the-other-tim-brown merged commit d5608c3 into apache:master Dec 4, 2025
136 of 137 checks passed
@the-other-tim-brown the-other-tim-brown deleted the schema-doc-simplification branch December 4, 2025 20:12
the-other-tim-brown added a commit to the-other-tim-brown/hudi that referenced this pull request Dec 5, 2025
… docs and default values (apache#17473)

* move docs and default values to schema construction

* update test, make variables more readable, fix null ordering

* make doc check for null or empty

* try printing ddl

* try logging ddl

* fix logging

* ensure table names are unique across tests to generate unique schemas

* preserve h prefix for ordering