@szehon-ho
Member

@szehon-ho szehon-ho commented Dec 6, 2025

What changes were proposed in this pull request?

The 'struct coercion' feature for MERGE INTO (which lets the statement pass when a struct with fewer fields is assigned to a struct with more fields) was turned off behind a flag in #53229 due to some ambiguity in its behavior, but it was not removed because the community wanted to try it.

We still want to keep it under a flag, but we are making a choice about which behavior to support when the flag is on. In particular, we want UPDATE SET * to explode to all nested struct fields, so that, in this scenario, existing nested struct fields in the target are preserved rather than set to null.

Why are the changes needed?

@aokolnychyi tested the feature and thinks that, even though it is behind the experimental flag, we should take the stance for now that UPDATE SET * explodes to all nested fields rather than to top-level columns.

The rationale being:

  • it's always safer not to override user values with null
  • Spark in general tries to treat nested fields like columns
  • there's already a way for the user to override the whole struct (and nullify non-existing fields) by specifying the struct explicitly, i.e. UPDATE SET struct = source.struct
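
To make the difference between the two candidate behaviors concrete, here is a small toy model in plain Python. It is only an illustration of the semantics described above, not Spark's implementation: rows are modeled as nested dicts, and the function names (`coerce_to_shape`, `set_star_top_level`, `set_star_nested`) are hypothetical and do not exist in Spark.

```python
from typing import Any, Dict

Row = Dict[str, Any]  # toy row: column name -> value; a struct is a nested dict


def coerce_to_shape(shape: Any, value: Any) -> Any:
    """Widen `value` to the fields of `shape`; fields missing from `value` become None."""
    if isinstance(shape, dict) and isinstance(value, dict):
        return {
            f: coerce_to_shape(shape[f], value[f]) if f in value else None
            for f in shape
        }
    return value


def set_star_top_level(target: Row, source: Row) -> Row:
    """UPDATE SET * as top-level column assignment: each struct is replaced as a whole."""
    return {
        col: coerce_to_shape(old, source[col]) if col in source else old
        for col, old in target.items()
    }


def set_star_nested(target: Row, source: Row) -> Row:
    """UPDATE SET * exploded to nested fields: only fields present in the source are assigned."""
    result: Row = {}
    for col, old in target.items():
        if col not in source:
            result[col] = old  # no assignment generated, existing value kept
        elif isinstance(old, dict) and isinstance(source[col], dict):
            result[col] = set_star_nested(old, source[col])
        else:
            result[col] = source[col]
    return result


# Target struct `s` has fields a and b; the source struct only has a.
target_row = {"id": 1, "s": {"a": 1, "b": "keep"}}
source_row = {"id": 1, "s": {"a": 2}}

print(set_star_top_level(target_row, source_row))
# {'id': 1, 's': {'a': 2, 'b': None}}
print(set_star_nested(target_row, source_row))
# {'id': 1, 's': {'a': 2, 'b': 'keep'}}
```

In this model, the top-level variant corresponds to what the user can still request explicitly with UPDATE SET struct = source.struct, while the nested variant is the behavior proposed here for UPDATE SET *.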

Does this PR introduce any user-facing change?

No, the whole feature is new and hidden behind an experimental flag.

How was this patch tested?

Existing tests (some expected outputs change to no longer be null)

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Dec 6, 2025
@szehon-ho
Member Author

szehon-ho commented Dec 6, 2025

Hi @dongjoon-hyun, sorry for the back and forth here. As in the description, @aokolnychyi explained that he prefers UPDATE SET * to refer to nested fields, for the reasons above. The whole feature (MERGE INTO struct coercion) is still under an experimental flag and off by default, but we want to take this stance for when the flag is on.

@szehon-ho
Member Author

Btw, the code is not new; it's the same code from #53149 that was removed, just brought back.

@dongjoon-hyun
Member

No problem, but I believe this is applicable to the master branch only, @szehon-ho.

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 6, 2025

So, +1 for 4.2.0 for the proposal, although I haven't looked at the code yet.

@szehon-ho
Member Author

szehon-ho commented Dec 6, 2025

Hi @dongjoon-hyun, @aokolnychyi mentioned it would be good to get this into 4.1, because we are still releasing the 'struct coercion' feature, albeit behind a flag, so he wanted to start it off with the better choice. Code-wise it's the same as before the revert, although the whole thing is behind a flag. From the comments on #53229, it seems the community is interested in this feature. I'll ping him to comment as well.

@dongjoon-hyun
Member

Sorry, but I still believe this fits Apache Spark 4.2.0 (after checking the code again). This is only for Apache Spark 4.2.0, @szehon-ho. We are ramping down instead of ramping up.
