[GLUTEN-10215][VL] Delta write: Preserve partition columns when required#12069

Draft
malinjawi wants to merge 2 commits into apache:main from malinjawi:feature/delta-native-write-partition-column-output

@malinjawi
Contributor

What changes were proposed in this pull request?

This patch preserves partition columns in native partition split output only when Delta's write contract includes those partition columns in dataColumns.

The change:

  • Detects whether Delta expects partition columns in the writer data columns.
  • Passes that contract to the Velox partition splitter.
  • Keeps native stats aggregation scoped to Delta data columns when the written batch includes extra partition columns.
  • Adds Delta 4.0 coverage for Iceberg-compatible partitioned native writes with stats enabled.
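A rough sketch of the splitter-side decision in the list above, for readers following along. It is not the code in this PR: `partitionColumnsInDataColumns` and `outputColumnIndices` are hypothetical names, columns are matched by exact name (an assumption; Spark may resolve names case-insensitively), and the real Velox splitter operates on row vectors rather than name lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true when every partition column is also listed in
// the writer's data columns, i.e. the write contract expects them in the batch.
bool partitionColumnsInDataColumns(
    const std::vector<std::string>& dataColumns,
    const std::vector<std::string>& partitionColumns) {
  if (partitionColumns.empty()) {
    return false;
  }
  return std::all_of(
      partitionColumns.begin(), partitionColumns.end(),
      [&](const std::string& name) {
        return std::find(dataColumns.begin(), dataColumns.end(), name) !=
            dataColumns.end();
      });
}

// Hypothetical helper: indices of the input columns the splitter emits.
// Partition columns are dropped unless keepPartitionColumns is set.
std::vector<std::size_t> outputColumnIndices(
    const std::vector<std::string>& inputColumns,
    const std::vector<std::string>& partitionColumns,
    bool keepPartitionColumns) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < inputColumns.size(); ++i) {
    const bool isPartition =
        std::find(partitionColumns.begin(), partitionColumns.end(),
                  inputColumns[i]) != partitionColumns.end();
    if (keepPartitionColumns || !isPartition) {
      out.push_back(i);
    }
  }
  // Sanity check: keeping partition columns means nothing is dropped.
  assert(!keepPartitionColumns || out.size() == inputColumns.size());
  return out;
}
```

In this sketch the result of the first helper would be passed as `keepPartitionColumns` to the second, so the default behavior (dropping partition columns from the split output) is unchanged for write modes that do not list them in the data columns.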

This is the second split from #12016. It is stacked on #12016 and should be reviewed after that PR merges.

Why are the changes needed?

Some Delta write modes keep partition columns in the writer batch. The native splitter should preserve those columns only for those modes, while the Delta stats tracker must still compute AddFile stats over Delta data columns only.
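To illustrate the stats side of this, here is a minimal sketch of min/max aggregation that skips any batch column not named in the stats column list. Everything in it is an assumption made for illustration: `Batch`, `MinMax`, and `aggregateStats` are hypothetical, columns are plain `int64_t` vectors, and the actual tracker in Gluten works on native Velox data and covers more statistics than min/max.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-column statistics (min/max only in this sketch).
struct MinMax {
  int64_t minValue = std::numeric_limits<int64_t>::max();
  int64_t maxValue = std::numeric_limits<int64_t>::lowest();
};

// Hypothetical column-major batch; names[i] labels columns[i].
struct Batch {
  std::vector<std::string> names;
  std::vector<std::vector<int64_t>> columns;
};

// Aggregates min/max only for columns listed in statsColumns. Any other
// column in the batch (for example a preserved partition column) is skipped.
std::map<std::string, MinMax> aggregateStats(
    const Batch& batch,
    const std::vector<std::string>& statsColumns) {
  assert(batch.names.size() == batch.columns.size());
  std::map<std::string, MinMax> stats;
  for (std::size_t i = 0; i < batch.names.size(); ++i) {
    const bool tracked =
        std::find(statsColumns.begin(), statsColumns.end(), batch.names[i]) !=
        statsColumns.end();
    if (!tracked) {
      continue;
    }
    MinMax& entry = stats[batch.names[i]];
    for (int64_t value : batch.columns[i]) {
      entry.minValue = std::min(entry.minValue, value);
      entry.maxValue = std::max(entry.maxValue, value);
    }
  }
  return stats;
}
```

The point of the sketch is only that the stats scope is driven by an explicit column list rather than by whatever columns happen to be in the written batch.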

Does this PR introduce any user-facing change?

No public API change. This improves correctness for native Delta partitioned writes.

How was this patch tested?

Built locally and ran:

JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home \
./dev/run-scala-test.sh --force \
  -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta \
  -pl backends-velox \
  -s org.apache.spark.sql.delta.DeltaNativeWriteSuite \
  -t "native delta Iceberg-compatible partitioned write should collect stats"

Result: 1 test passed, 0 failures.

Also ran the partitioned optimized write layout regression test on the stacked branch, Spark 4.0 backends-velox test compilation, C++ gluten/velox native build, and git diff --check.

Related issue: #10215

@malinjawi force-pushed the feature/delta-native-write-partition-column-output branch from bf0a64a to d185350 on May 11, 2026 10:45