[VL][Delta] Offload Delta OPTIMIZE compaction command transactions#12024
Conversation
8da75af to
e2d0a90
Compare
| val snapshot = getDeltaTable(optimize.child, "OPTIMIZE").update() | ||
| ClusteredTableUtils.isSupported(snapshot.protocol) | ||
| } catch { | ||
| case NonFatal(_) => true |
There was a problem hiding this comment.
Which exception is thrown here?
There was a problem hiding this comment.
@zhztheplayer Good find!
Yea no exception needed here. I added this catch as a guard at first but after rechecking the path it is better to let getDeltaTable(...).update() fail normally if table resolution or snapshot loading fails.
Removed now.
Let me know if that sounds sound.
| optimize.optimizeContext.reorg.isEmpty && | ||
| !optimize.optimizeContext.isFull && | ||
| !isClusteredOptimize(optimize) | ||
| } |
There was a problem hiding this comment.
Can we add comment to this method?
There was a problem hiding this comment.
Yes, added a comment now to clarify that the offload check is intentionally limited to the currently supported OPTIMIZE shape: plain bin-packing compaction in this scope.
So OPTIMIZE variants with layout-specific semantics, such as ZORDER, REORG, OPTIMIZE FULL, and liquid clustering, continue to use Delta’s original command path until native support is added for those paths with later patches.
cd8f805 to
b650168
Compare
b650168 to
97e1690
Compare
|
@malinjawi @zhztheplayer the docs https://github.com/apache/gluten/blob/main/docs/get-started/VeloxDelta.md says Liquid is supported, but PR description says it is not. Are the docs incorrect? |
|
Thanks for catching this @felipepessoto. Yes, the current main docs are too broad/misleading if read as native Velox support. The current merged state is:
So I would describe Liquid as: ordinary Delta scans/writes on those tables can still follow the normal Delta offload rules when the final plan validates, but the Liquid-specific clustering/OPTIMIZE operation itself falls back to Delta/Spark today. It should not be documented as a blanket I updated the draft docs PR here to make that explicit: #12050. It now marks Liquid clustering as |
What changes are proposed in this pull request?
This PR adds standalone Delta OPTIMIZE compaction command offload for the Velox backend.
It lets Delta
OPTIMIZEbin-pack compaction transactions run throughGlutenOptimisticTransactionwhen native Delta write is enabled, so the compaction read/write command path can use Gluten's native Delta transaction handling.Main changes:
GlutenDeltaRunnableCommand, a wrapper for non-leaf DeltaRunnableCommandimplementationsOptimizeTableCommandin the Delta command offload rule for compaction-only OPTIMIZEOPTIMIZE ... WHEREOPTIMIZE ZORDER BYREORGFULLOPTIMIZEOPTIMIZE delta.\path``OPTIMIZE table_nameOPTIMIZE ... WHEREpartition-predicate compactionThis PR is intentionally compaction-only:
Those belong in follow-up PRs under the Delta optimization tracker.
Issue: #12025.
How was this patch tested?
Added and expanded Delta native write coverage in:
backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scalabackends-velox/src-delta40/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scalaValidation run locally:
DeltaNativeWriteSuite+ClusteredTableClusteringSuite: 8 tests passedDeltaNativeWriteSuite+ClusteredTableClusteringSuite: 8 tests passedDeltaNativeWriteSuite+ClusteredTableClusteringSuite: 16 tests passedWas this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB