
[VL][Delta] Offload Delta OPTIMIZE compaction command transactions#12024

Merged
zhztheplayer merged 2 commits into
apache:mainfrom
malinjawi:vl-delta-optimize-compaction-offload
May 8, 2026

@malinjawi malinjawi commented May 3, 2026

What changes are proposed in this pull request?

This PR adds standalone Delta OPTIMIZE compaction command offload for the Velox backend.

It lets Delta OPTIMIZE bin-pack compaction transactions run through GlutenOptimisticTransaction when native Delta write is enabled, so the compaction read/write command path can use Gluten's native Delta transaction handling.
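As a rough illustration of what bin-pack compaction accounts for, here is a minimal, self-contained Scala sketch. It is not Delta's or Gluten's implementation: `DataFile`, `CompactionResult`, `binPack`, `compact`, and the size thresholds are hypothetical names chosen for this example, and skipping single-file bins is an assumption of the sketch.

```scala
// Hypothetical model of bin-pack compaction; not Delta's or Gluten's actual code.
final case class DataFile(path: String, sizeBytes: Long)
final case class CompactionResult(filesRemoved: Int, filesAdded: Int)

object BinPackSketch {
  // Greedily group files, in order, into bins whose total size stays within maxBinBytes.
  def binPack(files: Seq[DataFile], maxBinBytes: Long): Seq[Seq[DataFile]] = {
    val bins = scala.collection.mutable.ArrayBuffer.empty[Vector[DataFile]]
    var current = Vector.empty[DataFile]
    var currentSize = 0L
    files.foreach { f =>
      // Close the current bin when adding this file would exceed the limit.
      if (current.nonEmpty && currentSize + f.sizeBytes > maxBinBytes) {
        bins += current
        current = Vector.empty
        currentSize = 0L
      }
      current = current :+ f
      currentSize += f.sizeBytes
    }
    if (current.nonEmpty) bins += current
    bins.toList
  }

  // Only files smaller than minFileBytes are candidates.
  // Assumption: a bin holding one file is left alone; each larger bin is rewritten into one file.
  def compact(files: Seq[DataFile], minFileBytes: Long, maxBinBytes: Long): CompactionResult = {
    val candidates = files.filter(_.sizeBytes < minFileBytes).sortBy(_.sizeBytes)
    val rewritten = binPack(candidates, maxBinBytes).filter(_.size > 1)
    CompactionResult(filesRemoved = rewritten.map(_.size).sum, filesAdded = rewritten.size)
  }
}
```

In this model, the removed-file count is the number of small files that were merged and the added-file count is the number of rewritten bins, which is the kind of add/remove accounting the tests in this PR check against OptimizeMetrics.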

Main changes:

  • add GlutenDeltaRunnableCommand, a wrapper for non-leaf Delta RunnableCommand implementations
  • wrap Delta OptimizeTableCommand in the Delta command offload rule for compaction-only OPTIMIZE
  • support both path-based and table-name OPTIMIZE compaction command forms
  • support partition-predicate compaction with OPTIMIZE ... WHERE
  • preserve fallback behavior when native Delta write is disabled
  • keep existing DELETE, UPDATE, save, CTAS, and RTAS command offload behavior unchanged
  • keep non-compaction OPTIMIZE forms on the existing Spark path for now:
    • OPTIMIZE ZORDER BY
    • liquid-clustering / clustered-table OPTIMIZE
    • REORG
    • FULL OPTIMIZE
  • add focused Spark 3.5 and Spark 4.0 coverage for:
    • OPTIMIZE delta.`path`
    • OPTIMIZE table_name
    • OPTIMIZE ... WHERE partition-predicate compaction
    • OptimizeMetrics add/remove-file accounting
    • Delta history operation metadata
    • native-write-disabled fallback
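For reference, the three covered command forms can be written out as SQL text. The sketch below only builds the statement strings; `OptimizeSql` and its helpers are hypothetical, and any table name, path, or predicate passed in is a placeholder rather than a value from the test suites.

```scala
// Hypothetical helpers that build the OPTIMIZE statement text for each covered form.
object OptimizeSql {
  // Path-based form: OPTIMIZE delta.`<path>`
  def byPath(path: String): String = s"OPTIMIZE delta.`$path`"

  // Table-name form: OPTIMIZE <table>
  def byName(table: String): String = s"OPTIMIZE $table"

  // Partition-predicate form: OPTIMIZE <target> WHERE <predicate>
  def withPartitionPredicate(target: String, predicate: String): String =
    s"OPTIMIZE $target WHERE $predicate"
}
```

Given an active SparkSession named `spark` (assumed, not created here), a statement would be submitted with something like `spark.sql(OptimizeSql.byName("events"))`.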

This PR is intentionally compaction-only:

  • no native ZORDER expression support yet
  • no InterleaveBits, HilbertLongIndex, or RangePartitionId support yet
  • no OPTIMIZE ZORDER sampling/shuffle planning changes yet
  • no liquid-clustering OPTIMIZE offload yet
  • no Optimized Write or auto-compaction changes yet

Those belong in follow-up PRs under the Delta optimization tracker.

Issue: #12025.

How was this patch tested?

Added and expanded Delta native write coverage in:

  • backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scala
  • backends-velox/src-delta40/test/scala/org/apache/spark/sql/delta/DeltaNativeWriteSuite.scala

Validation run locally:

  • Spark 3.5 / Scala 2.12 clean compile/test-compile
  • Spark 3.5 / Scala 2.12 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 8 tests passed
  • Spark 3.5 / Scala 2.13 clean compile/test-compile
  • Spark 3.5 / Scala 2.13 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 8 tests passed
  • Spark 4.0 / Scala 2.13 clean compile/test-compile
  • Spark 4.0 / Scala 2.13 DeltaNativeWriteSuite + ClusteredTableClusteringSuite: 16 tests passed
  • Spark 3.5 Spotless check
  • Spark 4.0 Spotless check

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

@github-actions github-actions Bot added the VELOX label May 3, 2026
@malinjawi malinjawi force-pushed the vl-delta-optimize-compaction-offload branch 3 times, most recently from 8da75af to e2d0a90 Compare May 3, 2026 13:16
@github-actions github-actions Bot added the CORE works for Gluten Core label May 3, 2026
Member

@zhztheplayer zhztheplayer left a comment

Thanks.

val snapshot = getDeltaTable(optimize.child, "OPTIMIZE").update()
ClusteredTableUtils.isSupported(snapshot.protocol)
} catch {
case NonFatal(_) => true
Member

Which exception is thrown here?

Contributor Author

@zhztheplayer Good find!

You're right, no exception handling is needed here. I added this catch as a guard at first, but after rechecking the path it is better to let getDeltaTable(...).update() fail normally if table resolution or snapshot loading fails.

Removed now.

Let me know if that sounds good.
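To illustrate the point of this exchange, the self-contained sketch below contrasts a guard that swallows every non-fatal error with one that lets the failure surface. `loadProtocolSupportsClustering` is a hypothetical stand-in for the getDeltaTable(...).update() lookup, and the other names are likewise made up for the example.

```scala
import scala.util.control.NonFatal

object CatchSketch {
  // Hypothetical stand-in for the snapshot lookup; fails when the table cannot be resolved.
  def loadProtocolSupportsClustering(tableExists: Boolean): Boolean =
    if (tableExists) false else throw new IllegalStateException("table not found")

  // Guarded variant: any non-fatal failure is reported as "clustered", hiding the real error.
  def isClusteredGuarded(tableExists: Boolean): Boolean =
    try loadProtocolSupportsClustering(tableExists)
    catch { case NonFatal(_) => true }

  // Unguarded variant: resolution failures propagate to the caller unchanged.
  def isClusteredUnguarded(tableExists: Boolean): Boolean =
    loadProtocolSupportsClustering(tableExists)
}
```

With the guarded variant, a missing table silently turns into a "do not offload" decision; with the unguarded one, the caller sees the original exception.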

optimize.optimizeContext.reorg.isEmpty &&
!optimize.optimizeContext.isFull &&
!isClusteredOptimize(optimize)
}
Member

Can we add a comment to this method?

Contributor Author

Yes, I added a comment clarifying that the offload check is intentionally limited to the one OPTIMIZE shape supported in this PR: plain bin-packing compaction.

So OPTIMIZE variants with layout-specific semantics, such as ZORDER, REORG, OPTIMIZE FULL, and liquid clustering, continue to use Delta's original command path until later patches add native support for them.
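The shape check described in this reply can be modeled as a small predicate. This is a sketch under stated assumptions, not the code in the offload rule: `OptimizeRequest`, its fields, and `OffloadSketch.shouldOffload` are hypothetical simplifications of the real command and context objects.

```scala
// Hypothetical, simplified view of an OPTIMIZE command; field names are illustrative only.
final case class OptimizeRequest(
    zOrderBy: Seq[String],
    isReorg: Boolean,
    isFull: Boolean,
    isClusteredTable: Boolean)

object OffloadSketch {
  // Offload only plain bin-packing compaction, and only when native Delta write is enabled.
  // Every other shape stays on the original Delta/Spark command path.
  def shouldOffload(req: OptimizeRequest, nativeWriteEnabled: Boolean): Boolean =
    nativeWriteEnabled &&
      req.zOrderBy.isEmpty &&
      !req.isReorg &&
      !req.isFull &&
      !req.isClusteredTable
}
```

Writing the check as a conjunction of exclusions keeps the default conservative: a variant that sets any of the flags falls back instead of being offloaded by accident.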

@malinjawi malinjawi force-pushed the vl-delta-optimize-compaction-offload branch from cd8f805 to b650168 Compare May 7, 2026 13:25
@malinjawi malinjawi force-pushed the vl-delta-optimize-compaction-offload branch from b650168 to 97e1690 Compare May 7, 2026 13:37
@malinjawi malinjawi requested a review from zhztheplayer May 7, 2026 15:49
@zhztheplayer zhztheplayer merged commit 4fa7bdc into apache:main May 8, 2026
58 checks passed
@felipepessoto
Contributor

@malinjawi @zhztheplayer The docs at https://github.com/apache/gluten/blob/main/docs/get-started/VeloxDelta.md say Liquid is supported, but the PR description says it is not. Are the docs incorrect?

@malinjawi
Contributor Author

malinjawi commented May 12, 2026

Thanks for catching this @felipepessoto.

Yes, the current main docs are too broad/misleading if read as native Velox support.

The current merged state is:

  • PR #12024 ([VL][Delta] Offload Delta OPTIMIZE compaction command transactions) adds native offload only for plain Delta OPTIMIZE bin-packing compaction.
  • Liquid/clustered-table OPTIMIZE is explicitly excluded in OffloadDeltaCommand.shouldOffloadOptimize through !isClusteredOptimize(optimize), and the code comment says liquid clustering continues through Delta's original command path.
  • ClusteredTableClusteringSuite gives correctness coverage for clustered-table OPTIMIZE, but that is fallback behavior, not native liquid clustering offload.

So I would describe Liquid as: ordinary Delta scans/writes on those tables can still follow the normal Delta offload rules when the final plan validates, but the Liquid-specific clustering/OPTIMIZE operation itself falls back to Delta/Spark today. It should not be documented as a blanket Yes.

I updated the draft docs PR here to make that explicit: #12050. It now marks Liquid clustering as Fallback and separately marks plain OPTIMIZE compaction as ExperimentalOffload.
