[VL] Add Spark 4.1 TimeType support #12065

Open
malinjawi wants to merge 1 commit into apache:main from malinjawi:fix/velox-time-type-support

@malinjawi (Contributor) commented May 10, 2026

What changes are proposed in this pull request?
This PR adds native Spark 4.1 TIME data type support in the Velox backend by mapping Substrait TIME to Velox TIME_MICRO_UTC and handling TIME literals end to end.

The change:

  • adds Gluten Substrait TimeType and TimeLiteral nodes
  • converts Spark 4.1 TimeType literals from Spark nanos to Substrait/Velox micros
  • parses Substrait Type.Time as Velox TIME_MICRO_UTC in the C++ Substrait parser
  • maps TIME_MICRO_UTC through Velox/Substrait signatures and Velox-to-Substrait type conversion
  • converts Velox TIME_MICRO_UTC values back to Spark UnsafeRow nanos in columnar-to-row, including nested arrays, maps, and rows
  • allows supported TIME precision through Velox validation and planning paths
  • adds Java, Scala, and C++ coverage for time literals, type conversion, signatures, and columnar-to-row serialization
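
The two unit conversions listed above (Spark nanos to micros for literals, and micros back to nanos in columnar-to-row) can be sketched as follows. This is a minimal illustration, not the code in this PR: the function and constant names are hypothetical, and rejecting sub-microsecond or out-of-range values is an assumption about how a micros-based type might be guarded.

```cpp
#include <cstdint>
#include <optional>

// Illustrative constants; names are hypothetical, not Gluten identifiers.
constexpr int64_t kNanosPerMicro = 1000;
constexpr int64_t kMicrosPerDay = 86400000000LL;

// Literal path: Spark nanos-of-day -> micros-of-day.
// Returns std::nullopt when the value is negative, is not a valid time of
// day, or carries sub-microsecond digits that a micros-based type cannot
// hold (assumed policy, not confirmed by the PR).
std::optional<int64_t> sparkNanosToMicros(int64_t nanos) {
  if (nanos < 0 || nanos % kNanosPerMicro != 0) {
    return std::nullopt;
  }
  const int64_t micros = nanos / kNanosPerMicro;
  if (micros >= kMicrosPerDay) {
    return std::nullopt;
  }
  return micros;
}

// Columnar-to-row path: micros-of-day -> Spark nanos-of-day.
// Cannot overflow for a valid time of day (the result stays below 8.64e13).
constexpr int64_t microsToSparkNanos(int64_t micros) {
  return micros * kNanosPerMicro;
}
```

For any value accepted by `sparkNanosToMicros`, applying `microsToSparkNanos` to the result returns the original nanos value, which is the round-trip property the literal and columnar-to-row paths need to agree on.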

Why are the changes needed?
#11919 tracks Spark 4.1 TimeType support for the Velox backend. Spark 4.1 adds TimeType to DataTypeTestUtils.atomicTypes, so tests and planning paths that enumerate atomic types can hit TIME even when Gluten/Velox cannot yet parse or execute it. Mapping Substrait TIME to Velox TIME_MICRO_UTC gives Gluten a native representation for supported Spark TIME precision and unblocks literal/planning paths.
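
To illustrate why enumerating atomic types can fail without a mapping, here is a toy sketch of a type lookup that returns no result for TIME until a mapping is added. The enum, the function, and the `timeMapped` flag are hypothetical stand-ins; they are not the Substrait or Gluten APIs.

```cpp
#include <optional>
#include <string>

// Toy stand-in for the type kinds a planner might encounter (hypothetical).
enum class PlanTypeKind { kI64, kDate, kTimestamp, kTime };

// Returns the native type name, or std::nullopt when no mapping exists.
// timeMapped models "before this patch" (false) and "after" (true).
std::optional<std::string> nativeTypeName(PlanTypeKind kind, bool timeMapped) {
  switch (kind) {
    case PlanTypeKind::kI64:
      return std::string("BIGINT");
    case PlanTypeKind::kDate:
      return std::string("DATE");
    case PlanTypeKind::kTimestamp:
      return std::string("TIMESTAMP");
    case PlanTypeKind::kTime:
      if (timeMapped) {
        return std::string("TIME_MICRO_UTC");
      }
      return std::nullopt;
  }
  return std::nullopt;
}
```

A caller that iterates over every kind would see an empty result for `kTime` when `timeMapped` is false, which corresponds to the unparseable-type situation described above.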

Does this PR introduce any user-facing change?
No public API change. It extends native Spark 4.1 TIME support in the Velox backend.

How was this patch tested?
Built and ran locally:

  • Spark 4.1 Velox compile with Scala 2.13 and Java 21
  • focused Spark 4.1 VeloxLiteralSuite Time Literal test
  • C++ Velox backend build
  • generic_benchmark smoke run

Commands:

  • git diff --cached --check
  • mvn -Pbackends-velox,spark-4.1,scala-2.13,java-21 -pl backends-velox -am -DskipTests -Dcheckstyle.skip -Dspotless.check.skip compile
  • ./dev/run-scala-test.sh --force -Pjava-21,spark-4.1,scala-2.13,backends-velox,hadoop-3.4,spark-ut,delta -pl backends-velox -s org.apache.gluten.execution.VeloxLiteralSuite -t "Time Literal" -Dscalastyle.skip=true -Dcheckstyle.skip -Dspotless.check.skip
  • cmake --build cpp/build --target velox -j 4

Native C++ unit-test targets were not run locally because this checkout does not have the Velox vector test utility archive built (libvelox_vector_test_lib.a).

Performance
I ran targeted local benchmarks comparing clean upstream main against this patch.

Focused columnar-to-row benchmark:

| Type | Before median (µs) | After median (µs) | Delta |
| --- | --- | --- | --- |
| BIGINT | 1325.407 | 1329.853 | +0.335% |
| TIME_MICRO_UTC | 1331.705 | 1643.357 | +23.403% |

The TIME_MICRO_UTC delta is expected because this patch adds the required micros-to-nanos normalization for Spark UnsafeRow compatibility; the baseline treated TIME as a raw BIGINT-like value.
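
The difference between the two paths can be sketched as below. This is a simplified illustration under assumed names and an assumed row layout (fixed-stride rows with one 8-byte slot per field); it is not the columnar-to-row code in Gluten. The raw path copies each 64-bit value unchanged, while the TIME path does a multiply per value before the copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: each row occupies rowStride bytes and the field is an
// 8-byte slot at fieldOffset within the row.

// Raw path: copy each 64-bit value as-is (how a BIGINT-like value is written).
void writeBigintColumn(const std::vector<int64_t>& values, uint8_t* rows,
                       size_t rowStride, size_t fieldOffset) {
  for (size_t i = 0; i < values.size(); ++i) {
    std::memcpy(rows + i * rowStride + fieldOffset, &values[i], sizeof(int64_t));
  }
}

// TIME path: normalize micros to nanos per value, then write the slot.
void writeTimeMicrosColumn(const std::vector<int64_t>& values, uint8_t* rows,
                           size_t rowStride, size_t fieldOffset) {
  for (size_t i = 0; i < values.size(); ++i) {
    const int64_t nanos = values[i] * 1000;  // micros -> nanos
    std::memcpy(rows + i * rowStride + fieldOffset, &nanos, sizeof(int64_t));
  }
}
```

The extra work is confined to TIME columns, which is consistent with the BIGINT row in the table staying essentially flat.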

Broad generic_q5 smoke benchmark:

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Real median (ms) | 29.076 | 29.159 | +0.285% |
| CPU median (ms) | 28.993 | 28.991 | -0.006% |

Related issue: #11919
Tracked by #11910

Was this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB

@github-actions bot added the CORE (works for Gluten Core) and VELOX labels May 10, 2026
@github-actions

Run Gluten Clickhouse CI on x86

Map Substrait TIME to Velox TIME_MICRO_UTC and add time literal support.

Convert Spark TimeType nanos to Substrait/Velox micros for literals and convert Velox micros back to Spark UnsafeRow nanos in columnar-to-row.
@malinjawi force-pushed the fix/velox-time-type-support branch from 675f551 to 7a061ac May 10, 2026 13:19
@github-actions

Run Gluten Clickhouse CI on x86
