[VL] Add LocalTableScanExec support to Velox backend #12080
Open
minni31 wants to merge 1 commit into
CONTEXT
LocalTableScanExec is a Spark physical operator that materializes in-memory data (e.g., from Dataset.toDF(), spark.range(), or constant relations optimized by the Catalyst planner). Currently, Gluten does not offload this operator, so its output rows stay in Spark's internal row format and require a separate row-to-columnar conversion step before downstream Velox operators can consume them. This is the companion PR to #12077 (RDDScanExec support); both follow the same design pattern.
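Conceptually, the row-to-columnar step transposes row-major records into one value vector per column. A minimal sketch of that idea, in plain Java with hypothetical types (not Gluten's or Velox's actual API, which operates on Spark InternalRow and native vectors):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of row-to-columnar conversion: each input row is a
// list of field values; the output holds one list per column ("column vector").
class RowToColumnarSketch {
    static List<List<Object>> rowsToColumnar(List<List<Object>> rows, int numFields) {
        List<List<Object>> columns = new ArrayList<>();
        for (int i = 0; i < numFields; i++) {
            List<Object> col = new ArrayList<>();
            for (List<Object> row : rows) {
                col.add(row.get(i)); // gather field i from every row
            }
            columns.add(col);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<Object>> rows =
            List.of(List.of(1, "a"), List.of(2, "b"), List.of(3, "c"));
        List<List<Object>> cols = rowsToColumnar(rows, 2);
        System.out.println(cols.get(0)); // prints [1, 2, 3]
        System.out.println(cols.get(1)); // prints [a, b, c]
    }
}
```

Offloading LocalTableScanExec moves this transposition into Velox's native converter instead of doing it in a separate Spark-side conversion step.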
WHAT
Adds a VeloxLocalTableScanTransformer that intercepts LocalTableScanExec in the offload rules and performs row-to-columnar conversion using Velox's native RowToColumnarConverter. The implementation:
- Adds a LocalTableScanTransformer base trait in gluten-substrait with the backend-agnostic contract (output attributes, row data, schema validation).
- Adds a VeloxLocalTableScanTransformer that delegates schema validation to VeloxValidatorApi.validateSchema, the same canonical validator used by all other Velox operators, ensuring recursive complex-type validation, TimestampNTZ handling, and variant struct detection are handled consistently.
- Registers the transformer in OffloadSingleNodeRules and the backend factory method in VeloxSparkPlanExecApi.
- Excludes streaming local relations (plan.getStream.isEmpty guard) since those follow a different execution path.
- Adds SQLMetrics (numInputRows, numOutputBatches, convertTime) so conversion costs are visible in the Spark UI.
- Adds a toColumnarBatchIterator overload to pass plan-level metrics to the native converter.

Tests
VeloxLocalTableScanSuite
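The base-trait / backend-transformer split described above can be sketched as follows. This is a hedged, heavily simplified illustration in Java (the real classes are Scala, live in gluten-substrait and the Velox backend, and validate Spark schemas rather than plain strings; all names and signatures here are hypothetical):

```java
import java.util.List;
import java.util.Optional;

// Hypothetical validator interface standing in for VeloxValidatorApi.validateSchema:
// returns a rejection reason, or empty if the schema is supported.
interface SchemaValidator {
    Optional<String> validateSchema(List<String> fieldTypes);
}

// Backend-agnostic contract: output attributes plus a schema-validation hook,
// analogous to the LocalTableScanTransformer base trait.
abstract class LocalTableScanTransformerBase {
    final List<String> output;

    LocalTableScanTransformerBase(List<String> output) {
        this.output = output;
    }

    protected abstract SchemaValidator validator();

    // Validation either passes or surfaces the backend's rejection reason.
    Optional<String> doValidate() {
        return validator().validateSchema(output);
    }
}

// Velox-side transformer: supplies the canonical Velox validator, so all
// type-support decisions stay consistent with other Velox operators.
class VeloxLocalTableScanTransformerSketch extends LocalTableScanTransformerBase {
    private final SchemaValidator veloxValidator;

    VeloxLocalTableScanTransformerSketch(List<String> output, SchemaValidator v) {
        super(output);
        this.veloxValidator = v;
    }

    @Override
    protected SchemaValidator validator() {
        return veloxValidator;
    }
}
```

Delegating validation rather than reimplementing it is the design point the PR description emphasizes: the backend trait owns the contract, and the Velox subclass reuses the one canonical validator.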