@@ -1205,6 +1205,11 @@ class VeloxSparkPlanExecApi extends SparkPlanExecApi with Logging {
  override def genColumnarRangeExec(rangeExec: RangeExec): ColumnarRangeBaseExec =
    ColumnarRangeExec(rangeExec.range)

  override def isSupportRDDScanExec(plan: RDDScanExec): Boolean = true

  override def getRDDScanTransform(plan: RDDScanExec): RDDScanTransformer =
    VeloxRDDScanTransformer.replace(plan)

  override def genColumnarTailExec(limit: Int, child: SparkPlan): ColumnarCollectTailBaseExec =
    ColumnarCollectTailExec(limit, child)

@@ -0,0 +1,119 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.gluten.execution

import org.apache.gluten.backendsapi.velox.VeloxValidatorApi
import org.apache.gluten.config.{GlutenConfig, VeloxConfig}

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
import org.apache.spark.sql.catalyst.plans.physical.Partitioning
import org.apache.spark.sql.execution.{RDDScanTransformer, SparkPlan}
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
import org.apache.spark.sql.vectorized.ColumnarBatch

/**
* Velox-backend implementation of RDDScanTransformer.
*
* Converts an RDD[InternalRow] into columnar batches using Velox's native row-to-columnar
* conversion (same JNI path as RowToVeloxColumnarExec).
*/
case class VeloxRDDScanTransformer(
    outputAttributes: Seq[Attribute],
    rdd: RDD[InternalRow],
    name: String,
    // Row-to-columnar conversion preserves data distribution, so we carry through
    // the original partitioning. This differs from CH, which uses UnknownPartitioning(0),
    // but is consistent with RowToVeloxColumnarExec's behavior.
    override val outputPartitioning: Partitioning,
    override val outputOrdering: Seq[SortOrder]
Contributor

Validation does not recurse into complex type element types

Problem: The type allowlist checks top-level types only. An ArrayType(UnsupportedType) or MapType(StringType, UnsupportedType) would pass validation but could fail at native execution time. The CH backend avoids this by delegating to ConverterUtils.getTypeNode() which recursively validates.

Evidence:

case _: org.apache.spark.sql.types.ArrayType =>   // passes any ArrayType, no element check
case _: org.apache.spark.sql.types.MapType =>      // passes any MapType, no key/value check
case _: org.apache.spark.sql.types.StructType =>   // passes any StructType, no field check

Suggested Fix:

case a: org.apache.spark.sql.types.ArrayType =>
  validateType(a.elementType)
case m: org.apache.spark.sql.types.MapType =>
  validateType(m.keyType)
  validateType(m.valueType)
case s: org.apache.spark.sql.types.StructType =>
  s.fields.foreach(f => validateType(f.dataType))

Alternatively, delegate to VeloxValidatorApi for centralized type validation.
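
For reference, a minimal sketch of that delegation, mirroring the doValidateInternal the updated diff below ends up with (it assumes VeloxValidatorApi.validateSchema returns an Option carrying the failure reason, as used in that code):

override protected def doValidateInternal(): ValidationResult = {
  // Delegate per-field validation to the centralized Velox validator,
  // which recurses into array element, map key/value, and struct field types.
  for (field <- schema.fields) {
    val reason = VeloxValidatorApi.validateSchema(field.dataType)
    if (reason.isDefined) {
      return ValidationResult.failed(reason.get)
    }
  }
  ValidationResult.succeeded
}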

Author

Thanks, this is a great point. Replaced the manual allowlist with VeloxValidatorApi.validateSchema, which handles recursive validation for complex type elements and also catches variant shredded structs. This keeps the validation logic centralized.

) extends RDDScanTransformer(outputAttributes, outputPartitioning, outputOrdering) {

Contributor

PR description contradicts validation logic for complex types

Problem: The PR description states "rejects complex types (ARRAY, MAP, STRUCT)" but doValidateInternal() explicitly accepts these types. The code is correct — Velox does support complex types via UnsafeRowFast::deserialize. The PR description should be updated to avoid misleading reviewers.

Evidence:

case _: org.apache.spark.sql.types.ArrayType =>
case _: org.apache.spark.sql.types.MapType =>
case _: org.apache.spark.sql.types.StructType =>

These cases fall through to ValidationResult.succeeded, meaning complex types are accepted.

Suggested Fix: Update the PR description to remove the claim that complex types are rejected, e.g.:

Supports all Velox-compatible types including complex types (Array, Map, Struct). Rejects only truly unsupported types (e.g., CalendarIntervalType) with clean fallback to vanilla Spark.

Author

Good catch — updated the PR description. It now correctly states that complex types (Array, Map, Struct) are supported via the UnsafeRowFast::deserialize path, and only truly unsupported types trigger fallback

  @transient override lazy val metrics: Map[String, SQLMetric] = Map(
    "numInputRows" -> SQLMetrics.createMetric(sparkContext, "number of input rows"),
    "numOutputBatches" -> SQLMetrics.createMetric(sparkContext, "number of output batches"),
    "convertTime" -> SQLMetrics.createTimingMetric(sparkContext, "time to convert")
  )

  override protected def doValidateInternal(): ValidationResult = {
    for (field <- schema.fields) {
      val reason = VeloxValidatorApi.validateSchema(field.dataType)
      if (reason.isDefined) {
        return ValidationResult.failed(reason.get)
      }
    }
    ValidationResult.succeeded
Contributor

Metrics gap in BatchCarrierRow unwrap path

Problem: When the RDD contains BatchCarrierRow instances (e.g., from df.checkpoint() on a Gluten plan), the code unwraps columnar batches directly without updating numInputRows, numOutputBatches, or convertTime. Spark UI will show zeros for this operator when processing checkpointed data, making performance debugging difficult.

Evidence:

case _: BatchCarrierRow =>
  // No metrics updated here
  (Iterator.single(first) ++ iter).flatMap(row => BatchCarrierRow.unwrap(row))

Suggested Fix:

case _: BatchCarrierRow =>
  (Iterator.single(first) ++ iter).flatMap { row =>
    BatchCarrierRow.unwrap(row).map { batch =>
      numOutputBatches += 1
      numInputRows += batch.numRows()
      batch
    }
  }

Author

Updated the BatchCarrierRow unwrap path to increment numOutputBatches and numInputRows per batch, so the Spark UI now shows correct metrics for checkpointed data. convertTime is intentionally omitted since no row-to-columnar conversion happens in this path.

  }

  override def doExecuteColumnar(): RDD[ColumnarBatch] = {
Member

RowToVeloxColumnarExec.toColumnarBatchIterator does UnsafeProjection.apply(row), which throws on a BatchCarrierRow since PlaceholderRow's getters all throw UnsupportedOperationException. This can show up via df.checkpoint() or user code that does df.queryExecution.toRdd and re-wraps with LogicalRDD.fromDataset, when the upstream Gluten plan ends in VeloxColumnarToCarrierRowExec. CHRDDScanTransformer.scala L101-104 detects this and unwraps via findNextTerminalRow.batch(). Either mirror that, or fail fast with a clear error for carrier rows and add a checkpoint round-trip test to document the current behavior.
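
A minimal repro sketch of the scenario described above (the query and variable names are illustrative; it assumes the upstream plan is offloaded to Velox and therefore ends in VeloxColumnarToCarrierRowExec):

// Checkpointing a Gluten-offloaded DataFrame can leave BatchCarrierRow instances
// in the internal row RDD rather than plain UnsafeRows.
val df = spark.range(100).selectExpr("id", "id * 2 AS value")
val checkpointed = df.localCheckpoint()
// Re-wrapping this RDD as a LogicalRDD yields an RDDScanExec whose rows would
// break UnsafeProjection.apply() unless the carrier rows are unwrapped first.
val rdd = checkpointed.queryExecution.toRdd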

Author

Great catch — this is a real bug. If the upstream RDD was produced by a Gluten plan ending in VeloxColumnarToCarrierRowExec (e.g., via df.checkpoint()), the rows would be BatchCarrierRow instances and UnsafeProjection.apply() would throw. Fixed by peeking at the first row and branching: carrier rows are unwrapped directly via BatchCarrierRow.unwrap(), skipping row-to-columnar conversion entirely. This mirrors the CH pattern.

    val numInputRows = longMetric("numInputRows")
    val numOutputBatches = longMetric("numOutputBatches")
    val convertTime = longMetric("convertTime")
    val localSchema = this.schema
    val batchSize = GlutenConfig.get.maxBatchSize
    val batchBytes = VeloxConfig.get.veloxPreferredBatchBytes
    rdd.mapPartitions {
      iter =>
        if (iter.hasNext) {
          val first = iter.next()
          first match {
            case _: BatchCarrierRow =>
              // RDD already contains columnar batches wrapped as carrier rows
              // (e.g., from df.checkpoint() on a Gluten plan). Unwrap directly.
              (Iterator.single(first) ++ iter).flatMap {
                row =>
                  BatchCarrierRow.unwrap(row).map {
                    batch =>
                      numOutputBatches += 1
                      numInputRows += batch.numRows()
                      batch
                  }
              }
            case _ =>
              // Standard InternalRow path - convert via native row-to-columnar.
              RowToVeloxColumnarExec.toColumnarBatchIterator(
                Iterator.single(first) ++ iter,
                localSchema,
                numInputRows,
                numOutputBatches,
                convertTime,
                batchSize,
                batchBytes)
          }
        } else {
          Iterator.empty
        }
    }
  }

  override protected def withNewChildrenInternal(
      newChildren: IndexedSeq[SparkPlan]): SparkPlan = {
    assert(newChildren.isEmpty, "VeloxRDDScanTransformer is a leaf node")
    copy(outputAttributes, rdd, name, outputPartitioning, outputOrdering)
  }
}

object VeloxRDDScanTransformer {
  def replace(plan: org.apache.spark.sql.execution.RDDScanExec): RDDScanTransformer =
Member

CH uses UnknownPartitioning(0); we pass plan.outputPartitioning through. If the original RDDScanExec declares e.g. HashPartitioning, downstream Velox ops might skip a shuffle based on a hint we never verified survives the row→columnar conversion. Worth either justifying with a comment or aligning with CH.

Author

Valid concern. Row-to-columnar conversion doesn't change data distribution — it converts row format within each partition, preserving the partition layout. This is consistent with RowToVeloxColumnarExec which also carries through the child's outputPartitioning. Added an inline comment explaining the rationale and the difference from CH's approach

    VeloxRDDScanTransformer(
      plan.output,
      plan.inputRDD,
      plan.nodeName,
      plan.outputPartitioning,
      plan.outputOrdering)
}
@@ -0,0 +1,254 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.sql.execution

import org.apache.gluten.execution._

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.classic.ClassicDataset
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
import org.apache.spark.sql.types._
import org.apache.spark.util.Utils

class VeloxRDDScanSuite extends VeloxWholeStageTransformerSuite with AdaptiveSparkPlanHelper {

override protected val resourcePath: String = "/tpch-data-parquet"
override protected val fileFormat: String = "parquet"

override protected def sparkConf: SparkConf = {
super.sparkConf
.set("spark.sql.ansi.enabled", "false")
}

override def beforeAll(): Unit = {
super.beforeAll()
createTPCHNotNullTables()
}

/** Creates a DataFrame backed by LogicalRDD/RDDScanExec from an existing DataFrame. */
private def asRDDScanDF(data: DataFrame): DataFrame = {
val node = LogicalRDD(
data.queryExecution.logical.output,
data.queryExecution.toRdd)(data.sparkSession)
ClassicDataset.ofRows(spark, node).toDF()
}

test("basic RDDScanExec is replaced by VeloxRDDScanTransformer") {
val data = spark.sql("SELECT l_orderkey, l_partkey FROM lineitem LIMIT 10")
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with string and numeric types") {
val data = spark.sql("""SELECT l_returnflag, l_linestatus, l_quantity, l_extendedprice
|FROM lineitem LIMIT 20""".stripMargin)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with aggregation downstream") {
val query =
"""SELECT l_returnflag, sum(l_quantity) AS sum_qty
|FROM lineitem
|WHERE l_shipdate <= date'1998-09-02'
|GROUP BY l_returnflag""".stripMargin
val data = spark.sql(query)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
Member

This test — and the following empty RDD / multiple re-reads / null values / array / map / struct ones — only does checkAnswer. Without a collectFirst { case _: VeloxRDDScanTransformer => true } assertion they'd silently pass even if the rewriter stopped offloading (vanilla Spark also gets the right answer). Tests 1 and 2 already assert plan shape; please add the same here.

Author

You're right — tests 3–11 would silently pass even if offloading stopped working. Added collect { case _: VeloxRDDScanTransformer => true } assertions to all 8 tests that were missing them. The unsupported-type fallback test already asserts the absence of the transformer, so that one was fine as-is.

val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with empty RDD") {
val data = spark.sql("SELECT l_orderkey FROM lineitem WHERE 1 = 0")
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
assert(df.count() == 0)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan preserves data correctness with multiple re-reads") {
val data = spark.sql("SELECT l_orderkey, l_partkey FROM lineitem LIMIT 50")
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

// Read twice to verify idempotency
checkAnswer(df, expectedAnswer)
checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with null values") {
val rdd = spark.sparkContext.parallelize(
Seq(
Row(1, "a", null),
Row(null, "b", 2.0),
Row(3, null, 3.0)
))
val schema = StructType(
Seq(
StructField("id", IntegerType, nullable = true),
StructField("name", StringType, nullable = true),
StructField("value", DoubleType, nullable = true)
))
val data = spark.createDataFrame(rdd, schema)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with all supported primitive types") {
val rdd = spark.sparkContext.parallelize(
Seq(
Row(
true,
1.toByte,
2.toShort,
3,
4L,
5.0f,
6.0,
"hello",
java.sql.Date.valueOf("2024-01-01"),
java.sql.Timestamp.valueOf("2024-01-01 12:00:00"),
Array[Byte](1, 2, 3),
BigDecimal("123.45").underlying()
)
))
val schema = StructType(
Seq(
StructField("bool", BooleanType),
StructField("byte", ByteType),
StructField("short", ShortType),
StructField("int", IntegerType),
StructField("long", LongType),
StructField("float", FloatType),
StructField("double", DoubleType),
StructField("string", StringType),
StructField("date", DateType),
StructField("timestamp", TimestampType),
StructField("binary", BinaryType),
StructField("decimal", DecimalType(10, 2))
))
val data = spark.createDataFrame(rdd, schema)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with array type") {
val rdd = spark.sparkContext.parallelize(
Seq(
Row(Seq(1, 2, 3)),
Row(Seq(4, 5))
))
val schema = StructType(Seq(StructField("arr", ArrayType(IntegerType))))
val data = spark.createDataFrame(rdd, schema)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with map type") {
val rdd = spark.sparkContext.parallelize(
Seq(
Row(Map("a" -> 1, "b" -> 2)),
Row(Map("c" -> 3))
))
val schema = StructType(Seq(StructField("m", MapType(StringType, IntegerType))))
val data = spark.createDataFrame(rdd, schema)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan with struct type") {
val rdd = spark.sparkContext.parallelize(
Seq(
Row(Row("hello", 1)),
Row(Row("world", 2))
))
val innerSchema = StructType(
Seq(StructField("name", StringType), StructField("value", IntegerType)))
val schema = StructType(Seq(StructField("s", innerSchema)))
val data = spark.createDataFrame(rdd, schema)
val expectedAnswer = data.collect()
val df = asRDDScanDF(data)

checkAnswer(df, expectedAnswer)
val cnt = collect(df.queryExecution.executedPlan) { case _: VeloxRDDScanTransformer => true }
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

test("RDDScan falls back for unsupported types") {
val data = spark.sql("SELECT INTERVAL '1' DAY AS di")
val expectedAnswer = data.collect()
val result = asRDDScanDF(data)

// Should still produce correct results via fallback to vanilla Spark
checkAnswer(result, expectedAnswer)
val cnt = collect(result.queryExecution.executedPlan) {
case _: VeloxRDDScanTransformer => true
}
assert(cnt.isEmpty, "Expected fallback - VeloxRDDScanTransformer should NOT be in plan")
}

test("RDDScan handles BatchCarrierRow from checkpoint") {
val tempDir = Utils.createTempDir()
try {
spark.sparkContext.setCheckpointDir(tempDir.getAbsolutePath)
val df = spark.range(100).selectExpr("id", "id * 2 as value")
val checkpointed = df.localCheckpoint()
val result = asRDDScanDF(checkpointed)

checkAnswer(result, df.collect())
val cnt = collect(result.queryExecution.executedPlan) {
case _: VeloxRDDScanTransformer => true
}
assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
} finally {
Utils.deleteRecursively(tempDir)
}
}
}
Contributor

Missing test coverage for complex types and unsupported-type fallback

Problem: The 7 tests cover primitives, nulls, empty RDD, and aggregation — but two important scenarios are untested:

  1. Complex types (ArrayType, MapType, StructType) — validation explicitly accepts them, but no test exercises the full row-to-columnar JNI path with nested data.
  2. Unsupported type fallback — no test verifies that a truly unsupported type (e.g., CalendarIntervalType) triggers graceful fallback to vanilla Spark instead of a runtime crash.

Suggested Fix: Add at least these two tests:

test("RDDScan with array type") {
  val rdd = spark.sparkContext.parallelize(Seq(Row(Seq(1, 2, 3)), Row(Seq(4, 5))))
  val schema = StructType(Seq(StructField("arr", ArrayType(IntegerType))))
  val data = spark.createDataFrame(rdd, schema)
  val expectedAnswer = data.collect()
  val node = LogicalRDD.fromDataset(
    rdd = data.queryExecution.toRdd, originDataset = data, isStreaming = false)
  val df = ClassicDataset.ofRows(spark, node).toDF()
  checkAnswer(df, expectedAnswer)
}

test("RDDScan falls back for unsupported types") {
  // Create RDD with CalendarIntervalType or another unsupported type
  // Verify plan does NOT contain VeloxRDDScanTransformer (i.e., fallback occurred)
}

Author

Added 4 new tests: array type, map type, struct type, and unsupported-type fallback (DayTimeIntervalType → verifies VeloxRDDScanTransformer is absent from plan). Total coverage is now 11 tests.

Contributor

Missing test for BatchCarrierRow unwrap path

Problem: The new BatchCarrierRow detection logic in doExecuteColumnar is production code added to handle checkpointed Gluten DataFrames, but no test exercises this specific branch. If the unwrap logic regresses, the existing tests won't catch it since they all go through the standard InternalRow conversion path.

Suggested Fix: Add a test that forces the BatchCarrierRow path:

test("RDDScan handles BatchCarrierRow from checkpoint") {
  spark.sparkContext.setCheckpointDir(tempPath)
  val df = spark.range(100).selectExpr("id", "id * 2 as value")
  val checkpointed = df.localCheckpoint()
  val result = asRDDScanDF(checkpointed)
  checkAnswer(result, df.collect())
  val cnt = collect(result.queryExecution.executedPlan) {
    case _: VeloxRDDScanTransformer => true
  }
  assert(cnt.nonEmpty, "Expected VeloxRDDScanTransformer in plan")
}

Author

Added a localCheckpoint() round-trip test that exercises the BatchCarrierRow detection and unwrap logic. It verifies both result correctness and that VeloxRDDScanTransformer is present in the plan.