[SparkUT] SPARK-33134: return partial results only for root JSON objects failed in JsonFunctionsSuite #14088

@GaryShen2008

Describe the bug
The GPU output of from_json differs from the CPU output when a field nested below the root JSON object does not match the schema.

Steps/Code to reproduce bug

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

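// c2 is declared as an array of structs, but the JSON below supplies [19], an array of longs.
val st = new StructType().add("c1", LongType).add("c2", ArrayType(new StructType().add("c3", LongType).add("c4", StringType)))

val df2 = Seq("""{"data": {"c2": [19], "c1": 123456}}""").toDF("c0")

// GPU run: the plugin is enabled by the spark-shell command listed below.
df2.select(from_json($"c0", new StructType().add("data", st))).show

// Disable the plugin and rerun the same query on the CPU.
spark.conf.set("spark.rapids.sql.enabled", "false")

df2.select(from_json($"c0", new StructType().add("data", st))).show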

GPU:

+----------------+
|   from_json(c0)|
+----------------+
|{{123456, null}}|
+----------------+

CPU:

+-------------+
|from_json(c0)|
+-------------+
|       {null}|
+-------------+
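
For contrast, the test name suggests that partial results are kept only when the mismatching field sits directly in the root object. Below is a minimal sketch, assuming the same spark-shell session, that parses the payload once at the root and once nested under data, both on the CPU. The names rootDf, nestedDf, rootRows, and nestedRows are illustrative and not part of the original report, and no expected output is shown because it may vary with the Spark version.

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

// Same schema as in the reproduction steps.
val st = new StructType()
  .add("c1", LongType)
  .add("c2", ArrayType(new StructType().add("c3", LongType).add("c4", StringType)))

// Run on the CPU only, so the output reflects plain Spark behavior.
spark.conf.set("spark.rapids.sql.enabled", "false")

// Illustrative inputs: the same mismatching payload at the root and nested under "data".
val rootDf = Seq("""{"c2": [19], "c1": 123456}""").toDF("c0")
val nestedDf = Seq("""{"data": {"c2": [19], "c1": 123456}}""").toDF("c0")

val rootRows = rootDf.select(from_json($"c0", st)).collect()
val nestedRows = nestedDf.select(from_json($"c0", new StructType().add("data", st))).collect()

println(s"root:   ${rootRows.mkString(", ")}")
println(s"nested: ${nestedRows.mkString(", ")}")
```

Comparing the two printed rows shows whether the CPU treats the root-level and nested cases differently, which is the distinction the GPU result above does not appear to make.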

Spark-shell command:

spark-shell \
  --master local[2] \
  --conf spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation,org.apache.spark.sql.catalyst.optimizer.ConstantFolding \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.sql.queryExecutionListeners=org.apache.spark.sql.rapids.ExecutionPlanCaptureCallback \
  --conf spark.rapids.sql.explain=ALL \
  --conf spark.rapids.sql.test.isFoldableNonLitAllowed=true \
  --conf spark.rapids.sql.csv.read.decimal.enabled=true \
  --conf spark.rapids.sql.format.avro.enabled=true \
  --conf spark.rapids.sql.format.avro.read.enabled=true \
  --conf spark.rapids.sql.format.hive.text.write.enabled=true \
  --conf spark.rapids.sql.format.json.enabled=true \
  --conf spark.rapids.sql.format.json.read.enabled=true \
  --conf spark.rapids.sql.incompatibleDateFormats.enabled=true \
  --conf spark.rapids.sql.python.gpu.enabled=true \
  --conf spark.rapids.sql.rowBasedUDF.enabled=true \
  --conf spark.rapids.sql.window.collectList.enabled=true \
  --conf spark.rapids.sql.window.collectSet.enabled=true \
  --conf spark.rapids.sql.window.range.byte.enabled=true \
  --conf spark.rapids.sql.window.range.short.enabled=true \
  --conf spark.rapids.sql.expression.Ascii=true \
  --conf spark.rapids.sql.expression.Conv=true \
  --conf spark.rapids.sql.expression.GetJsonObject=true \
  --conf spark.rapids.sql.expression.JsonToStructs=true \
  --conf spark.rapids.sql.expression.StructsToJson=true \
  --conf spark.rapids.sql.exec.CollectLimitExec=true \
  --conf spark.rapids.sql.exec.FlatMapCoGroupsInPandasExec=true \
  --conf spark.rapids.sql.exec.WindowInPandasExec=true \
  --conf spark.rapids.sql.hasExtendedYearValues=false \
  --conf spark.unsafe.exceptionOnMemoryLeak=true \
  --conf spark.sql.session.timeZone=UTC

Expected behavior
The GPU should produce the same output as the CPU.
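
The expectation above can be checked programmatically instead of by comparing the printed tables. This is a minimal sketch, assuming a spark-shell session started with the RAPIDS plugin and the excluded optimizer rules from the command in this report. The helper runWith and the names schema, gpuRows, and cpuRows are hypothetical and introduced only for illustration.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

// Same schema and input as in the reproduction steps.
val st = new StructType()
  .add("c1", LongType)
  .add("c2", ArrayType(new StructType().add("c3", LongType).add("c4", StringType)))
val schema = new StructType().add("data", st)
val df2 = Seq("""{"data": {"c2": [19], "c1": 123456}}""").toDF("c0")

// Hypothetical helper: run the same query with the plugin toggled on or off
// and collect the rows to the driver.
def runWith(gpuEnabled: Boolean): Seq[Row] = {
  spark.conf.set("spark.rapids.sql.enabled", gpuEnabled.toString)
  df2.select(from_json($"c0", schema)).collect().toSeq
}

val gpuRows = runWith(gpuEnabled = true)
val cpuRows = runWith(gpuEnabled = false)

println(s"GPU: $gpuRows")
println(s"CPU: $cpuRows")
println(s"match: ${gpuRows == cpuRows}")
```

With the behavior described in this report, the last line would print false; once the GPU matches the CPU, it should print true.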

Environment details

  • Environment location: [Local mode]

Additional context
This is a further test case related to #10901, which has already been fixed.

Labels: ? - Needs Triage, bug