
Add GpuConv operator for the conv 10<->16 expression #8925

Merged: 27 commits merged into NVIDIA:branch-23.10 on Aug 25, 2023

Conversation

@gerashegalov (Collaborator) commented Aug 3, 2023:

Contributes to #8511

The POC only supports 10/16 <-> 10/16 radix conversions. Without overflow checks, it is guaranteed to produce results identical to the CPU only for the following (a sanity-check sketch follows the list):

  • decimal strings not longer than 18 characters
  • hexadecimal strings not longer than 15 characters
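
(These bounds follow from 64-bit arithmetic: the largest 18-digit decimal value, 10^18 - 1, and the largest 15-digit hexadecimal value, 16^15 - 1 = 2^60 - 1, both fit in a signed 64-bit long, so no overflow is possible at those widths.)

As an illustration, a spark-shell sanity check at exactly those widths might look like the following (a sketch, not code from this PR; assumes the plugin jar is on the classpath with spark.rapids.sql.enabled=true):

scala> // 18 decimal digits and 15 hex digits: guaranteed identical to the CPU
scala> Seq("999999999999999999").toDF("v").selectExpr("conv(v, 10, 16)", "v").show(false)
scala> Seq("FFFFFFFFFFFFFFF").toDF("v").selectExpr("conv(v, 16, 10)", "v").show(false)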

Signed-off-by: Gera Shegalov <[email protected]>

@gerashegalov added the task label (Work required that improves the product but is not user facing) on Aug 3, 2023
@gerashegalov self-assigned this on Aug 3, 2023
@gerashegalov linked an issue on Aug 3, 2023 that may be closed by this pull request
@gerashegalov changed the title from "[WIP] Add GpuConv operator for the conv expression" to "Add GpuConv operator for the conv expression" on Aug 12, 2023
@gerashegalov mentioned this pull request on Aug 17, 2023
@gerashegalov marked this pull request as ready for review on August 18, 2023 18:55
@gerashegalov (Collaborator, Author):

build

@gerashegalov changed the title from "Add GpuConv operator for the conv expression" to "Add GpuConv operator for the conv expression [databricks]" on Aug 18, 2023
@gerashegalov (Collaborator, Author):

build

    case _ =>
      willNotWorkOnGpu(because = "only literal 10 or 16 for from_base and to_base are supported")
  }
  if (SQLConf.get.ansiEnabled) {
A collaborator commented on the diff above:

ANSI mode for conv only shows up in Spark 3.4.0+ (https://issues.apache.org/jira/browse/SPARK-42427), and whether the expression is in ANSI mode should come from expr, not directly from SQLConf.get.ansiEnabled.
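
For illustration, a 3.4.0+ shim could consult the flag captured by the expression itself. A minimal sketch, assuming the Spark 3.4 Conv expression carries the ansiEnabled field added by SPARK-42427 (the val name conv for the wrapped Catalyst expression is hypothetical):

// Sketch only: read the ANSI flag the expression captured at analysis time,
// rather than the live session conf at tag time.
conv match {
  case c: Conv if c.ansiEnabled =>
    willNotWorkOnGpu(because = "the GPU has no overflow checking")
  case _ => // non-ANSI conv remains a GPU candidate
}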

      willNotWorkOnGpu(because = "only literal 10 or 16 for from_base and to_base are supported")
  }
  if (SQLConf.get.ansiEnabled) {
    willNotWorkOnGpu(because = " the GPU has no overflow checking.")
A collaborator commented on the diff above:

I'm not sure we can enable this by default. Even in 3.1.x Spark checks for overflow when encoding the value, and will return -1 if it sees an overflow. We do not do that.

https://github.com/apache/spark/blob/61e034807dc555d7ceadc534fcb0c82d50fc8719/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala#L56-L59

The ANSI mode fix in 3.4.0 only added throwing an exception instead of returning -1.

The following was run on Spark 3.3.0, first with the RAPIDS plugin enabled and then, after disabling it, on the CPU:

scala> val df = Seq("9223372036854775807", "-9223372036854775808", "9223372036854775808", "-9223372036854775809", "10223372036854775807", "-10223372036854775808", null).toDF
scala> df.repartition(1).selectExpr("conv(value, 10, 16)", "value").show(false)
+-------------------+---------------------+
|conv(value, 10, 16)|value                |
+-------------------+---------------------+
|7FFFFFFFFFFFFFFF   |9223372036854775807  |
|8000000000000000   |-9223372036854775808 |
|8000000000000000   |9223372036854775808  |
|7FFFFFFFFFFFFFFF   |-9223372036854775809 |
|8DE0B6B3A763FFFF   |10223372036854775807 |
|721F494C589C0000   |-10223372036854775808|
|null               |null                 |
+-------------------+---------------------+
scala> spark.conf.set("spark.rapids.sql.enabled", false)
scala> df.repartition(1).selectExpr("conv(value, 10, 16)", "value").show(false)
+-------------------+---------------------+
|conv(value, 10, 16)|value                |
+-------------------+---------------------+
|7FFFFFFFFFFFFFFF   |9223372036854775807  |
|FFFFFFFFFFFFFFFF   |-9223372036854775808 |
|8000000000000000   |9223372036854775808  |
|FFFFFFFFFFFFFFFF   |-9223372036854775809 |
|8DE0B6B3A763FFFF   |10223372036854775807 |
|FFFFFFFFFFFFFFFF   |-10223372036854775808|
|null               |null                 |
+-------------------+---------------------+

gerashegalov (author) replied:

I am aware of the -1-equivalent output, i.e. conversion to the string representation of 18446744073709551615 in to_base when to_base is positive, and to -1 when to_base is negative (a case where we already fall back to the CPU).

>>> spark.createDataFrame([('-1',), ('18446744073709551615',), ('18446744073709551616',)], 'a string').selectExpr('conv(a, 10, 10)').show()
+--------------------+
|     conv(a, 10, 10)|
+--------------------+
|18446744073709551615|
|18446744073709551615|
|18446744073709551615|
+--------------------+

My reasoning is that since Spark does not allow distinguishing whether 18446744073709551615 results from the overflow check or from the original data, it does not really matter. However, it is true that a customer may have a process in place that treats 18446744073709551615 as uniquely due to an overflow, and filters those rows out with output != '18446744073709551615'.
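
To make the clamp concrete, here is a minimal standalone sketch (not Spark's actual NumberConverter code; positive inputs only) of the behavior under discussion:

// Any value that does not fit in an unsigned 64-bit long collapses to
// 18446744073709551615, which is exactly why the output is ambiguous.
def convClamped(num: String, fromBase: Int, toBase: Int): String = {
  val uMax = (BigInt(1) << 64) - 1         // 18446744073709551615
  val v = BigInt(num, fromBase).min(uMax)  // clamp instead of failing
  v.toString(toBase).toUpperCase
}

convClamped("9223372036854775808", 10, 16)  // 8000000000000000 (in range)
convClamped("99999999999999999999", 10, 16) // FFFFFFFFFFFFFFFF (clamped)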

So we can disable it by default for unlimited StringType, and safely enable it for StringType(length) with lengths guaranteed not to overflow.

The collaborator replied:

I agree that is probably the reason Spark added an ANSI mode, because it really is ambiguous. Seeing FFFFFFFFFFFFFFFF in the output when converting to hex is ambiguous: you just don't know whether an overflow happened without further processing.

For me, I don't really want to enable this by default if it is only partially done, but I can see an argument for allowing it. @sameerz @mattf do you have an opinion on whether we should put this in with conv enabled by default while we work on a better long-term solution? Would it be okay to put it in with conv disabled by default, plus some docs so users know how to enable it despite the potential incompatibilities?

@revans2 (Collaborator) commented Aug 21, 2023:

I also ran some performance tests because I was nervous about the use of regular expressions to implement this. I am less concerned now: not because the regular expressions are good, but because the CPU code is really bad too.

I generated 1 billion rows with two long columns. One column I just cast to a string and the other I converted to base 16, then wrote the data out to Parquet.
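
A sketch of that setup (illustrative only, not the exact code used; the column names b and c and the output path match the queries below, everything else is assumed):

// spark-shell: ~1 billion rows, a decimal string column and a hex string
// column, written to Parquet for the benchmark queries below.
spark.range(1000000000L)
  .selectExpr(
    "CAST(id AS STRING) AS b",
    "conv(CAST(id AS STRING), 10, 16) AS c")
  .write.mode("overwrite").parquet("/data/tmp/CONV_IN")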

I then ran the following to get a baseline for reading in the data and doing a min/max on it:

spark.time(spark.read.parquet("/data/tmp/CONV_IN").selectExpr("MIN(b)", "MAX(b)", "MIN(c)", "MAX(c)").show(false))

The median of 5 runs for the CPU was 56.607 seconds and the GPU was 10.487 seconds (the GPU, an A6000, is about 5.4x faster than my old 6-core, 12-thread desktop CPU).

I then ran the following to understand the cost of conv

spark.time(spark.read.parquet("/data/tmp/CONV_IN").selectExpr("MIN(conv(b, 10, 16))", "MAX(conv(b, 16, 10))", "MIN(conv(c, 10, 16))", "MAX(conv(c, 10, 10))").show(false))

The CPU took 568.396 seconds (I only ran it once) and the GPU took 183.554 seconds (again, only one run; the GPU was 100% utilized for the entire query and started to throttle from heat).

That results in a difference of 511.789 seconds (568.396 - 56.607) for the CPU runs and 173.067 seconds (183.554 - 10.487) for the GPU runs, so the GPU is about 3x faster than the CPU for conv. Not great, but not so bad that I think we need to block it. A custom kernel should speed this up massively compared to the CPU, because it is clear the CPU code is not well optimized.

@gerashegalov (Collaborator, Author):

@revans2 Thanks for doing the measurements. This PR is meant as a stepping stone to prevent CPU fallbacks for the cases that libcudf can already support. I will work on the custom kernel as a follow-on.

revans2 previously approved these changes Aug 23, 2023
@revans2 (Collaborator) commented Aug 23, 2023:

build

revans2 previously approved these changes Aug 23, 2023
@gerashegalov (Collaborator, Author):

build

@gerashegalov (Collaborator, Author):

build

@gerashegalov changed the title from "Add GpuConv operator for the conv expression [databricks]" to "Add GpuConv operator for the conv expression" on Aug 24, 2023
@gerashegalov (Collaborator, Author):

build

@pxLi (Collaborator) commented Aug 25, 2023:

build

@gerashegalov changed the title from "Add GpuConv operator for the conv expression" to "Add GpuConv operator for the conv 10<->16 expression" on Aug 25, 2023
@gerashegalov merged commit f68d30f into NVIDIA:branch-23.10 on Aug 25, 2023
26 of 27 checks passed
@gerashegalov deleted the gerashegalov/issue8511 branch on August 25, 2023 14:12
Development

Successfully merging this pull request may close these issues.

[FEA] Support conv function