Add Databricks 14.3 Support [DO NOT MERGE] #11467

Draft
wants to merge 38 commits into base: branch-24.12

Changes from 16 commits

Commits (38)
0efcf6e
db 14.3
Sep 5, 2024
ed1abbf
release350db
Sep 5, 2024
1c83368
avro and parquet fix
Sep 6, 2024
04a85e4
more dependencies
razajafri Sep 6, 2024
60cf190
avro dependency
razajafri Sep 10, 2024
6f9397f
move shims around
razajafri Sep 11, 2024
fdd4c03
More Shim changes
razajafri Sep 11, 2024
db70c02
upmerged
razajafri Sep 11, 2024
4c547db
Signing off
razajafri Sep 11, 2024
7c1ea3e
Fixed jar name in install deps
razajafri Sep 11, 2024
799a909
updated Scala 2.13 pom
razajafri Sep 16, 2024
8ea5a52
add a check for 3.5 in install_deps.py
razajafri Sep 16, 2024
a1df759
Removed unnecessary changes
razajafri Sep 16, 2024
5eff73e
Removed PartitionedFileUtilsShim
razajafri Sep 16, 2024
92c6f59
Updated PythonMapInArrowExec to MapInArrowExec
razajafri Sep 16, 2024
dafccf7
conf is private in sqlContext, use sessionState instead
razajafri Sep 16, 2024
32131b8
Moved NullOutputStreamShim
razajafri Sep 17, 2024
52d2418
sql-plugin building
razajafri Sep 17, 2024
383225c
First batch of delta lake changes
razajafri Sep 17, 2024
4fab4ba
Removed duplicate shims for 350db
razajafri Sep 18, 2024
4884080
delta lake support changes
razajafri Sep 18, 2024
413761f
More shims
razajafri Sep 19, 2024
3da0dc3
delta-lake changes for release 350db
razajafri Sep 23, 2024
62a3d90
Added ParquetCVShim and ShimParquetColumnVector
razajafri Sep 23, 2024
6fc8835
Shims created, building 341db successfully
razajafri Sep 23, 2024
e904927
CudfUnsafeRow changes for 350db
razajafri Sep 23, 2024
76c2fbc
Shim changes
razajafri Sep 24, 2024
f607d21
Removed the GpuOptimisticTransactionBase from common
razajafri Sep 24, 2024
341fff8
Organized imports
razajafri Sep 24, 2024
978778d
Reverted unnecessary change
razajafri Sep 24, 2024
0ed0c53
Fixed 341db build by adding shim and referring to the right module
razajafri Sep 24, 2024
1e690cf
Renamed ShimVectorizedColumnReader to RapidsVectorizedColumnReader to…
razajafri Sep 24, 2024
394604c
Don't use native dictionary for 350db for now
razajafri Sep 24, 2024
ee946a5
Updated Scala2.13 poms
razajafri Sep 25, 2024
9b68cd3
Fixed build for 340+
razajafri Sep 25, 2024
5709d4a
Add support for the Databricks 14.3 pre-merge pipeline
NvTimLiu Sep 26, 2024
8979789
Updated ShimServiceProvider to return the correct Shim version
razajafri Sep 26, 2024
9274700
Use the correct shim for decimal multiplication
razajafri Oct 2, 2024
17 changes: 17 additions & 0 deletions aggregator/pom.xml
@@ -711,6 +711,23 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-delta-spark341db_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<classifier>${spark.version.classifier}</classifier>
</dependency>
</dependencies>
</profile>
<profile>
<id>release351</id>
<activation>
@@ -36,6 +36,7 @@
{"spark": "342"}
{"spark": "343"}
{"spark": "350"}
{"spark": "350db"}
{"spark": "351"}
{"spark": "352"}
spark-rapids-shim-json-lines ***/
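These shim-json-lines comment blocks are how the build tags a source file with every buildver it compiles for, so adding {"spark": "350db"} opts the file into the new shim. A minimal reader sketch, hypothetical and not part of this PR:

```python
# Hypothetical reader for a spark-rapids-shim-json-lines comment block;
# returns the buildvers listed between the opening and closing markers.
import json
import re

def shim_buildvers(source_text: str) -> list[str]:
    block = re.search(
        r'spark-rapids-shim-json-lines\s*(.*?)\s*spark-rapids-shim-json-lines \*\*\*/',
        source_text, re.S)
    if block is None:
        return []
    return [json.loads(line)['spark']
            for line in block.group(1).splitlines() if line.strip()]
```

Run on the file above, the result would include '350db' alongside the existing versions.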
19 changes: 14 additions & 5 deletions jenkins/databricks/install_deps.py
@@ -42,6 +42,9 @@ def define_deps(spark_version, scala_version):
elif spark_version.startswith('3.4'):
spark_prefix = '----ws_3_4'
mvn_prefix = '--mvn'
elif spark_version.startswith('3.5'):
spark_prefix = '----ws_3_5'
mvn_prefix = '--mvn'
Comment on lines +45 to +47 (Collaborator):

nit: looks like a pretty stable pattern emerged for 3.3, 3.4, 3.5
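The nit seems fair; a table-driven sketch of this dispatch (hypothetical, not in this diff; the 3.3 entry is assumed to match the branch just above this hunk):

```python
# Hypothetical consolidation of the elif chain in define_deps(); the 3.4 and
# 3.5 values come from this diff, the 3.3 values are assumed to match.
SPARK_PREFIXES = {
    '3.3': '----ws_3_3',
    '3.4': '----ws_3_4',
    '3.5': '----ws_3_5',
}

def version_prefixes(spark_version):
    for minor, spark_prefix in SPARK_PREFIXES.items():
        if spark_version.startswith(minor):
            return spark_prefix, '--mvn'
    raise ValueError(f'unsupported Databricks Spark version: {spark_version}')
```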


spark_suffix = f'hive-{hive_version}__hadoop-{hadoop_version}_{scala_version}'

@@ -69,7 +72,7 @@ def define_deps(spark_version, scala_version):
Artifact('org.apache.spark', f'spark-core_{scala_version}',
f'{spark_prefix}--core--core-{spark_suffix}_deploy.jar'),
Artifact('org.apache.spark', f'spark-versions_{scala_version}',
f'spark--versions--*--shim_{scala_version}_deploy.jar'),
f'spark--versions--*--shim*_{scala_version}_deploy.jar'),
Artifact('org.apache.spark', f'databricks-versions_{scala_version}',
f'common--build-info--build-info-spark_*_{scala_version}_deploy.jar'),
# Spark Hive Patches
@@ -125,15 +128,15 @@ def define_deps(spark_version, scala_version):
Artifact('com.fasterxml.jackson.core', 'jackson-annotations',
f'{prefix_ws_sp_mvn_hadoop}--com.fasterxml.jackson.core--jackson-annotations--com.fasterxml.jackson.core__jackson-annotations__*.jar'),
Artifact('org.apache.spark', f'spark-avro_{scala_version}',
f'{spark_prefix}--vendor--avro--avro-*.jar'),
f'{prefix_ws_sp_mvn_hadoop}--org.apache.avro--avro--org.apache.avro*.jar'),
Comment (Collaborator):

is it really an unconditional change?

Artifact('org.apache.avro', 'avro-mapred',
f'{prefix_ws_sp_mvn_hadoop}--org.apache.avro--avro-mapred--org.apache.avro__avro-mapred__*.jar'),
Artifact('org.apache.avro', 'avro',
f'{prefix_ws_sp_mvn_hadoop}--org.apache.avro--avro--org.apache.avro__avro__*.jar'),
]

# Parquet
if spark_version.startswith('3.4'):
if spark_version.startswith('3.4') or spark_version.startswith('3.5'):
deps += [
Artifact('org.apache.parquet', 'parquet-hadoop',
f'{spark_prefix}--third_party--parquet-mr--parquet-hadoop--parquet-hadoop-shaded--*--libparquet-hadoop-internal.jar'),
@@ -162,7 +165,7 @@ def define_deps(spark_version, scala_version):


# log4j-core
if spark_version.startswith('3.3') or spark_version.startswith('3.4'):
if spark_version.startswith('3.3') or spark_version.startswith('3.4') or spark_version.startswith('3.5'):
deps += Artifact('org.apache.logging.log4j', 'log4j-core',
f'{prefix_ws_sp_mvn_hadoop}--org.apache.logging.log4j--log4j-core--org.apache.logging.log4j__log4j-core__*.jar'),

@@ -172,14 +175,20 @@ def define_deps(spark_version, scala_version):
f'{prefix_ws_sp_mvn_hadoop}--org.scala-lang.modules--scala-parser-combinators_{scala_version}-*.jar')
]

if spark_version.startswith('3.4'):
if spark_version.startswith('3.4') or spark_version.startswith('3.5'):
deps += [
# Spark Internal Logging
Artifact('org.apache.spark', f'spark-common-utils_{scala_version}', f'{spark_prefix}--common--utils--common-utils-hive-2.3__hadoop-3.2_2.12_deploy.jar'),
# Spark SQL API
Artifact('org.apache.spark', f'spark-sql-api_{scala_version}', f'{spark_prefix}--sql--api--sql-api-hive-2.3__hadoop-3.2_2.12_deploy.jar')
]

if spark_version.startswith('3.5'):
deps += [
Artifact('org.scala-lang.modules', f'scala-collection-compat_{scala_version}',
f'{prefix_ws_sp_mvn_hadoop}--org.scala-lang.modules--scala-collection-compat_{scala_version}--org.scala-lang.modules__scala-collection-compat_{scala_version}__2.11.0.jar'),
Artifact('org.apache.avro', f'avro-connector', f'{spark_prefix}--connector--avro--avro-hive-2.3__hadoop-3.2_2.12_shaded---606136534--avro-unshaded-hive-2.3__hadoop-3.2_2.12_deploy.jar')
]

return deps

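For orientation, a minimal smoke test of define_deps as changed above; Databricks 14.3 ships Spark 3.5.0 on Scala 2.12, which exercises every new startswith('3.5') branch:

```python
# Sketch only: assumes jenkins/databricks/install_deps.py is importable
# from the current working directory of a repo checkout.
from install_deps import define_deps

deps = define_deps('3.5.0', '2.12')
print(f'{len(deps)} artifacts defined for Spark 3.5.0 / Scala 2.12')
```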
26 changes: 26 additions & 0 deletions pom.xml
@@ -549,6 +549,31 @@
<module>delta-lake/delta-stub</module>
</modules>
</profile>
<profile>
<!-- Note Databricks requires 2 properties -Ddatabricks and -Dbuildver=350db -->
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<properties>
<!-- Downgrade scala plugin version due to: https://github.com/sbt/sbt/issues/4305 -->
<scala.plugin.version>3.4.4</scala.plugin.version>
<spark.version.classifier>spark350db</spark.version.classifier>
<spark.version>${spark350db.version}</spark.version>
<spark.test.version>${spark350db.version}</spark.test.version>
<hadoop.client.version>3.3.1</hadoop.client.version>
<rat.consoleOutput>true</rat.consoleOutput>
<parquet.hadoop.version>1.12.0</parquet.hadoop.version>
<iceberg.version>${spark330.iceberg.version}</iceberg.version>
</properties>
<modules>
<module>shim-deps/databricks</module>
<module>delta-lake/delta-spark341db</module>
</modules>
</profile>
<profile>
<id>release351</id>
<activation>
@@ -781,6 +806,7 @@
<spark330db.version>3.3.0-databricks</spark330db.version>
<spark332db.version>3.3.2-databricks</spark332db.version>
<spark341db.version>3.4.1-databricks</spark341db.version>
<spark350db.version>3.5.0-databricks</spark350db.version>
<spark350.version>3.5.0</spark350.version>
<spark351.version>3.5.1</spark351.version>
<spark352.version>3.5.2</spark352.version>
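The profile note above says Databricks builds need both -Ddatabricks and -Dbuildver=350db. A hedged way to confirm the profile actually activates, assuming a local Maven install and a repo-root working directory (not part of this PR):

```python
# Hypothetical activation check, run from the repo root with mvn on PATH.
import subprocess

out = subprocess.run(
    ['mvn', 'help:active-profiles', '-Ddatabricks', '-Dbuildver=350db'],
    capture_output=True, text=True, check=True).stdout
assert 'release350db' in out, 'release350db profile did not activate'
```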
17 changes: 17 additions & 0 deletions scala2.13/aggregator/pom.xml
@@ -711,6 +711,23 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-delta-spark341db_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<classifier>${spark.version.classifier}</classifier>
</dependency>
</dependencies>
</profile>
<profile>
<id>release351</id>
<activation>
26 changes: 26 additions & 0 deletions scala2.13/pom.xml
@@ -549,6 +549,31 @@
<module>delta-lake/delta-stub</module>
</modules>
</profile>
<profile>
<!-- Note Databricks requires 2 properties -Ddatabricks and -Dbuildver=350db -->
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<properties>
<!-- Downgrade scala plugin version due to: https://github.com/sbt/sbt/issues/4305 -->
<scala.plugin.version>3.4.4</scala.plugin.version>
<spark.version.classifier>spark350db</spark.version.classifier>
<spark.version>${spark350db.version}</spark.version>
<spark.test.version>${spark350db.version}</spark.test.version>
<hadoop.client.version>3.3.1</hadoop.client.version>
<rat.consoleOutput>true</rat.consoleOutput>
<parquet.hadoop.version>1.12.0</parquet.hadoop.version>
<iceberg.version>${spark330.iceberg.version}</iceberg.version>
</properties>
<modules>
<module>shim-deps/databricks</module>
<module>delta-lake/delta-spark341db</module>
</modules>
</profile>
<profile>
<id>release351</id>
<activation>
@@ -781,6 +806,7 @@
<spark330db.version>3.3.0-databricks</spark330db.version>
<spark332db.version>3.3.2-databricks</spark332db.version>
<spark341db.version>3.4.1-databricks</spark341db.version>
<spark350db.version>3.5.0-databricks</spark350db.version>
<spark350.version>3.5.0</spark350.version>
<spark351.version>3.5.1</spark351.version>
<spark352.version>3.5.2</spark352.version>
55 changes: 54 additions & 1 deletion scala2.13/shim-deps/pom.xml
@@ -159,6 +159,59 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-format-internal_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-common-utils_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-api_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>shaded.parquet.org.apache.thrift</groupId>
<artifactId>shaded-parquet-thrift_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-connector</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>dbdeps</id>
<activation>
@@ -194,4 +247,4 @@
</dependencies>
</profile>
</profiles>
</project>
</project>
55 changes: 54 additions & 1 deletion shim-deps/pom.xml
@@ -159,6 +159,59 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>release350db</id>
<activation>
<property>
<name>buildver</name>
<value>350db</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-format-internal_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-common-utils_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-api_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>shaded.parquet.org.apache.thrift</groupId>
<artifactId>shaded-parquet-thrift_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-connector</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>dbdeps</id>
<activation>
@@ -194,4 +247,4 @@
</dependencies>
</profile>
</profiles>
</project>
</project>
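Every release350db block above is duplicated between the top-level poms and their scala2.13 copies, so single-character drift (a mistyped artifactId in one copy, say) is easy to miss in review. A hedged drift check, hypothetical and not part of the PR:

```python
# Hypothetical sync check: flag artifactIds present in only one pom copy.
import re
from pathlib import Path

def artifact_ids(pom_path):
    return set(re.findall(r'<artifactId>(.*?)</artifactId>',
                          Path(pom_path).read_text()))

drift = artifact_ids('shim-deps/pom.xml') ^ artifact_ids('scala2.13/shim-deps/pom.xml')
print(sorted(drift) or 'pom copies agree')
```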
@@ -99,8 +99,8 @@ object GpuPartitioningUtils extends SQLConfHelper {
typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled,
basePaths = basePaths,
userSpecifiedSchema = userSpecifiedSchema,
caseSensitive = sparkSession.sqlContext.conf.caseSensitiveAnalysis,
validatePartitionColumns = sparkSession.sqlContext.conf.validatePartitionColumns,
caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis,
validatePartitionColumns = sparkSession.sessionState.conf.validatePartitionColumns,
timeZoneId = timeZoneId)
(parsed, anyReplacedBase)
}
@@ -242,7 +242,7 @@ abstract class GpuDataSourceBase(

// This is a non-streaming file based datasource.
case (format: FileFormat, _) =>
val useCatalogFileIndex = sparkSession.sqlContext.conf.manageFilesourcePartitions &&
val useCatalogFileIndex = sparkSession.sessionState.conf.manageFilesourcePartitions &&
catalogTable.isDefined && catalogTable.get.tracksPartitionsInCatalog &&
catalogTable.get.partitionColumnNames.nonEmpty
val (fileCatalog, dataSchema, partitionSchema) = if (useCatalogFileIndex) {
@@ -136,7 +136,7 @@ case class GpuInMemoryTableScanExec(
override def outputOrdering: Seq[SortOrder] =
relation.cachedPlan.outputOrdering.map(updateAttribute(_).asInstanceOf[SortOrder])

lazy val enableAccumulatorsForTest: Boolean = sparkSession.sqlContext
lazy val enableAccumulatorsForTest: Boolean = sparkSession.sessionState
.conf.inMemoryTableScanStatisticsEnabled

// Accumulators used for testing purposes
@@ -715,7 +715,7 @@ object InternalColumnarRddConverter extends Logging {
val b = batch.getOrElse({
// We have to fall back to doing a slow transition.
val converters = new GpuExternalRowToColumnConverter(schema)
val conf = new RapidsConf(df.sqlContext.conf)
val conf = new RapidsConf(df.sqlContext.sparkSession.sessionState.conf)
val goal = TargetSize(conf.gpuTargetBatchSizeBytes)
input.mapPartitions { rowIter =>
new ExternalRowToColumnarIterator(rowIter, schema, goal, converters)
@@ -176,7 +176,7 @@ object TrampolineUtil {
}

def getSparkConf(spark: SparkSession): SQLConf = {
spark.sqlContext.conf
spark.sessionState.conf
}

def setExecutorEnv(sc: SparkContext, key: String, value: String): Unit = {
@@ -36,6 +36,7 @@
{"spark": "342"}
{"spark": "343"}
{"spark": "350"}
{"spark": "350db"}
{"spark": "351"}
{"spark": "352"}
{"spark": "400"}