Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-34624][CORE] Exclude non-jar dependencies of ivy/maven packages
### What changes were proposed in this pull request? Exclude non-jar dependencies of the ivy/maven packages we want to resolve as our current dependency resolution code assumes artifacts to be jars. https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1215 and https://github.com/apache/spark/blob/17601e014c6ccb48958d35ffb04bedeac8cfc66a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L318 ### Why are the changes needed? Some maven artifacts define non-jar dependencies. One such example is `hive-exec`'s dependency on the `pom` of `apache-curator` https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.8/hive-exec-2.3.8.pom Today trying to depend on such an artifact using `--packages` will print an error but continue without including the non-jar dependency. Doing the same using `spark.sql("ADD JAR ivy://org.apache.hive:hive-exec:2.3.8?exclude=org.pentaho:pentaho-aggdesigner-algorithm")` will cause a failure. Detailed stacktraces can be found in SPARK-34624. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit test. Retried the same example in `spark-shell` which produced the stacktrace in the JIRA. Closes apache#31741 from shardulm94/add-jar-filter-poms. Authored-by: Shardul Mahadik <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
- Loading branch information