forked from openucx/sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer


yosefe/sparkucx

 
 


SparkUCX ShuffleManager Plugin

SparkUCX is a high-performance ShuffleManager plugin for Apache Spark. It uses RDMA and other high-performance transports supported by UCX to perform shuffle data transfers in Spark jobs.

This open-source project is developed, maintained, and supported by the UCF consortium.

Runtime requirements

  • Apache Spark 2.2.0/2.3.0/2.4.0
  • Java 8+
  • An RDMA-capable network, e.g. RoCE or InfiniBand

Installation

Obtain SparkUCX

Please use the "Releases" page to download pre-built binaries.
If you would like to build the project yourself, please refer to the "Build" section below.

The UCX shared libraries must be in java.library.path on every Spark Master and Worker (usually /usr/lib). They can be obtained by installing the latest version of Mellanox OFED or by following the UCX build instructions.
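A quick way to confirm that a node has the libraries in place is to look for them in the directory you plan to pass as java.library.path. The sketch below is only an illustration: the function name check_ucx_libs is hypothetical, and the library base names it looks for (libucp, libuct, libucs) are an assumption about a typical UCX installation, so adjust the list to match what your UCX package actually installs.

```shell
# Check that a directory contains the UCX shared libraries.
# Assumption: a typical UCX install provides libucp, libuct and libucs;
# edit the list below if your installation differs.
check_ucx_libs() {
    dir="$1"
    missing=0
    for lib in libucp libuct libucs; do
        found=0
        # Accept both unversioned (libucp.so) and versioned (libucp.so.0) names.
        for f in "$dir"/"$lib".so*; do
            if [ -e "$f" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: $lib.so in $dir" >&2
            missing=1
        fi
    done
    # Exit status 0 means every library was found, 1 means at least one is missing.
    return "$missing"
}
```

Running `check_ucx_libs /usr/lib` on each Master and Worker reports any library that is absent from that directory and returns a non-zero status in that case.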

Configuration

Tell Spark where to find the SparkUCX plugin jar and the UCX shared libraries by using the extraClassPath options:

spark.driver.extraClassPath     /path/to/SparkUCX/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:/PATH/TO/UCX/LIB
spark.executor.extraClassPath   /path/to/SparkUCX/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:/PATH/TO/UCX/LIB

Add the UCX shared libraries to java.library.path for the Spark driver and executors:

spark.driver.extraJavaOptions      -Djava.library.path=/PATH/TO/UCX/LIB
spark.executor.extraJavaOptions    -Djava.library.path=/PATH/TO/UCX/LIB

Running

To enable the SparkUCX Shuffle Manager plugin, add the following configuration:

spark.shuffle.manager   org.apache.spark.shuffle.UcxShuffleManager
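The settings from the Configuration and Running sections can be kept together in Spark's conf/spark-defaults.conf file. The helper below is a minimal sketch that prints all five entries for a given jar path and UCX library directory. The function name sparkucx_defaults is hypothetical, and both arguments are placeholders you supply yourself.

```shell
# Print spark-defaults.conf entries for SparkUCX.
#   $1: path to the SparkUCX jar-with-dependencies (placeholder, supplied by you)
#   $2: directory holding the UCX shared libraries (placeholder, supplied by you)
sparkucx_defaults() {
    jar="$1"
    ucx_lib="$2"
    # The property names and the shuffle manager class are taken from this README.
    cat <<EOF
spark.driver.extraClassPath      $jar:$ucx_lib
spark.executor.extraClassPath    $jar:$ucx_lib
spark.driver.extraJavaOptions    -Djava.library.path=$ucx_lib
spark.executor.extraJavaOptions  -Djava.library.path=$ucx_lib
spark.shuffle.manager            org.apache.spark.shuffle.UcxShuffleManager
EOF
}
```

For example, `sparkucx_defaults /path/to/SparkUCX/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar /PATH/TO/UCX/LIB >> "$SPARK_HOME/conf/spark-defaults.conf"` appends the entries to an existing defaults file. Review the file afterwards, because extraJavaOptions entries already present there would need to be merged by hand.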

Build

Building the SparkUCX plugin requires Apache Maven and Java 8+.

  1. Install JUCX, the Java bindings for UCX

  2. Obtain a clone of SparkUCX

  3. Build the plugin for your Spark version (2.2.0, 2.3.0, or 2.4.0), e.g. for Spark 2.4.0:

mvn -DskipTests clean package -Pspark-2.4.0
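To avoid mistyping the profile when building for different Spark versions, the command can be generated from the version number. This is a sketch under one assumption: that the Maven profiles for 2.2.0 and 2.3.0 follow the same spark-<version> naming as the 2.4.0 example above. The function name sparkucx_build_cmd is hypothetical.

```shell
# Print the Maven build command for a supported Spark version.
# Assumption: profiles for 2.2.0 and 2.3.0 are named like the documented
# spark-2.4.0 profile; check the project's pom.xml if a build fails.
sparkucx_build_cmd() {
    case "$1" in
        2.2.0|2.3.0|2.4.0)
            echo "mvn -DskipTests clean package -Pspark-$1"
            ;;
        *)
            echo "unsupported Spark version: $1" >&2
            return 1
            ;;
    esac
}
```

From the root of the SparkUCX clone, `sh -c "$(sparkucx_build_cmd 2.3.0)"` runs the printed command. Versions outside the three listed in this README are rejected with a non-zero status.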
