ODD Spark Adapter

Introduction

ODD Spark Adapter is a Spark Listener that sends metadata and dependencies of Spark 3.3.1 jobs to platforms based on the OpenDataDiscovery specification.

To learn more about OpenDataDiscovery and the ODD Platform, please refer to the project's landing page and documentation.

Supported data sources

ODD Spark adapter v0.0.1 supports:

Low-level RDD jobs
Read/write from/to JDBC data sources
Read/write from/to Kafka topics (batch only)
Read/write from/to Snowflake tables
Read/write from/to S3 Delta tables

Limitations

As of now, the ODD Spark adapter doesn't support Spark Structured Streaming (support is on the roadmap)
As of now, the ODD Spark adapter supports Spark 3.3.1 only (support for other versions is on the roadmap)

Setting up the ODD Spark adapter

ODD Spark Adapter is essentially a simple Spark Listener that gathers metadata and the inputs/outputs of Spark jobs and sends them to the ODD Platform or any ODD-based backend.

Download listener JAR

Available JAR files can be found on the project's Releases page.

Configuration

spark.odd.host.url: URL of the ODD Platform deployment.
spark.odd.oddrn.key: Unique identifier of the Spark cluster. Can be any string that uniquely identifies the target Spark cluster within the user's data infrastructure.
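
As a rough illustration of the two settings above, the following Python sketch checks the values before they are handed to Spark. The helper name `validate_odd_conf` is hypothetical and not part of the adapter, and the rule that the URL must use http or https is an assumption made for this example, not documented adapter behaviour.

```python
from typing import Dict
from urllib.parse import urlparse


def validate_odd_conf(host_url: str, oddrn_key: str) -> Dict[str, str]:
    """Return the two ODD settings as a Spark conf mapping.

    Hypothetical helper: the checks below are assumptions for
    illustration, not rules enforced by the adapter itself.
    """
    parsed = urlparse(host_url)
    # Assumption: the ODD Platform is reached over plain HTTP or HTTPS.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"spark.odd.host.url must be an http(s) URL, got {host_url!r}")
    # The key can be any string, but an empty one cannot identify a cluster.
    if not oddrn_key.strip():
        raise ValueError("spark.odd.oddrn.key must be a non-empty string")
    return {
        "spark.odd.host.url": host_url,
        "spark.odd.oddrn.key": oddrn_key,
    }


if __name__ == "__main__":
    conf = validate_odd_conf("http://odd-platform:8080", "unique_spark_cluster_key")
    for key, value in conf.items():
        print(f"{key}={value}")
```

Failing early on a malformed URL or an empty key is a design choice for this sketch; it surfaces the mistake before the job is submitted rather than inside the cluster.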

Example of running a Spark job with the ODD Spark adapter

./spark-submit \
    --packages <needed packages for the Spark jobs> \
    --jars <path to the ODD Spark adapter JAR> \
    --conf "spark.odd.host.url=http://odd-platform:8080" \
    --conf "spark.odd.oddrn.key=unique_spark_cluster_key" \
    /jobs/simple-delta-lake.py
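
When the same submission has to be launched from a script, the command above can be assembled programmatically. The sketch below is one possible way to do that in Python; `build_spark_submit`, the JAR path, and the package coordinates are hypothetical placeholders, and only the two `spark.odd.*` keys and the job path come from this README.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    job_path: str,
    adapter_jar: str,
    odd_host_url: str,
    oddrn_key: str,
    packages: Optional[List[str]] = None,
) -> List[str]:
    """Assemble the spark-submit argument list shown in the example above."""
    cmd = ["./spark-submit"]
    if packages:
        # --packages takes a comma-separated list of Maven coordinates.
        cmd += ["--packages", ",".join(packages)]
    cmd += [
        "--jars", adapter_jar,
        "--conf", f"spark.odd.host.url={odd_host_url}",
        "--conf", f"spark.odd.oddrn.key={oddrn_key}",
        job_path,
    ]
    return cmd


if __name__ == "__main__":
    command = build_spark_submit(
        job_path="/jobs/simple-delta-lake.py",
        adapter_jar="/path/to/odd-spark-adapter.jar",  # hypothetical path
        odd_host_url="http://odd-platform:8080",
        oddrn_key="unique_spark_cluster_key",
        packages=["org.example:connector_2.12:1.0.0"],  # hypothetical coordinates
    )
    # Print a shell-quoted form; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when a value contains spaces or special characters.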
