
Data Science Pipeline with ML DSL

arodin edited this page Jun 30, 2020 · 1 revision

ML-DSL is an open source machine learning library that simplifies how data specialists interact with cloud platforms such as Amazon Web Services (AWS) and Google Cloud Platform. It lets data scientists and data analysts configure and execute ML/DS pipelines.

ML-DSL is the property of Grid Dynamics International. It uses Amazon services, including S3, EMR, and SageMaker, and Google services such as Cloud Storage, Cloud Dataproc, and Cloud AI.

The following features are available:

  • Configuring and executing Spark jobs for data processing on Google Dataproc and Amazon EMR
  • Configuring and executing ML/DS pipelines for training and deploying models on Google AI Platform and Amazon SageMaker using the ml-dsl API
  • Configuring and executing ML/DS pipelines for data processing, model training, and model deployment using Jupyter Notebook magic functions

An example Jupyter notebook that uses ml-dsl on Google Cloud Platform is provided for your convenience.

An example Jupyter notebook that uses ml-dsl on Amazon Web Services is provided for your convenience.

ML-DSL User Guide

Running Spark jobs

Getting logs

Upgrading Spark jobs

Training models

Deploying models

Getting predictions

ML-DSL API Reference