Tutorial "How to create an MLCube" (#164)
* Create create-mlcube.md

Add tutorial create-mlcube.md.

* Update create-mlcube.md

formatting

* Update create-mlcube.md

remove video from tutorial. Create new video for this tutorial later.

* Update create-mlcube.md

formatting change.

* add 'Tutorials' to Index for docs

* tested version of how to make a mlcube tutorial

* add formatting of line highlighted in blue

* add use of cookiecutter to 'How to create an MLCube' tutorial

* add cookiecutter installation in comments

* add comment saying input to cookiecutter should not contain quotes
dfeddema authored Dec 1, 2020
1 parent 3374539 commit b0a8f57
Showing 2 changed files with 224 additions and 0 deletions.
222 changes: 222 additions & 0 deletions docs/tutorials/create-mlcube.md
@@ -0,0 +1,222 @@
# Tutorial: Create an MLCube
Interested in getting started with MLCube? Follow the instructions in this tutorial.
## Step 1: SETUP
Get MLCube, the MLCube examples and the MLCube templates, and create a Python environment.
```
# Clone the MLCube examples and templates from GitHub
git clone https://github.com/mlcommons/mlcube_examples
# Create a python environment
virtualenv -p python3 ./env && source ./env/bin/activate
# Install mlcube, mlcube-docker and cookiecutter
pip install mlcube mlcube-docker cookiecutter
```
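
To confirm the installation, you can check from Python that the three distributions are present in the active environment. This is a minimal sketch that assumes the distribution names are exactly the ones given to `pip` above; the `missing_packages` helper is our own, and the check does not verify that Docker itself is installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to `pip install` in Step 1.
REQUIRED = ["mlcube", "mlcube-docker", "cookiecutter"]


def missing_packages(names):
    """Return the subset of `names` that is not installed in the active environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    print("All set" if not absent else f"Missing: {', '.join(absent)}")
```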

## Step 2: CONFIGURE MLCUBE USING THE TEMPLATE FILES
Let's use the 'matmul' example, which we downloaded in the previous step, to illustrate how to make an MLCube. Matmul is a simple matrix multiplication example written in Python with TensorFlow.
When you create an MLCube for your own model, you will use your own code, data and Dockerfile.

```
cd mlcube_examples
# rename the matmul reference implementation from matmul to matmul_reference
mv ./matmul ./matmul_reference
# create an MLCube directory from the MLCube template (note: do not use quotes in your input to cookiecutter): name = matmul, author = MLPerf Best Practices Working Group
cookiecutter https://github.com/mlcommons/mlcube_cookiecutter.git
# copy matmul.py, the Dockerfile and requirements.txt into the new matmul/build directory
cp -R matmul_reference/build matmul
# copy input file for matmul to workspace directory
cp -R matmul_reference/workspace matmul
```
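
The same step can be scripted in Python. This sketch calls cookiecutter's Python API with `no_input=True`, which skips the interactive prompts (and with them the quoting problem); the prompt variable names `name` and `author` are an assumption based on the comment above. `copy_reference_files` is a hypothetical helper that mirrors the two `cp -R` commands, merging into `matmul/build` if that directory already exists.

```python
import shutil
from pathlib import Path

TEMPLATE_URL = "https://github.com/mlcommons/mlcube_cookiecutter.git"


def generate_from_template(output_dir="."):
    """Non-interactive equivalent of running `cookiecutter TEMPLATE_URL`."""
    # Imported lazily so copy_reference_files works even without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    # ASSUMPTION: the template prompts for variables named "name" and "author".
    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,
        output_dir=output_dir,
        extra_context={"name": "matmul", "author": "MLPerf Best Practices Working Group"},
    )


def copy_reference_files(reference_dir, mlcube_dir):
    """Copy build/ and workspace/ from the reference implementation into the new MLCube."""
    reference_dir, mlcube_dir = Path(reference_dir), Path(mlcube_dir)
    for sub in ("build", "workspace"):
        # dirs_exist_ok=True (Python 3.8+) merges into an existing directory.
        shutil.copytree(reference_dir / sub, mlcube_dir / sub, dirs_exist_ok=True)
```

Usage, from the mlcube_examples directory after the `mv` step: call `generate_from_template()`, then `copy_reference_files("matmul_reference", "matmul")`.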

Next, edit the template files.

Start by changing into the new matmul directory and looking at the mlcube.yaml file that cookiecutter generated.
```
cd ./matmul
```

Cookiecutter has modified the lines shown in **bold** in the mlcube.yaml file below:
<pre><code>
# This YAML file marks a directory to be an MLCube directory. When running MLCubes with runners, MLCube path is
# specified using `--mlcube` runner command line argument.
# The most important parameters that are defined here are (1) name, (2) author and (3) list of MLCube tasks.
schema_version: 1.0.0
schema_type: mlcube_root

# MLCube name (string). Replace it with your MLCube name (e.g. "matmul" as shown here).
name: <strong>matmul</strong>
# MLCube author (string). Replace it with your MLBox name (e.g. "MLPerf Best Practices Working Group").
author: <strong>MLPerf Best Practices Working Group</strong>

version: 0.1.0
mlcube_spec_version: 0.1.0

# List of MLCube tasks supported by this MLBox (list of strings). Every task:
# - Has a unique name (e.g. "download").
# - Is defined in a YAML file in the `tasks` sub-folder (e.g. "tasks/download.yaml").
# - Task name is passed to an MLBox implementation file as the first argument (e.g. "python mnist.py download ...").
# Every task is described by lists of input and output parameters. Every parameter is a file system path (directory or
# file) characterized by two fields - name and value.
# By default, if a file system path is a relative path (i.e. does not start with `/`), it is considered to be relative
# to the `workspace` sub-folder.
# Once all tasks are listed below, create a YAML file for each task in the 'tasks' sub-folder and change them
# appropriately.
# NEXT: study `tasks/task_name.yaml`, note: in the case of matmul we only need one task.
tasks:
<strong> - tasks/matmul.yaml</strong>
</code></pre>
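
The comments above describe a convention: every entry under `tasks:` is a YAML file in the `tasks` sub-folder, named after the task (e.g. `tasks/download.yaml` for `download`). A small sketch that derives task names from those entries and reports entries with no matching file; it assumes the `tasks` list has already been read from mlcube.yaml (YAML parsing is left out to stay dependency-free), and both helper names are hypothetical.

```python
from pathlib import Path


def task_names(task_entries):
    """Map entries such as "tasks/matmul.yaml" to task names such as "matmul".

    Assumes the convention from the mlcube.yaml comments: tasks/<task_name>.yaml.
    """
    return [Path(entry).stem for entry in task_entries]


def missing_task_files(mlcube_root, task_entries):
    """Return the entries listed under `tasks:` that have no file in the MLCube directory."""
    root = Path(mlcube_root)
    return [entry for entry in task_entries if not (root / entry).is_file()]
```

For the generated matmul directory, `missing_task_files("./matmul", ["tasks/matmul.yaml"])` should return an empty list.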


Now we will look at the file ./matmul/tasks/matmul.yaml.
```
cd ./tasks
```
Cookiecutter has modified the lines shown in **bold** in the matmul.yaml file below:

<pre><code>
# This YAML file defines the task that this MLCube supports. A task is a piece of functionality that MLCube can run. Task
# examples are `download data`, `pre-process data`, `train a model`, `test a model` etc. MLCube runtime invokes MLCube
# entry point and provides (1) task name as the first argument, (2) task input/output parameters (--name=value) in no
# particular order. Inputs, outputs or both can be empty lists. For instance, when MLCube runtime runs an MLCube task:
# python my_mlcube_entry_script.py download --data_dir=DATA_DIR_PATH --log_dir=LOG_DIR_PATH
# - `download` is the task name.
# - `data_dir` is the output parameter with value equal to DATA_DIR_PATH.
# - `log_dir` is the output parameter with value equal to LOG_DIR_PATH.
# This file only defines parameters, and does not provide parameter values. This is internal MLCube file and is not
# exposed to users via command line interface.
schema_version: 1.0.0
schema_type: mlcube_task

# List of input parameters (list of dictionaries).
inputs:
<strong>  - name: parameters_file
    type: file</strong>

# List of output parameters (list of dictionaries). Every parameter is a dictionary with two mandatory fields - `name`
# and `type`. The `name` must have value that can be used as a command line parameter name (--data_dir, --log_dir). The
# `type` is a categorical parameter that can be either `directory` or `file`. Every input/output parameter is always
# a file system path.
# Only parameters with their types are defined in this file. Run configurations defined in the `run` sub-folder
# associate parameter names and their values. There can be multiple run configurations for one task. One example is
# 1-GPU and 8-GPU training configuration for some `train` task.
# NEXT: study `run/task_name.yaml`.
outputs:
<strong>  - name: output_file</strong>
<strong>    type: file</strong>
</code></pre>
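
The rules spelled out in these comments can be checked mechanically: each parameter has a `name` and a `type`, the type is either `directory` or `file`, and the name must be usable as a command-line parameter name. A sketch that works on the task file after it has been parsed into a dictionary; treating "usable as a command-line name" as "a valid Python identifier" is our assumption, and `task_errors` is a hypothetical helper.

```python
VALID_TYPES = {"directory", "file"}


def task_errors(task):
    """Return a list of problems found in a parsed mlcube_task dictionary."""
    errors = []
    if task.get("schema_type") != "mlcube_task":
        errors.append("schema_type must be 'mlcube_task'")
    for section in ("inputs", "outputs"):
        # Inputs, outputs or both may be empty (or absent).
        for param in task.get(section) or []:
            name, kind = param.get("name"), param.get("type")
            # ASSUMPTION: a valid Python identifier works as --name on the command line.
            if not isinstance(name, str) or not name.isidentifier():
                errors.append(f"{section}: invalid parameter name {name!r}")
            if kind not in VALID_TYPES:
                errors.append(f"{section}: {name!r} has type {kind!r}, expected directory or file")
    return errors


# The matmul task file above, as a dictionary.
matmul_task = {
    "schema_version": "1.0.0",
    "schema_type": "mlcube_task",
    "inputs": [{"name": "parameters_file", "type": "file"}],
    "outputs": [{"name": "output_file", "type": "file"}],
}
```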

The input file shapes.yaml, which we copied into the MLCube workspace earlier, contains the input parameters that set the matrix dimensions,
so the automatically generated parameters file is not needed. Remove it:
```
rm ../workspace/parameters_file.yaml
```

Now we will edit the file ./matmul/run/matmul.yaml.
```
cd ../run
```

The lines you need to edit are shown in **bold** in the matmul.yaml file below:
<pre><code>
# A run configuration assigns values to task parameters. Since there can be multiple run configurations for one
# task (i.e., 1-GPU and 8-GPU training), run configuration files do not necessarily have to have the same name as their
# tasks. Three sections need to be updated in this file - `task_name`, `input_binding` and `output_binding`.
# Users use task configuration files to ask MLCube runtime run specific task using `--task` command line argument.
schema_type: mlcube_invoke
schema_version: 1.0.0

# Name of a task.
# task_name: task_name
task_name: matmul

# Dictionary of input bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
# file (`inputs` section). If no parameters are provided, the binding section must be an empty dictionary.
input_binding:
<strong>  parameters_file: $WORKSPACE/shapes.yaml</strong>

# Dictionary of output bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
# file (`outputs` section). Every parameter is a file system path (directory or a file name). Paths can be absolute
# (starting with `/`) or relative. Relative paths are assumed to be relative to MLCube `workspace` directory.
# Alternatively, a special variable `$WORKSPACE` can be used to explicitly refer to the MLCube `workspace` directory.
# MLCube root directory (`--mlcube`) and run configuration file (`--task`) define MLCube task to run. One step left is
# to specify where MLCube runs - on a local machine, remote machine in the cloud etc. This is done by providing platform
# configuration files located in the MLCube `platforms` sub-folder.
# NEXT: study `platforms/docker.yaml`.
output_binding:
<strong>  output_file: $WORKSPACE/matmul_output.txt</strong>

</code></pre>
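
The path rules in these comments (absolute paths are kept, relative paths are resolved against `workspace`, and `$WORKSPACE` names the workspace explicitly) combine with the calling convention from the task file: task name first, then `--name=value` pairs. The sketch below applies both to the matmul bindings. It only illustrates the documented rules and is not the runner's actual code; `/mlcube/workspace` is a hypothetical workspace path and both function names are our own.

```python
import posixpath


def resolve_path(value, workspace):
    """Resolve one binding value according to the run configuration comments."""
    if value == "$WORKSPACE" or value.startswith("$WORKSPACE/"):
        # $WORKSPACE explicitly refers to the MLCube workspace directory.
        return posixpath.normpath(workspace + value[len("$WORKSPACE"):])
    if value.startswith("/"):
        return value  # absolute paths are used unchanged
    return posixpath.normpath(posixpath.join(workspace, value))  # relative to workspace


def task_command(entry_script, task_name, bindings, workspace):
    """Build the entry-point command line: task name first, then --name=value pairs."""
    args = [f"--{name}={resolve_path(value, workspace)}" for name, value in bindings.items()]
    return ["python", entry_script, task_name] + args


# Input and output bindings from run/matmul.yaml, merged into one dictionary.
bindings = {
    "parameters_file": "$WORKSPACE/shapes.yaml",
    "output_file": "$WORKSPACE/matmul_output.txt",
}
```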




Now we will edit the file ./matmul/platforms/docker.yaml.

```
cd ../platforms
```
Edit the Docker image name in docker.yaml: change `image: "mlcube/matmul:0.0.1"` to `image: "mlcommons/matmul:v1.0"`.
<pre><code>
# Platform configuration files define where and how runners run MLCubes. This configuration file defines a Docker
# runtime for MLCubes. One field needs to be updated here - `container.image`. This platform file defines local docker
# execution environment.
# MLCube Docker runner uses image name to either `pull` or `build` a docker image. The rule is the following:
# - If the following file exists (`build/Dockerfile`), Docker image will be built.
# - Else, docker runner will pull a docker image with the specified name.
# Users provide platform files using `--platform` command line argument.
schema_type: mlcube_platform
schema_version: 0.1.0

platform:
  name: "docker"
  version: ">=18.01"
container:
<strong>  image: "mlcommons/matmul:v1.0"</strong>
</code></pre>
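
The build-or-pull rule from the comments above can be written as a small decision function. This sketch only returns the Docker command it would choose; the exact arguments the MLCube Docker runner passes to `docker build` are not documented here, so the `docker build -t IMAGE build/` form is an assumption, and `docker_image_command` is a hypothetical helper.

```python
from pathlib import Path


def docker_image_command(mlcube_root, image):
    """Choose between building and pulling the image, following the docker.yaml comments."""
    build_dir = Path(mlcube_root) / "build"
    if (build_dir / "Dockerfile").is_file():
        # build/Dockerfile exists: build locally and tag with the configured image name.
        # ASSUMPTION: the runner's real build arguments may differ.
        return ["docker", "build", "-t", image, str(build_dir)]
    # No Dockerfile: pull the image by name.
    return ["docker", "pull", image]
```

Because Step 2 copied a Dockerfile into `matmul/build`, this rule predicts a local build of `mlcommons/matmul:v1.0` rather than a pull.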

## Step 3: DEFINE A CONTAINER FOR YOUR MODEL WITH A DOCKERFILE
You will need a Docker image to create an MLCube. We will use the Dockerfile for 'matmul' to build a Docker container image.
<sub><sup><span style="color:blue">Note: the last line of the Dockerfile must be
<code>ENTRYPOINT ["python3", "/workspace/your_mlcube_name.py"]</code>, as shown below.</span></sup></sub>

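Now we will edit the file ./matmul/build/Dockerfile, which we copied from the reference implementation. The lines in **bold** are the ones you would change for your own model.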
```
cd ../build
```
<pre><code>
# Sample Dockerfile for matmul (Matrix Multiply)
FROM ubuntu:18.04
MAINTAINER MLPerf MLBox Working Group

WORKDIR /workspace

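RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        python3-dev \
        curl && \
    rm -rf /var/lib/apt/lists/*

# Ubuntu 18.04 ships Python 3.6, which current get-pip.py releases no longer support,
# so fetch the 3.6-specific bootstrap script.
RUN curl -fSsL -O https://bootstrap.pypa.io/pip/3.6/get-pip.py && \
    python3 get-pip.py && \
    rm get-pip.py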

COPY requirements.txt /requirements.txt
RUN pip3 install --no-cache-dir -r /requirements.txt

<strong>COPY matmul.py /workspace/matmul.py</strong>

<strong>ENTRYPOINT ["python3", "/workspace/matmul.py"]</strong>
</code></pre>
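
The script named in `ENTRYPOINT` has to follow the calling convention described in the task file: the task name arrives as the first argument, followed by `--name=value` parameters. The actual matmul.py is not shown in this tutorial, so the following is a hypothetical skeleton of an entry script with the matmul task's parameters; `run_matmul` is a placeholder that records which parameters file it received instead of multiplying matrices.

```python
import argparse
import sys


def run_matmul(parameters_file, output_file):
    """Hypothetical task body: replace with your own model code.

    A real implementation would read the matrix shapes from parameters_file
    and write the multiplication result to output_file.
    """
    with open(output_file, "w") as out:
        out.write(f"parameters read from: {parameters_file}\n")


# Task name -> handler. The MLCube runtime passes the task name as the first argument.
TASKS = {"matmul": run_matmul}


def parse_args(argv):
    parser = argparse.ArgumentParser(description="MLCube entry point (sketch)")
    parser.add_argument("task", choices=sorted(TASKS))
    parser.add_argument("--parameters_file", required=True)
    parser.add_argument("--output_file", required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    TASKS[args.task](args.parameters_file, args.output_file)
```

In your own script, end the file with the usual `if __name__ == "__main__": main()` guard so that the `ENTRYPOINT` invocation calls `main()`; it is left out above so the snippet can be imported without side effects.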

## Step 4: BUILD THE DOCKER IMAGE
```
cd ..
mlcube_docker configure --mlcube=. --platform=platforms/docker.yaml
```

## Step 5: TEST YOUR MLCUBE
```
mlcube_docker run --mlcube=. --platform=platforms/docker.yaml --task=run/matmul.yaml
ls ./workspace
cat ./workspace/matmul_output.txt
```
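
If you want to script Steps 4 and 5, for example in a CI job, both commands can be run from Python with `subprocess`. The flags are exactly those shown above; the helper names are hypothetical, and the sketch assumes `mlcube_docker` is on `PATH` and runs each command from inside the MLCube directory so that the relative `--mlcube=.` and `--platform` paths resolve as in the shell version.

```python
import subprocess
from pathlib import Path


def mlcube_docker_args(action, task=None):
    """Build an mlcube_docker command line with the flags used in Steps 4 and 5."""
    args = ["mlcube_docker", action, "--mlcube=.", "--platform=platforms/docker.yaml"]
    if task is not None:
        args.append(f"--task={task}")
    return args


def build_and_test(mlcube_dir):
    """Run Step 4 (configure) and Step 5 (run) inside mlcube_dir; return the output text."""
    # check=True raises CalledProcessError if either command exits with an error.
    subprocess.run(mlcube_docker_args("configure"), cwd=mlcube_dir, check=True)
    subprocess.run(mlcube_docker_args("run", task="run/matmul.yaml"), cwd=mlcube_dir, check=True)
    return (Path(mlcube_dir) / "workspace" / "matmul_output.txt").read_text()
```

From the mlcube_examples directory, `print(build_and_test("./matmul"))` should print the same file that `cat` shows above.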
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -9,6 +9,8 @@ nav:
- Installation: getting-started/index.md
- Hello World: getting-started/hello-world.md
- MNIST: getting-started/mnist.md
- Tutorials:
- How to Create an MLCube: tutorials/create-mlcube.md
- Runners:
- Runners: runners/index.md
- Docker Runner: runners/docker-runner.md