Tutorial "How to create an MLCube" (#164)
* Create create-mlcube.md Add tutorial create-mlcube.md.
* Update create-mlcube.md formatting
* Update create-mlcube.md remove video from tutorial. Create new video for this tutorial later.
* Update create-mlcube.md formatting change.
* add 'Tutorials' to Index for docs
* tested version of how to make a mlcube tutorial
* add formatting of line highlighted in blue
* add use of cookiecutter to 'How to create an MLCube' tutorial
* add cookiecutter installation in comments
* add comment saying input to cookiecutter should not contain quotes
Showing 2 changed files with 224 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,222 @@ | ||
# Tutorial Create an MLCube | ||
Interested in getting started with MLCube? Follow the instructions in this tutorial. | ||
## Step 1: SETUP | ||
Get MLCube, MLCube examples and MLCube Templates, and CREATE a Python environment. | ||
``` | ||
# You can clone the MLCube examples and templates from GitHub | |
git clone https://github.com/mlcommons/mlcube_examples | ||
# Create a python environment | ||
virtualenv -p python3 ./env && source ./env/bin/activate | ||
# Install mlcube, mlcube-docker and cookiecutter | ||
pip install mlcube mlcube-docker cookiecutter | ||
``` | ||
|
||
## Step 2: CONFIGURE MLCUBE USING THE TEMPLATE FILES | ||
Let's use the 'matmul' example that we downloaded in the previous step to illustrate how to make an MLCube. Matmul is a simple matrix multiplication example written in Python with TensorFlow. | |
When you create an MLCube for your own model you will use your own code, data and dockerfile. | ||
|
||
``` | ||
cd mlcube_examples | ||
# rename the matmul reference implementation from matmul to matmul_reference | |
mv ./matmul ./matmul_reference | ||
# create an MLCube directory from the MLCube template (note: do not use quotes in your input to cookiecutter); use name = matmul, author = MLPerf Best Practices Working Group | |
cookiecutter https://github.com/mlcommons/mlcube_cookiecutter.git | ||
# copy matmul.py, Dockerfile and requirements.txt to your matmul/build directory | |
cp -R matmul_reference/build matmul | ||
# copy input file for matmul to workspace directory | ||
cp -R matmul_reference/workspace matmul | ||
``` | ||
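
Before editing the template files, it can help to confirm that the `matmul` directory contains the pieces the rest of this tutorial relies on. The Python sketch below checks for the files named in the steps of this tutorial. The helper name `missing_mlcube_files` and the file list are assumptions drawn from the tutorial text; they are not part of MLCube itself.

```python
from pathlib import Path

# Files this tutorial refers to, relative to the MLCube root directory.
# The list is an assumption based on the tutorial text, not an MLCube API.
EXPECTED_FILES = [
    "mlcube.yaml",
    "tasks/matmul.yaml",
    "run/matmul.yaml",
    "platforms/docker.yaml",
    "build/Dockerfile",
    "build/matmul.py",
    "build/requirements.txt",
    "workspace/shapes.yaml",
]


def missing_mlcube_files(mlcube_root):
    """Return the expected files that do not exist under mlcube_root."""
    root = Path(mlcube_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the mlcube_examples directory after the copy commands above.
    for name in missing_mlcube_files("matmul"):
        print("missing:", name)
```

If the script prints nothing, every file in the list was found.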
|
||
Edit the template files | ||
|
||
Start by looking at the mlcube.yaml file that has been generated by cookiecutter. | ||
``` | ||
cd ./matmul | ||
``` | ||
|
||
Cookiecutter has modified the lines shown in **bold** in the mlcube.yaml file shown here: | ||
<pre><code> | ||
# This YAML file marks a directory to be an MLCube directory. When running MLCubes with runners, MLCube path is | ||
# specified using `--mlcube` runner command line argument. | ||
# The most important parameters that are defined here are (1) name, (2) author and (3) list of MLCube tasks. | ||
schema_version: 1.0.0 | ||
schema_type: mlcube_root | ||
|
||
# MLCube name (string). Replace it with your MLCube name (e.g. "matmul" as shown here). | ||
name: <strong>matmul</strong> | ||
# MLCube author (string). Replace it with your MLBox name (e.g. "MLPerf Best Practices Working Group"). | ||
author: <strong>MLPerf Best Practices Working Group</strong> | ||
|
||
version: 0.1.0 | ||
mlcube_spec_version: 0.1.0 | ||
|
||
# List of MLCube tasks supported by this MLBox (list of strings). Every task: | ||
# - Has a unique name (e.g. "download"). | ||
# - Is defined in a YAML file in the `tasks` sub-folder (e.g. "tasks/download.yaml"). | ||
# - Task name is passed to an MLBox implementation file as the first argument (e.g. "python mnist.py download ..."). | ||
# Every task is described by lists of input and output parameters. Every parameter is a file system path (directory or | ||
# file) characterized by two fields - name and value. | ||
# By default, if a file system path is a relative path (i.e. does not start with `/`), it is considered to be relative | ||
# to the `workspace` sub-folder. | ||
# Once all tasks are listed below, create a YAML file for each task in the 'tasks' sub-folder and change them | ||
# appropriately. | ||
# NEXT: study `tasks/task_name.yaml`, note: in the case of matmul we only need one task. | ||
tasks: | ||
<strong> - tasks/matmul.yaml</strong> | ||
</code></pre> | ||
|
||
|
||
Now we will look at file ./matmul/tasks/matmul.yaml. | ||
``` | ||
cd ./tasks | ||
``` | ||
Cookiecutter has modified the lines shown in **bold** in the matmul.yaml file shown here: | ||
|
||
<pre><code> | ||
# This YAML file defines the task that this MLCube supports. A task is a piece of functionality that MLCube can run. Task | ||
# examples are `download data`, `pre-process data`, `train a model`, `test a model` etc. MLCube runtime invokes MLCube | ||
# entry point and provides (1) task name as the first argument, (2) task input/output parameters (--name=value) in no | ||
# particular order. Inputs, outputs or both can be empty lists. For instance, when MLCube runtime runs an MLCube task: | ||
# python my_mlcube_entry_script.py download --data_dir=DATA_DIR_PATH --log_dir=LOG_DIR_PATH | ||
# - `download` is the task name. | ||
# - `data_dir` is the output parameter with value equal to DATA_DIR_PATH. | ||
# - `log_dir` is the output parameter with value equal to LOG_DIR_PATH. | ||
# This file only defines parameters, and does not provide parameter values. This is internal MLCube file and is not | ||
# exposed to users via command line interface. | ||
schema_version: 1.0.0 | ||
schema_type: mlcube_task | ||
|
||
# List of input parameters (list of dictionaries). | ||
inputs: | |
<strong>  - name: parameters_file | |
    type: file</strong> | |
|
||
# List of output parameters (list of dictionaries). Every parameter is a dictionary with two mandatory fields - `name` | ||
# and `type`. The `name` must have value that can be used as a command line parameter name (--data_dir, --log_dir). The | ||
# `type` is a categorical parameter that can be either `directory` or `file`. Every intput/output parameter is always | ||
# a file system path. | ||
# Only parameters with their types are defined in this file. Run configurations defined in the `run` sub-folder | ||
# associate parameter names and their values. There can be multiple run configurations for one task. One example is | ||
# 1-GPU and 8-GPU training configuration for some `train` task. | ||
# NEXT: study `run/task_name.yaml`. | ||
outputs: | |
<strong>  - name: output_file</strong> | |
<strong>    type: file</strong> | |
</code></pre> | ||
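
The comments in the task file describe how the MLCube runtime calls the entry point: the task name comes first, followed by `--name=value` parameters in no particular order. The sketch below shows one way a Python entry point could accept that command line for the `matmul` task. It is a hypothetical illustration, not the contents of the real `matmul.py`; the function name `parse_task_args` and the example paths are assumptions.

```python
import argparse


def parse_task_args(argv):
    """Split an MLCube-style command line into (task_name, parameters)."""
    parser = argparse.ArgumentParser(description="Hypothetical matmul entry point")
    # The task name is passed as the first positional argument.
    parser.add_argument("task", choices=["matmul"])
    # One option per parameter declared in tasks/matmul.yaml.
    parser.add_argument("--parameters_file", required=True)
    parser.add_argument("--output_file", required=True)
    args = parser.parse_args(argv)
    params = {
        "parameters_file": args.parameters_file,
        "output_file": args.output_file,
    }
    return args.task, params


# Example paths are placeholders; the runtime supplies the real values.
task, params = parse_task_args(
    ["matmul", "--output_file=/tmp/matmul_output.txt", "--parameters_file=/tmp/shapes.yaml"]
)
print(task, params)
```

Because `argparse` matches options by name, the order in which the runtime passes `--parameters_file` and `--output_file` does not matter.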
|
||
The input file shapes.yaml, which we copied into the MLCube workspace earlier, contains the input parameters that set the matrix | |
dimensions. We no longer need the automatically generated parameters file, so remove it. | |
``` | ||
rm ../workspace/parameters_file.yaml | ||
``` | ||
|
||
Now we will edit file ./matmul/run/matmul.yaml. | ||
``` | ||
cd ../run | ||
``` | ||
|
||
The lines you need to edit are shown in **bold** in the matmul.yaml file shown here: | ||
<pre><code> | ||
# A run configuration assigns values to task parameters. Since there can be multiple run configurations for one | ||
# task (i.e., 1-GPU and 8-GPU training), run configuration files do not necessarily have to have the same name as their | ||
# tasks. Three sections need to be updated in this file - `task_name`, `input_binding` and `output_binding`. | ||
# Users use task configuration files to ask MLCube runtime run specific task using `--task` command line argument. | ||
schema_type: mlcube_invoke | ||
schema_version: 1.0.0 | ||
|
||
# Name of a task. | ||
# task_name: task_name | ||
task_name: matmul | ||
|
||
# Dictionary of input bindings (dictionary mapping strings to strings). Parameters must correspond to those in task | ||
# file (`inputs` section). If not parameters are provided, the binding section must be an empty dictionary. | ||
input_binding: | |
<strong>  parameters_file: $WORKSPACE/shapes.yaml</strong> | |
|
||
# Dictionary of output bindings (dictionary mapping strings to strings). Parameters must correspond to those in task | ||
# file (`outputs` section). Every parameter is a file system path (directory or a file name). Paths can be absolute | ||
# (starting with `/`) or relative. Relative paths are assumed to be relative to MLCube `workspace` directory. | ||
# Alternatively, a special variable `$WORKSPACE` can be used to explicitly refer to the MLCube `workspace` directory. | ||
# MLCube root directory (`--mlcube`) and run configuration file (`--task`) define MLCube task to run. One step left is | ||
# to specify where MLCube runs - on a local machine, remote machine in the cloud etc. This is done by providing platform | ||
# configuration files located in the MLCube `platforms` sub-folder. | ||
# NEXT: study `platforms/docker.yaml`. | ||
output_binding: | |
<strong>  output_file: $WORKSPACE/matmul_output.txt</strong> | |
|
||
</code></pre> | ||
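
The comments in the run configuration state three path rules: absolute paths are used as-is, relative paths are taken relative to the `workspace` directory, and `$WORKSPACE` refers to that directory explicitly. The sketch below restates those rules in Python. It is only an illustration of the rules as described here, not the MLCube runner's actual code; `resolve_binding` and the example workspace path are hypothetical.

```python
import posixpath

WORKSPACE_VAR = "$WORKSPACE"


def resolve_binding(value, workspace):
    """Turn a binding value from a run configuration into an absolute path."""
    # Rule 1: a leading $WORKSPACE is replaced by the workspace directory.
    if value.startswith(WORKSPACE_VAR):
        value = workspace + value[len(WORKSPACE_VAR):]
    # Rule 2: absolute paths (starting with `/`) are kept as they are.
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    # Rule 3: anything else is relative to the workspace directory.
    return posixpath.normpath(posixpath.join(workspace, value))


# Hypothetical workspace location, used only for this example.
workspace = "/home/user/mlcube_examples/matmul/workspace"
print(resolve_binding("$WORKSPACE/shapes.yaml", workspace))
print(resolve_binding("matmul_output.txt", workspace))
```

Under these rules, `$WORKSPACE/matmul_output.txt` and the relative path `matmul_output.txt` resolve to the same file.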
|
||
|
||
|
||
|
||
Now we will edit file ./matmul/platforms/docker.yaml | ||
|
||
``` | ||
cd ../platforms | ||
``` | ||
Edit the docker image name in docker.yaml. Change `image: "mlcube/matmul:0.0.1"` to `image: "mlcommons/matmul:v1.0"`. | |
<pre><code> | ||
# Platform configuration files define where and how runners run MLCubes. This configuration file defines a Docker | ||
# runtime for MLCubes. One field need to be updated here - `container.image`. This platform file defines local docker | ||
# execution environment. | ||
# MLCube Docker runner uses image name to either `pull` or `build` a docker image. The rule is the following: | ||
# - If the following file exists (`build/Dockerfile`), Docker image will be built. | ||
# - Else, docker runner will pull a docker image with the specified name. | ||
# Users provide platform files using `--platform` command line argument. | ||
schema_type: mlcube_platform | ||
schema_version: 0.1.0 | ||
|
||
platform: | |
  name: "docker" | |
  version: ">=18.01" | |
container: | |
<strong>  image: "mlcommons/matmul:v1.0"</strong> | |
</code></pre> | ||
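
The platform file comments give the rule the Docker runner follows: if `build/Dockerfile` exists, the image is built; otherwise an image with the configured name is pulled. The sketch below expresses that decision in Python. The function name `docker_action` is hypothetical, and the `docker build` and `docker pull` command lines it returns are an assumption about what such a step could run, not the runner's actual invocation.

```python
from pathlib import Path


def docker_action(mlcube_root, image):
    """Return the docker command implied by the build-or-pull rule."""
    dockerfile = Path(mlcube_root) / "build" / "Dockerfile"
    if dockerfile.is_file():
        # A Dockerfile is present, so the image is built from the build directory.
        return ["docker", "build", "-t", image, str(dockerfile.parent)]
    # No Dockerfile, so the image is pulled by name.
    return ["docker", "pull", image]


# Run from inside the matmul directory; this only prints the command.
print(" ".join(docker_action(".", "mlcommons/matmul:v1.0")))
```

In this tutorial the Dockerfile was copied into `matmul/build`, so the rule selects the build path.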
|
||
## Step 3: DEFINE A CONTAINER FOR YOUR MODEL WITH A DOCKERFILE | |
You will need a docker image to create an MLCube. We will use the Dockerfile for 'matmul' to create a docker container image: | ||
<sub><sup><span style="color:blue">Note: the last line of the Dockerfile must be | |
<code>ENTRYPOINT ["python3", "/workspace/your_mlcube_name.py"]</code>, as shown below.</span></sup></sub> | |
|
||
Now we will edit file ./matmul/build/Dockerfile. | |
``` | ||
cd ../build | ||
``` | ||
<pre><code> | ||
# Sample Dockerfile for matmul (Matrix Multiply) | ||
FROM ubuntu:18.04 | ||
MAINTAINER MLPerf MLBox Working Group | ||
|
||
WORKDIR /workspace | ||
|
||
RUN apt-get update && \ | ||
apt-get install -y --no-install-recommends \ | ||
software-properties-common \ | ||
python3-dev \ | ||
curl && \ | ||
rm -rf /var/lib/apt/lists/* | ||
|
||
RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \ | ||
python3 get-pip.py && \ | ||
rm get-pip.py | ||
|
||
COPY requirements.txt /requirements.txt | ||
RUN pip3 install --no-cache-dir -r /requirements.txt | ||
|
||
<strong>COPY matmul.py /workspace/matmul.py</strong> | ||
|
||
<strong>ENTRYPOINT ["python3", "/workspace/matmul.py"]</strong> | ||
</code></pre> | ||
|
||
## Step 4: BUILD THE DOCKER IMAGE | ||
``` | ||
cd .. | ||
mlcube_docker configure --mlcube=. --platform=platforms/docker.yaml | ||
``` | ||
|
||
## Step 5: TEST YOUR MLCUBE | ||
``` | ||
mlcube_docker run --mlcube=. --platform=platforms/docker.yaml --task=run/matmul.yaml | ||
ls ./workspace | ||
cat ./workspace/matmul_output.txt | ||
``` |