Tutorial "How to create an MLCube" (#164)

* Create create-mlcube.md: Add tutorial create-mlcube.md.
* Update create-mlcube.md: formatting
* Update create-mlcube.md: remove video from tutorial. Create new video for this tutorial later.
* Update create-mlcube.md: formatting change.
* add 'Tutorials' to Index for docs
* tested version of how to make a mlcube tutorial
* add formatting of line highlighted in blue
* add use of cookiecutter to 'How to create an MLCube' tutorial
* add cookiecutter installation in comments
* add comment saying input to cookiecutter should not contain quotes
mlcommons · Dec 1, 2020 · b0a8f57
1 parent 3374539
commit b0a8f57

Showing 2 changed files with 224 additions and 0 deletions.
diff --git a/docs/tutorials/create-mlcube.md b/docs/tutorials/create-mlcube.md
@@ -0,0 +1,222 @@
+# Tutorial: Create an MLCube
+Interested in getting started with MLCube? Follow the instructions in this tutorial.    
+## Step 1: SETUP   
+Get MLCube, MLCube examples and MLCube Templates, and CREATE a Python environment.
+```
+# You can clone the mlcube examples and templates from GitHub
+git clone https://github.com/mlcommons/mlcube_examples
+# Create a python environment
+virtualenv -p python3 ./env && source ./env/bin/activate
+# Install mlcube, mlcube-docker and cookiecutter 
+pip install mlcube mlcube-docker cookiecutter 
+```
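The setup above assumes that a few tools are already on your `PATH`. As a minimal sketch, the helper below reports any that are missing; the function name `require_cmds` is hypothetical and is not part of MLCube.

```shell
# Hypothetical helper (not part of MLCube): report required commands that are
# not on PATH. Returns 0 when every command is found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_cmds git python3 virtualenv docker` checks the tools used in this step, plus `docker`, which the Docker runner needs later.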
+
+## Step 2: CONFIGURE MLCUBE USING THE TEMPLATE FILES 
+Let's use the 'matmul' example that we downloaded in the previous step to illustrate how to make an MLCube. Matmul is a simple matrix multiply example written in Python with TensorFlow. 
+When you create an MLCube for your own model, you will use your own code, data and Dockerfile.
+
+```
+cd mlcube_examples
+# rename the matmul reference implementation from matmul to matmul_reference
+mv ./matmul ./matmul_reference
+# create an mlcube directory from the mlcube template; when prompted, enter
+# name = matmul, author = MLPerf Best Practices Working Group (note: do not use quotes in your input to cookiecutter)
+cookiecutter https://github.com/mlcommons/mlcube_cookiecutter.git
+# copy matmul.py, Dockerfile and requirements.txt to your matmul/build directory
+cp -R  matmul_reference/build  matmul
+# copy the input file for matmul to the workspace directory
+cp -R  matmul_reference/workspace  matmul
+```
+
+Edit the template files 
+
+Start by looking at the mlcube.yaml file that has been generated by cookiecutter. 
+```
+cd ./matmul
+```
+
+Cookiecutter has modified the lines shown in **bold** in the mlcube.yaml file shown here:
+<pre><code> 
+# This YAML file marks a directory to be an MLCube directory. When running MLCubes with runners, MLCube path is
+# specified using `--mlcube` runner command line argument.
+# The most important parameters that are defined here are (1) name, (2) author and (3) list of MLCube tasks.
+schema_version: 1.0.0
+schema_type: mlcube_root
+
+# MLCube name (string). Replace it with your MLCube name (e.g. "matmul" as shown here).
+name: <strong>matmul</strong>
+# MLCube author (string). Replace it with your MLBox name (e.g. "MLPerf Best Practices Working Group").
+author: <strong>MLPerf Best Practices Working Group</strong>
+
+version: 0.1.0
+mlcube_spec_version: 0.1.0
+
+# List of MLCube tasks supported by this MLBox (list of strings). Every task:
+#    - Has a unique name (e.g. "download").
+#    - Is defined in a YAML file in the `tasks` sub-folder (e.g. "tasks/download.yaml").
+#    - Task name is passed to an MLBox implementation file as the first argument (e.g. "python mnist.py download ...").
+# Every task is described by lists of input and output parameters. Every parameter is a file system path (directory or
+# file) characterized by two fields - name and value.
+# By default, if a file system path is a relative path (i.e. does not start with `/`), it is considered to be relative
+# to the `workspace` sub-folder.
+# Once all tasks are listed below, create a YAML file for each task in the 'tasks' sub-folder and change them
+# appropriately.
+# NEXT: study `tasks/task_name.yaml`, note: in the case of matmul we only need one task.
+tasks:
+<strong>  - tasks/matmul.yaml</strong>
+</code></pre>
+
+
+Now we will look at file ./matmul/tasks/matmul.yaml.
+```
+cd ./tasks
+```
+Cookiecutter has modified the lines shown in **bold** in the matmul.yaml file shown here:
+
+<pre><code> 
+# This YAML file defines the task that this MLCube supports. A task is a piece of functionality that MLCube can run. Task
+# examples are `download data`, `pre-process data`, `train a model`, `test a model` etc. MLCube runtime invokes MLCube
+# entry point and provides (1) task name as the first argument, (2) task input/output parameters (--name=value) in no
+# particular order. Inputs, outputs or both can be empty lists. For instance, when MLCube runtime runs an MLCube task:
+#            python my_mlcube_entry_script.py download --data_dir=DATA_DIR_PATH --log_dir=LOG_DIR_PATH
+#    - `download` is the task name.
+#    - `data_dir` is the output parameter with value equal to DATA_DIR_PATH.
+#    - `log_dir` is the output parameter with value equal to LOG_DIR_PATH.
+# This file only defines parameters, and does not provide parameter values. This is an internal MLCube file and is not
+# exposed to users via the command line interface.
+schema_version: 1.0.0
+schema_type: mlcube_task
+
+# List of input parameters (list of dictionaries).
+inputs:
+   <strong> - name: parameters_file
+      type: file</strong> 
+
+# List of output parameters (list of dictionaries). Every parameter is a dictionary with two mandatory fields - `name`
+# and `type`. The `name` must have value that can be used as a command line parameter name (--data_dir, --log_dir). The
+# `type` is a categorical parameter that can be either `directory` or `file`. Every input/output parameter is always
+# a file system path.
+# Only parameters with their types are defined in this file. Run configurations defined in the `run` sub-folder
+# associate parameter names and their values. There can be multiple run configurations for one task. One example is
+# 1-GPU and 8-GPU training configuration for some `train` task.
+# NEXT: study `run/task_name.yaml`.
+outputs:
+   <strong> - name: output_file</strong> 
+      <strong>type: file</strong> 
+</code></pre>
+
+Our input file shapes.yaml, which we copied into the mlcube workspace earlier, contains the input parameters that set the matrix 
+dimensions. We need to remove the automatically generated parameters file.
+```
+rm ../workspace/parameters_file.yaml
+```
+
+Now we will edit file ./matmul/run/matmul.yaml. 
+```
+cd ../run
+```
+
+The lines you need to edit are shown in **bold** in the matmul.yaml file shown here:  
+<pre><code>
+# A run configuration assigns values to task parameters. Since there can be multiple run configurations for one
+# task (e.g., 1-GPU and 8-GPU training), run configuration files do not necessarily have to have the same name as their
+# tasks. Three sections need to be updated in this file - `task_name`, `input_binding` and `output_binding`.
+# Users use run configuration files to ask MLCube runtime to run a specific task using the `--task` command line argument.
+schema_type: mlcube_invoke
+schema_version: 1.0.0
+
+# Name of a task.
+# task_name: task_name
+task_name: matmul
+
+# Dictionary of input bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
+# file (`inputs` section). If no parameters are provided, the binding section must be an empty dictionary.
+input_binding:
+        <strong>parameters_file: $WORKSPACE/shapes.yaml</strong> 
+
+# Dictionary of output bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
+# file (`outputs` section). Every parameter is a file system path (directory or a file name). Paths can be absolute
+# (starting with `/`) or relative. Relative paths are assumed to be relative to MLCube `workspace` directory.
+# Alternatively, a special variable `$WORKSPACE` can be used to explicitly refer to the MLCube `workspace` directory.
+# MLCube root directory (`--mlcube`) and run configuration file (`--task`) define MLCube task to run. One step left is
+# to specify where MLCube runs - on a local machine, remote machine in the cloud etc. This is done by providing platform
+# configuration files located in the MLCube `platforms` sub-folder.
+# NEXT: study `platforms/docker.yaml`.
+output_binding:
+        <strong>output_file: $WORKSPACE/matmul_output.txt</strong> 
+
+</code></pre>
+
+
+
+
+Now we will edit file ./matmul/platforms/docker.yaml 
+
+```
+cd ../platforms
+```
+Edit the docker image name in docker.yaml. Change image: "mlcube/matmul:0.0.1" to image: "mlcommons/matmul:v1.0".
+<pre><code> 
+# Platform configuration files define where and how runners run MLCubes. This configuration file defines a Docker
+# runtime for MLCubes. One field needs to be updated here - `container.image`. This platform file defines local docker
+# execution environment.
+# MLCube Docker runner uses image name to either `pull` or `build` a docker image. The rule is the following:
+#   - If the following file exists (`build/Dockerfile`), Docker image will be built.
+#   - Else, docker runner will pull a docker image with the specified name.
+# Users provide platform files using `--platform` command line argument.
+schema_type: mlcube_platform
+schema_version: 0.1.0
+
+platform:
+  name: "docker"
+  version: ">=18.01"
+container:   
+<strong>   image: "mlcommons/matmul:v1.0"</strong> 
+</code></pre>
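At this point the MLCube directory should contain every file touched so far. The sketch below checks for them; the function name `check_mlcube_layout` is hypothetical, and the file list is only what this tutorial mentions for the matmul example (including `workspace/shapes.yaml`), not an official MLCube schema.

```shell
# Hypothetical helper (not part of MLCube): check that an MLCube directory
# contains the files this tutorial creates or copies.
# Usage: check_mlcube_layout <mlcube_dir> <task_name>
check_mlcube_layout() {
    root=$1
    task=$2
    status=0
    for path in \
        mlcube.yaml \
        "tasks/$task.yaml" \
        "run/$task.yaml" \
        platforms/docker.yaml \
        build/Dockerfile \
        "build/$task.py" \
        build/requirements.txt \
        workspace/shapes.yaml
    do
        if [ ! -f "$root/$path" ]; then
            echo "missing: $root/$path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from `mlcube_examples`, `check_mlcube_layout ./matmul matmul` prints nothing and returns 0 when all files are in place.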
+
+## Step 3: DEFINE A CONTAINER FOR YOUR MODEL WITH A DOCKERFILE
+You will need a docker image to create an MLCube.  We will use the Dockerfile for 'matmul' to create a docker container image:   
+<sub><sup><span style="color:blue">Note: the last line of the Dockerfile must be    
+"ENTRYPOINT ["python3", "/workspace/your_mlcube_name.py"]" as shown below.</span></sup></sub> 
+
+Now we will edit the Dockerfile in ./matmul/build (my_mlcube/build/Dockerfile for your own MLCube).
+```
+cd ../build 
+```
+<pre><code> 
+# Sample Dockerfile for matmul (Matrix Multiply)
+FROM ubuntu:18.04
+MAINTAINER MLPerf MLBox Working Group
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+            software-properties-common \
+            python3-dev \
+            curl && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
+    python3 get-pip.py && \
+    rm get-pip.py
+
+COPY requirements.txt /requirements.txt
+RUN pip3 install --no-cache-dir -r /requirements.txt
+
+<strong>COPY matmul.py /workspace/matmul.py</strong>
+
+<strong>ENTRYPOINT ["python3", "/workspace/matmul.py"]</strong>
+</code></pre>
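Because the note in this step requires the Dockerfile to end with an `ENTRYPOINT` instruction, a quick check can catch a misplaced line before building. This is a rough sketch with a hypothetical function name, `check_entrypoint`; it only inspects the last non-blank line and does not validate the script path.

```shell
# Hypothetical helper (not part of MLCube): print the last non-blank line of a
# Dockerfile and succeed only if that line is an ENTRYPOINT instruction.
# Usage: check_entrypoint <Dockerfile>
check_entrypoint() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    echo "$last"
    case "$last" in
        ENTRYPOINT*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `check_entrypoint ./Dockerfile` from the `build` directory prints the final instruction so you can also confirm that it points at your own script.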
+
+## Step 4: BUILD THE DOCKER IMAGE
+```
+cd ..
+mlcube_docker configure --mlcube=. --platform=platforms/docker.yaml
+```
+
+## Step 5: TEST YOUR MLCUBE
+```
+mlcube_docker run --mlcube=. --platform=platforms/docker.yaml --task=run/matmul.yaml
+ls ./workspace
+cat ./workspace/matmul_output.txt
+```
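If you script this test step, it may help to fail loudly when the run produces no output. The sketch below assumes the output path bound in `run/matmul.yaml` (`workspace/matmul_output.txt`); the function name `check_output` is hypothetical.

```shell
# Hypothetical helper (not part of MLCube): succeed only if the expected
# output file exists and is not empty.
# Usage: check_output <path>
check_output() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}
```

For example, `check_output ./workspace/matmul_output.txt` can follow the `mlcube_docker run` command above.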
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -9,6 +9,8 @@ nav:
     - Installation: getting-started/index.md
     - Hello World: getting-started/hello-world.md
     - MNIST: getting-started/mnist.md
+  - Tutorials:
+    - How to Create an MLCube: tutorials/create-mlcube.md
   - Runners:
     - Runners: runners/index.md
     - Docker Runner: runners/docker-runner.md