diff --git a/docs/tutorials/create-mlcube.md b/docs/tutorials/create-mlcube.md
new file mode 100644
index 00000000..81d94fc9
--- /dev/null
+++ b/docs/tutorials/create-mlcube.md
@@ -0,0 +1,222 @@
+# Tutorial: Create an MLCube
+Interested in getting started with MLCube? Follow the instructions in this tutorial.
+## Step 1: SETUP
+Get MLCube, the MLCube examples and the MLCube templates, and create a Python environment.
+```
+# You can clone the mlcube examples and templates from GitHub
+git clone https://github.com/mlcommons/mlcube_examples
+# Create a python environment
+virtualenv -p python3 ./env && source ./env/bin/activate
+# Install mlcube, mlcube-docker and cookiecutter
+pip install mlcube mlcube-docker cookiecutter
+```
+
+## Step 2: CONFIGURE MLCUBE USING THE TEMPLATE FILES
+Let's use the 'matmul' example, which we downloaded in the previous step, to illustrate how to make an MLCube. Matmul is a simple matrix multiply example written in Python with TensorFlow.
+When you create an MLCube for your own model, you will use your own code, data and Dockerfile.
+
+```
+cd mlcube_examples
+# rename the matmul reference implementation from matmul to matmul_reference
+mv ./matmul ./matmul_reference
+# create an mlcube directory using the mlcube template (note: do not use quotes in your input to cookiecutter): name = matmul, author = MLPerf Best Practices Working Group
+cookiecutter https://github.com/mlcommons/mlcube_cookiecutter.git
+# copy matmul.py, Dockerfile and requirements.txt to your matmul/build directory
+cp -R matmul_reference/build matmul
+# copy the input file for matmul to the workspace directory
+cp -R matmul_reference/workspace matmul
+```
+
+Edit the template files
+
+Start by looking at the mlcube.yaml file that has been generated by cookiecutter.
+```
+cd ./matmul
+```
+
+Cookiecutter has modified the lines shown in **bold** in the mlcube.yaml file shown here:
+
+# This YAML file marks a directory to be an MLCube directory. When running MLCubes with runners, MLCube path is
+# specified using `--mlcube` runner command line argument.
+# The most important parameters that are defined here are (1) name, (2) author and (3) list of MLCube tasks.
+schema_version: 1.0.0
+schema_type: mlcube_root
+
+# MLCube name (string). Replace it with your MLCube name (e.g. "matmul" as shown here).
+name: matmul
+# MLCube author (string). Replace it with your name or organization (e.g. "MLPerf Best Practices Working Group").
+author: MLPerf Best Practices Working Group
+
+version: 0.1.0
+mlcube_spec_version: 0.1.0
+
+# List of MLCube tasks supported by this MLCube (list of strings). Every task:
+# - Has a unique name (e.g. "download").
+# - Is defined in a YAML file in the `tasks` sub-folder (e.g. "tasks/download.yaml").
+# - Task name is passed to an MLCube implementation file as the first argument (e.g. "python mnist.py download ...").
+# Every task is described by lists of input and output parameters. Every parameter is a file system path (directory or
+# file) characterized by two fields - name and value.
+# By default, if a file system path is a relative path (i.e. does not start with `/`), it is considered to be relative
+# to the `workspace` sub-folder.
+# Once all tasks are listed below, create a YAML file for each task in the 'tasks' sub-folder and change them
+# appropriately.
+# NEXT: study `tasks/task_name.yaml`, note: in the case of matmul we only need one task.
+tasks:
+ - tasks/matmul.yaml
+
+
+
+Now we will look at file ./matmul/tasks/matmul.yaml.
+```
+cd ./tasks
+```
+Cookiecutter has modified the lines shown in **bold** in the matmul.yaml file shown here:
+
+
+# This YAML file defines the task that this MLCube supports. A task is a piece of functionality that MLCube can run. Task
+# examples are `download data`, `pre-process data`, `train a model`, `test a model` etc. MLCube runtime invokes MLCube
+# entry point and provides (1) task name as the first argument, (2) task input/output parameters (--name=value) in no
+# particular order. Inputs, outputs or both can be empty lists. For instance, when MLCube runtime runs an MLCube task:
+# python my_mlcube_entry_script.py download --data_dir=DATA_DIR_PATH --log_dir=LOG_DIR_PATH
+# - `download` is the task name.
+# - `data_dir` is the output parameter with value equal to DATA_DIR_PATH.
+# - `log_dir` is the output parameter with value equal to LOG_DIR_PATH.
+# This file only defines parameters, and does not provide parameter values. This is an internal MLCube file and is not
+# exposed to users via the command line interface.
+schema_version: 1.0.0
+schema_type: mlcube_task
+
+# List of input parameters (list of dictionaries).
+inputs:
+  - name: parameters_file
+    type: file
+
+# List of output parameters (list of dictionaries). Every parameter is a dictionary with two mandatory fields - `name`
+# and `type`. The `name` must have value that can be used as a command line parameter name (--data_dir, --log_dir). The
+# `type` is a categorical parameter that can be either `directory` or `file`. Every input/output parameter is always
+# a file system path.
+# Only parameters with their types are defined in this file. Run configurations defined in the `run` sub-folder
+# associate parameter names and their values. There can be multiple run configurations for one task. One example is
+# 1-GPU and 8-GPU training configuration for some `train` task.
+# NEXT: study `run/task_name.yaml`.
+outputs:
+  - name: output_file
+    type: file
+
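The calling convention described in the task file comments (task name first, then `--name=value` parameters in no particular order) can be sketched in Python as follows. This is a minimal illustration, not the actual `matmul.py`: the function name `parse_task_args` and the example paths are assumptions made for this sketch, and only the parameter names `parameters_file` and `output_file` come from the task file above.

```python
import argparse


def parse_task_args(argv):
    """Split an MLCube-style command line into (task_name, parameters).

    Hypothetical helper for illustration; the real entry script may differ.
    """
    parser = argparse.ArgumentParser(description="Sketch of an MLCube entry point")
    # The task name arrives as the positional argument.
    parser.add_argument("task", choices=["matmul"])
    # One option per parameter declared in tasks/matmul.yaml.
    parser.add_argument("--parameters_file", required=True)
    parser.add_argument("--output_file", required=True)
    args = parser.parse_args(argv)
    params = {
        "parameters_file": args.parameters_file,
        "output_file": args.output_file,
    }
    return args.task, params


# Example paths are placeholders, not paths the runner is known to use.
task, params = parse_task_args(
    [
        "matmul",
        "--parameters_file=/workspace/shapes.yaml",
        "--output_file=/workspace/matmul_output.txt",
    ]
)
print(task, params["parameters_file"], params["output_file"])
```

Because `argparse` accepts options in any order, the script does not depend on the order in which the runtime passes the parameters.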
+
+Our input file, shapes.yaml, which we copied into the MLCube workspace earlier, contains the input parameters that set the matrix
+dimensions. We therefore need to remove the automatically generated parameters file.
+```
+rm ../workspace/parameters_file.yaml
+```
+
+Now we will edit file ./matmul/run/matmul.yaml.
+```
+cd ../run
+```
+
+The lines you need to edit are shown in **bold** in the matmul.yaml file shown here:
+
+# A run configuration assigns values to task parameters. Since there can be multiple run configurations for one
+# task (e.g., 1-GPU and 8-GPU training), run configuration files do not necessarily have to have the same name as their
+# tasks. Three sections need to be updated in this file - `task_name`, `input_binding` and `output_binding`.
+# Users use run configuration files to ask the MLCube runtime to run a specific task using the `--task` command line argument.
+schema_type: mlcube_invoke
+schema_version: 1.0.0
+
+# Name of a task.
+# task_name: task_name
+task_name: matmul
+
+# Dictionary of input bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
+# file (`inputs` section). If no parameters are provided, the binding section must be an empty dictionary.
+input_binding:
+ parameters_file: $WORKSPACE/shapes.yaml
+
+# Dictionary of output bindings (dictionary mapping strings to strings). Parameters must correspond to those in task
+# file (`outputs` section). Every parameter is a file system path (directory or a file name). Paths can be absolute
+# (starting with `/`) or relative. Relative paths are assumed to be relative to MLCube `workspace` directory.
+# Alternatively, a special variable `$WORKSPACE` can be used to explicitly refer to the MLCube `workspace` directory.
+# MLCube root directory (`--mlcube`) and run configuration file (`--task`) define MLCube task to run. One step left is
+# to specify where MLCube runs - on a local machine, remote machine in the cloud etc. This is done by providing platform
+# configuration files located in the MLCube `platforms` sub-folder.
+# NEXT: study `platforms/docker.yaml`.
+output_binding:
+ output_file: $WORKSPACE/matmul_output.txt
+
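The path rules stated in the run configuration comments (absolute paths are kept, relative paths are taken relative to the `workspace` directory, and `$WORKSPACE` refers to that directory explicitly) can be sketched as below. This is an illustration of the stated rules only, not the runner's actual implementation; the function name `resolve_binding` and the workspace location are assumptions.

```python
import posixpath


def resolve_binding(value, workspace):
    """Turn a binding value from a run configuration into an absolute path.

    Hypothetical helper: mirrors the rules in the comments, nothing more.
    """
    prefix = "$WORKSPACE"
    # Expand the special variable when it is the whole value or a leading path component.
    if value == prefix or value.startswith(prefix + "/"):
        value = workspace + value[len(prefix):]
    # Absolute paths are used as given.
    if posixpath.isabs(value):
        return posixpath.normpath(value)
    # Relative paths are interpreted relative to the workspace directory.
    return posixpath.normpath(posixpath.join(workspace, value))


workspace = "/home/user/mlcube_examples/matmul/workspace"  # assumed location
for binding in ("$WORKSPACE/shapes.yaml", "matmul_output.txt", "/data/shapes.yaml"):
    print(binding, "->", resolve_binding(binding, workspace))
```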
+
+
+
+
+
+Now we will edit file ./matmul/platforms/docker.yaml.
+
+```
+cd ../platforms
+```
+Edit the Docker image name in docker.yaml: change `image: "mlcube/matmul:0.0.1"` to `image: "mlcommons/matmul:v1.0"`.
+
+# Platform configuration files define where and how runners run MLCubes. This configuration file defines a Docker
+# runtime for MLCubes. One field needs to be updated here - `container.image`. This platform file defines local docker
+# execution environment.
+# MLCube Docker runner uses image name to either `pull` or `build` a docker image. The rule is the following:
+# - If the following file exists (`build/Dockerfile`), Docker image will be built.
+# - Else, docker runner will pull a docker image with the specified name.
+# Users provide platform files using `--platform` command line argument.
+schema_type: mlcube_platform
+schema_version: 0.1.0
+
+platform:
+ name: "docker"
+ version: ">=18.01"
+container:
+ image: "mlcommons/matmul:v1.0"
+
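The build-or-pull rule stated in the platform file comments can be expressed as a small check. This is a sketch of the documented rule only, not code taken from the Docker runner; the function name `docker_action` is an assumption.

```python
import tempfile
from pathlib import Path


def docker_action(mlcube_root):
    """Return 'build' if build/Dockerfile exists under the MLCube root, else 'pull'.

    Hypothetical helper that restates the rule from the platform file comments.
    """
    dockerfile = Path(mlcube_root) / "build" / "Dockerfile"
    return "build" if dockerfile.is_file() else "pull"


# Demonstrate both branches in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    print(docker_action(root))  # no build/Dockerfile yet
    build_dir = Path(root) / "build"
    build_dir.mkdir()
    (build_dir / "Dockerfile").write_text("FROM ubuntu:18.04\n")
    print(docker_action(root))  # build/Dockerfile now exists
```

In this tutorial the `build` directory was copied from `matmul_reference`, so under the stated rule the image would be built locally rather than pulled.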
+
+## Step 3: DEFINE A CONTAINER FOR YOUR MODEL WITH A DOCKERFILE
+You will need a Docker image to create an MLCube. We will use the Dockerfile for 'matmul' to build a Docker container image.
+Note: the last line of the Dockerfile must be
+`ENTRYPOINT ["python3", "/workspace/your_mlcube_name.py"]`, as shown below.
+
+Now we will edit file ./matmul/build/Dockerfile.
+```
+cd ../build
+```
+
+# Sample Dockerfile for matmul (Matrix Multiply)
+FROM ubuntu:18.04
+# MAINTAINER is deprecated in favor of a LABEL
+LABEL maintainer="MLPerf MLBox Working Group"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+ apt-get install -y --no-install-recommends \
+ software-properties-common \
+ python3-dev \
+ curl && \
+ rm -rf /var/lib/apt/lists/*
+
+RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
+ python3 get-pip.py && \
+ rm get-pip.py
+
+COPY requirements.txt /requirements.txt
+RUN pip3 install --no-cache-dir -r /requirements.txt
+
+COPY matmul.py /workspace/matmul.py
+
+ENTRYPOINT ["python3", "/workspace/matmul.py"]
+
+
+## Step 4: BUILD THE DOCKER IMAGE
+```
+cd ..
+mlcube_docker configure --mlcube=. --platform=platforms/docker.yaml
+```
+
+## Step 5: TEST YOUR MLCUBE
+```
+mlcube_docker run --mlcube=. --platform=platforms/docker.yaml --task=run/matmul.yaml
+ls ./workspace
+cat ./workspace/matmul_output.txt
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index db0e24ab..5f06452c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -9,6 +9,8 @@ nav:
- Installation: getting-started/index.md
- Hello World: getting-started/hello-world.md
- MNIST: getting-started/mnist.md
+ - Tutorials:
+ - How to Create an MLCube: tutorials/create-mlcube.md
- Runners:
- Runners: runners/index.md
- Docker Runner: runners/docker-runner.md