
Commit adb45e5

Minified benchmarks documentation (#363)

Documentation pages (https://docs.mlcommons.org/mlcube/) for the following MLPerf minified benchmarks: `Llama2`, `Stable Diffusion`, `3D Unet`, `ResNet`, `Bert` and `Object Detection`.

1 parent 37a213d commit adb45e5

File tree

8 files changed: +509 −0 lines changed

docs/minified-benchmarks/3d-unet.md

# 3D Unet

The benchmark reference for 3D Unet can be found at this [link](https://github.com/mlcommons/training/tree/master/retired_benchmarks/unet3d/pytorch), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/695).

## Project setup

An important requirement is that you must have Docker installed.

```bash
# Create Python environment and install MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
# Fetch the implementation from GitHub
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/695/head:feature/mlcube_3d_unet && git checkout feature/mlcube_3d_unet
cd ./image_segmentation/pytorch/mlcube
```

Inside the mlcube directory, run the following command to check the implemented tasks.

```shell
mlcube describe
```
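For orientation, the tasks that `mlcube describe` lists are defined in the `mlcube.yaml` next to this file. The fragment below is an illustrative sketch of the shape of such a definition, not the actual file; the task names follow the commands below, but the real paths and parameters will differ.

```yaml
# Illustrative mlcube.yaml fragment (assumed shape, not the real file):
# each task declares inputs/outputs that MLCube mounts into the container.
tasks:
  download_data:
    parameters:
      outputs: {data_dir: data/}
  process_data:
    parameters:
      inputs: {data_dir: data/}
      outputs: {processed_dir: processed/}
  train:
    parameters:
      inputs: {processed_dir: processed/}
      outputs: {log_dir: logs/}
```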
### MLCube tasks

Download the dataset.

```shell
mlcube run --task=download_data -Pdocker.build_strategy=always
```

Process the dataset.

```shell
mlcube run --task=process_data -Pdocker.build_strategy=always
```

Train 3D Unet.

```shell
mlcube run --task=train -Pdocker.build_strategy=always
```

### Execute the complete pipeline

You can execute the complete pipeline with one single command.

```shell
mlcube run --task=download_data,process_data,train -Pdocker.build_strategy=always
```

## Run a quick demo

You can run a quick demo that first downloads a tiny dataset and then executes a short training workload.

```shell
mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
```

docs/minified-benchmarks/bert.md

# Bert

The benchmark reference for Bert can be found at this [link](https://github.com/mlcommons/training/tree/master/language_model/tensorflow/bert), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/632).

## Project setup

An important requirement is that you must have Docker installed.

```bash
# Create Python environment and install MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
# Fetch the implementation from GitHub
git clone https://github.com/mlcommons/training && cd ./training/language_model/tensorflow/bert
```

Go to the mlcube directory and check which tasks MLCube implements.

```shell
cd ./mlcube
mlcube describe
```

### Demo execution

These tasks use a demo dataset to execute a faster training workload for a quick demo (~8 min):

```bash
mlcube run --task=download_demo -Pdocker.build_strategy=always

mlcube run --task=demo -Pdocker.build_strategy=always
```

It's also possible to execute the two tasks in one single instruction:

```bash
mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
```

### MLCube tasks

Download the dataset.

```shell
mlcube run --task=download_data -Pdocker.build_strategy=always
```

Process the dataset.

```shell
mlcube run --task=process_data -Pdocker.build_strategy=always
```

Train Bert.

```shell
mlcube run --task=train -Pdocker.build_strategy=always
```

Run the compliance checker.

```shell
mlcube run --task=check_logs -Pdocker.build_strategy=always
```

### Execute the complete pipeline

You can execute the complete pipeline with one single command.

```shell
mlcube run --task=download_data,process_data,train,check_logs -Pdocker.build_strategy=always
```
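When iterating, it can help to run the tasks one at a time so a failing step stops the run immediately. A small wrapper sketch (the `run_pipeline` helper is hypothetical, not part of MLCube):

```shell
# Hypothetical helper: run tasks sequentially, stopping at the first failure.
run_pipeline() {
    local task
    for task in "$@"; do
        mlcube run --task="$task" -Pdocker.build_strategy=always || return 1
    done
}
# Usage: run_pipeline download_data process_data train check_logs
```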
## TPU Training

To execute this benchmark on TPUs you will need access to [Google Cloud Platform](https://cloud.google.com/). Create a project (note: all resources should be created in the same project) and then follow these steps:

1. Create a TPU node

In the Google Cloud console, search for the Cloud TPU API page, then click Enable.

Then go to the virtual machine section and select [TPUs](https://console.cloud.google.com/compute/tpus).

Select create TPU node and fill in all the needed parameters. The recommended TPU type in the [readme](../README.md#on-tpu-v3-128) is v3-128 and the recommended TPU software version is 2.4.0.

The 3 most important parameters to remember are: `project name`, `TPU name`, and `TPU zone`.

After creating the node, click on the TPU name to see the TPU details, and copy the Service account (it should be in the format: <[email protected]>)

2. Create a Google Storage Bucket

Go to [Google Storage](https://console.cloud.google.com/storage/browser), create a new bucket, and define the needed parameters.

In the bucket list, select the checkbox for the bucket you just created, click on permissions, and then click on add principal.

In the new principals field, paste the Service account from step 1, and for the roles select Storage Legacy Bucket Owner, Storage Legacy Bucket Reader and Storage Legacy Bucket Writer. Then click save; this allows the TPU to save checkpoints during training.
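Console steps 1 and 2 have a rough CLI equivalent. The sketch below is an assumption, not part of the benchmark: the names are placeholders and the `gcloud`/`gsutil` flags should be checked against your installed versions. It is wrapped in a function so nothing runs until you call it.

```shell
# Hypothetical CLI equivalent of steps 1-2 (placeholder names; verify the
# gcloud/gsutil flags before use -- nothing executes until called).
provision_tpu_and_bucket() {
    local project="$1" tpu_name="$2" zone="$3" bucket="$4"
    # Step 1: create the TPU node (v3-128, software version 2.4.0, as above)
    gcloud compute tpus create "$tpu_name" --project="$project" \
        --zone="$zone" --accelerator-type=v3-128 --version=2.4.0
    # Step 2: create the bucket the TPU will write checkpoints to
    gsutil mb -p "$project" "gs://$bucket"
}
# Usage: provision_tpu_and_bucket my-project my-tpu us-central1-a my-bucket
```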
3. Create a VM instance

The idea is to create a virtual machine instance containing all the code we will execute using MLCube.

Go to [VM instances](https://console.cloud.google.com/compute/instances), click on create instance, and define all the needed parameters (no GPU needed).

**IMPORTANT:** In the Identity and API access section, check the option `Allow full access to all Cloud APIs`. This allows the connection between this VM, the Cloud Storage bucket and the TPU.

Start the VM, connect to it via SSH, then use this [tutorial](https://docs.docker.com/engine/install/debian/) to install Docker.

After installing Docker, clone the repo and install MLCube as in the project setup above, then go to the path: `training/language_model/tensorflow/bert/mlcube`

There, modify the file at `workspace/parameters.yaml` and replace the placeholders with your data:

```yaml
output_gs: your_gs_bucket_name
tpu_name: your_tpu_instance_name
tpu_zone: your_tpu_zone
gcp_project: your_gcp_project
```

After that, run the command:

```shell
mlcube run --task=train_tpu --mlcube=mlcube_tpu.yaml -Pdocker.build_strategy=always
```

This starts the MLCube task on the host VM, which sends the training request and configuration to the TPU over gRPC; the TPU then fetches the code to execute and the Cloud Storage bucket information, and runs the training workload.
# Minified Benchmarks

## What is a Minified Benchmark?

A minified benchmark is a reduced version of an MLCommons training benchmark, designed to be easily reproduced using MLCube. It simplifies the benchmarking process by scaling down the dataset and the training duration, and it offers a simple installation and reproduction process.

The main advantages of these minified benchmarks are:

- **Faster execution**: Minified benchmarks are quicker to run (roughly 10 to 15 minutes), allowing for faster iteration and validation.
- **Easier implementation**: By using MLCube, users don't need to worry about installing everything from scratch.
- **Reference preparation**: Minified benchmarks can serve as an introductory step for users interested in executing the full MLCommons reference benchmarks.

## List of Minified Benchmarks

- [Llama 2](llama2.md)
- [Stable Diffusion](stable-diffusion.md)
- [3D Unet](3d-unet.md)
- [ResNet](resnet.md)
- [Bert](bert.md)
- [Object Detection](object-detection.md)

docs/minified-benchmarks/llama2.md

# Llama 2

The benchmark reference for Llama 2 can be found at this [link](https://github.com/mlcommons/training/tree/master/llama2_70b_lora), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/749).

This video explains all the following steps:

[![Llama 2 minified benchmark walkthrough](https://img.youtube.com/vi/1Y9q-nltI8U/0.jpg)](https://youtu.be/1Y9q-nltI8U)

## Project setup

An important requirement is that you must have Docker installed.

```bash
# Create Python environment and install MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker
# Fetch the implementation from GitHub
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/749/head:feature/mlcube_llama2 && git checkout feature/mlcube_llama2
cd ./llama2_70b_lora/mlcube
```

Inside the mlcube directory, run the following command to check the implemented tasks.

```shell
mlcube describe
```

### Extra requirements

Install Rclone on your system by following [these instructions](https://rclone.org/install/).

MLCommons hosts the model for download exclusively for MLCommons members. You must first agree to the [confidentiality notice](https://docs.google.com/forms/d/e/1FAIpQLSc_8VIvRmXM3I8KQaYnKf7gy27Z63BBoI_I1u02f4lw6rBp3g/viewform).

After submitting the form, you will be redirected to a Drive folder containing a file called `CLI Download Instructions`; follow the instructions inside that file up to step `#3 Authenticate Rclone with Google Drive`.

Once that step is complete, the Rclone configuration file will contain the data needed to download the dataset and models. To check where this file is located, run the command:

```bash
rclone config file
```

**Default:** `~/.config/rclone/rclone.conf`

Finally, copy that file into the `workspace` folder located in the same path as this readme; it must be named `rclone.conf`.
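The copy step can be scripted. A minimal sketch, assuming the default config location shown above and that you run it from the `mlcube` directory:

```shell
# Copy the Rclone config into the MLCube workspace under the required name.
# Assumes the default path reported by `rclone config file`.
conf="$HOME/.config/rclone/rclone.conf"
mkdir -p ./workspace
if [ -f "$conf" ]; then
    cp "$conf" ./workspace/rclone.conf
else
    echo "Rclone config not found at $conf; run the Rclone setup first."
fi
```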
45+
46+
### MLCube tasks
47+
48+
* Core tasks:
49+
50+
Download dataset.
51+
52+
```shell
53+
mlcube run --task=download_data -Pdocker.build_strategy=always
54+
```
55+
56+
Train.
57+
58+
```shell
59+
mlcube run --task=train -Pdocker.build_strategy=always
60+
```
61+
62+
* Demo tasks:
63+
64+
Download demo dataset.
65+
66+
```shell
67+
mlcube run --task=download_demo -Pdocker.build_strategy=always
68+
```
69+
70+
Train demo.
71+
72+
```shell
73+
mlcube run --task=demo -Pdocker.build_strategy=always
74+
```
75+
76+
### Execute the complete pipeline
77+
78+
You can execute the complete pipeline with one single command.
79+
80+
* Core pipeline:
81+
82+
```shell
83+
mlcube run --task=download_data,train -Pdocker.build_strategy=always
84+
```
85+
86+
* Demo pipeline:
87+
88+
```shell
89+
mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
90+
```
# Object Detection (Mask R-CNN)

The benchmark reference for Object Detection (Mask R-CNN) can be found at this [link](https://github.com/mlcommons/training/tree/master/retired_benchmarks/maskrcnn), and here is the PR for the minified benchmark implementation: [link](https://github.com/mlcommons/training/pull/501).

### Project setup

```bash
# Create Python environment and install MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install pip==24.0 && pip install mlcube-docker

# Fetch the Object Detection workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/501/head:feature/object_detection && git checkout feature/object_detection
cd ./object_detection/mlcube
```

### Dataset

The COCO dataset will be downloaded and extracted. Sizes of the dataset at each step:

| Dataset Step                   | MLCube Task       | Format         | Size     |
|--------------------------------|-------------------|----------------|----------|
| Download (compressed dataset)  | download_data     | Tar/Zip files  | ~20.5 GB |
| Extract (uncompressed dataset) | download_data     | Jpg/Json files | ~21.2 GB |
| Total                          | (After all tasks) | All            | ~41.7 GB |
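Given the ~42 GB total above, it is worth checking free disk space before starting the download. A minimal sketch, assuming the dataset lands under the default `workspace` path used by the commands below:

```shell
# Check available space where MLCube will store the dataset
# (path assumed from the default paths noted in the task comments below).
mkdir -p ./workspace
df -h ./workspace
```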
### Tasks execution

Parameters are defined in these files:

* MLCube user parameters: mlcube/workspace/parameters.yaml
* Project user parameters: pytorch/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
* Project default parameters: pytorch/maskrcnn_benchmark/config/defaults.py

```bash
# Download COCO dataset. Default path = /workspace/data
mlcube run --task=download_data -Pdocker.build_strategy=always

# Run benchmark. Default paths = ./workspace/data
mlcube run --task=train -Pdocker.build_strategy=always
```

### Demo execution

These tasks use a demo dataset (39 MB) to execute a faster training workload for a quick demo (~12 min):

```bash
# Download subsampled dataset. Default path = /workspace/demo
mlcube run --task=download_demo -Pdocker.build_strategy=always

# Run benchmark. Default paths = ./workspace/demo and ./workspace/demo_output
mlcube run --task=demo -Pdocker.build_strategy=always
```

It's also possible to execute the two tasks in one single instruction:

```bash
mlcube run --task=download_demo,demo -Pdocker.build_strategy=always
```

### Additional options

Parameters defined in **mlcube/mlcube.yaml** can be overridden using: `--param=input`

We are targeting pull-type installation, so MLCube images should be available on Docker Hub. If not, try this:

```bash
mlcube run ... -Pdocker.build_strategy=always
```
