The submitted results were obtained on two cards (Q2) of an R282-Z93 server with five cards (Q5). The reproduced results were obtained on a similar server. The main difference between the servers was the amount of RAM: 512 GB (8 x 64 GB) vs 128 GB (4 x 32 GB).
Workload | Results | Offline Accuracy | Offline Performance (samples/s) | SingleStream Accuracy | SingleStream Performance (ms) | MultiStream Accuracy | MultiStream Performance (ms) |
---|---|---|---|---|---|---|---|
ResNet50 | Submitted | 75.956 | 46,361.40 | 75.956 | 0.34 | 75.956 | 0.64 |
ResNet50 | Reproduced | 75.956 | 45,537.80 | 75.956 | 0.33 | 75.956 | 0.59 |
SSD-ResNet34 | Submitted | 19.831 | 885.04 | 19.831 | 8.73 | 19.831 | 28.03 |
SSD-ResNet34 | Reproduced | 19.831 | 881.31 | 19.831 | 7.06 | 19.831 | 25.01 |
SSD-MobileNet | Submitted | 23.160 | 38,630.30 | 23.160 | 0.68 | 23.160 | 1.52 |
SSD-MobileNet | Reproduced | 23.160 | 38,434.90 | 23.160 | 0.54 | 23.160 | 1.09 |
BERT-99 | Submitted | 90.363 | 1,437.71 | 90.332 | 10.25 | N/A | N/A |
BERT-99 | Reproduced | 90.360 | 1,415.18 | 90.332 | 10.04 | N/A | N/A |
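As a quick sanity check on the table, the reproduced Offline throughput can be expressed as a percentage of the submitted one. For Offline, higher is better; the SingleStream and MultiStream columns are latencies, where lower is better. The sketch below simply hard-codes the Offline Performance values from the table; the function name offline_ratio_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each workload's reproduced Offline throughput
# as a percentage of the submitted Offline throughput.
# Values (samples/s) are copied from the table above:
# workload, submitted, reproduced.
offline_ratio_report() {
  awk '{ printf "%-14s %6.2f%%\n", $1, 100 * $3 / $2 }' <<'EOF'
ResNet50 46361.40 45537.80
SSD-ResNet34 885.04 881.31
SSD-MobileNet 38630.30 38434.90
BERT-99 1437.71 1415.18
EOF
}

offline_ratio_report
```

For ResNet50, for example, this prints about 98.22%, i.e. the reproduced Offline throughput is roughly 1.8% below the submitted one.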
The instructions below largely follow the Docker README, taking note of any important differences in the expected output.
uname -a
Linux dyson 5.4.1-1.el7.elrepo.x86_64 #1 SMP Fri Nov 29 10:21:13 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
/opt/qti-aic/tools/qaic-version-util
platform:AIC.1.6.80 apps:AIC.1.6.80 factory:not found
tree /local/mnt/workspace/ -L 2
/local/mnt/workspace/
├── auditor
├── datasets
│   └── imagenet
├── docker [error opening dir]
└── sdks
    ├── qaic-apps-1.6.80.zip
    └── qaic-platform-sdk-1.6.80.zip

5 directories, 2 files
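Before building any images, it can help to confirm that the SDK archives listed above are in place. This is a minimal sketch, assuming the archive naming shown in the tree output (qaic-apps-<version>.zip and qaic-platform-sdk-<version>.zip); check_sdk_archives is a hypothetical helper, not part of ck-qaic.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of ck-qaic): check that both QAIC
# SDK archives for a given version exist in the SDK directory.
check_sdk_archives() {
  local sdk_dir="${1%/}" sdk_ver="$2" missing=0 f   # strip a trailing slash
  for f in "qaic-apps-${sdk_ver}.zip" "qaic-platform-sdk-${sdk_ver}.zip"; do
    if [ -f "${sdk_dir}/${f}" ]; then
      echo "found:   ${f}"
    else
      echo "missing: ${f}"
      missing=1
    fi
  done
  return "${missing}"
}

check_sdk_archives /local/mnt/workspace/sdks/ 1.6.80 || echo "Some SDK archives are missing."
```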
sudo useradd auditor
sudo passwd auditor
sudo usermod -aG qaic,root,wheel,docker auditor
sudo mkdir /local/mnt/workspace/auditor
sudo chown auditor:qaic /local/mnt/workspace/auditor
ssh auditor@localhost
sudo su
export PYTHON_VERSION=3.8.13
cd /usr/src \
&& wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz \
&& tar xzf Python-${PYTHON_VERSION}.tgz \
&& rm -f Python-${PYTHON_VERSION}.tgz \
&& cd /usr/src/Python-${PYTHON_VERSION} \
&& ./configure --enable-optimizations --with-ssl && make -j 32 altinstall \
&& rm -rf /usr/src/Python-${PYTHON_VERSION}*
python3.8 --version
Python 3.8.13
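The version check above can be scripted, for example to stop early if a different interpreter is picked up. check_python_version is a hypothetical helper; it assumes the interpreter reports its version in the form "Python X.Y.Z", as shown above.

```shell
#!/bin/sh
# Hypothetical helper: compare an interpreter's reported version with the
# PYTHON_VERSION used for the source build above.
check_python_version() {
  local python_bin="$1" expected="$2" reported
  # Older Pythons print the version on stderr, so capture both streams.
  reported=$("${python_bin}" --version 2>&1) || return 2
  if [ "${reported}" = "Python ${expected}" ]; then
    echo "OK: ${reported}"
  else
    echo "MISMATCH: expected Python ${expected}, got '${reported}'"
    return 1
  fi
}

check_python_version python3.8 "${PYTHON_VERSION:-3.8.13}" || true
```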
To install a recent version of Git, follow the steps here.
git --version
git version 2.34.1
Customize the workspace:
export WORKSPACE=/local/mnt/workspace
Add environment variables to ~/.bashrc:
echo "\
export CK_PYTHON=${CK_PYTHON:-$(which python3.8)}
export CK_WORKSPACE=$WORKSPACE
export CK_TOOLS=$WORKSPACE/$USER/CK-TOOLS
export CK_REPOS=$WORKSPACE/$USER/CK-REPOS
export CK_EXPERIMENT_REPO=mlperf_v2.0.$(hostname).$USER
export CK_EXPERIMENT_DIR=$WORKSPACE/$USER/CK-REPOS/mlperf_v2.0.$(hostname).$USER/experiment
export RESOURCES_DIR=/local/mnt/workspace/resources
export PATH=$HOME/.local/bin:$PATH" >> ~/.bashrc
Load the new variables into the current shell:
source ~/.bashrc
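After sourcing ~/.bashrc, a quick way to confirm that the variables were written and exported is to print each one. This is a sketch with a hypothetical helper, check_ck_env, which uses printenv and therefore only sees exported variables (all the lines written above use export).

```shell
#!/bin/sh
# Hypothetical helper: print each CK-related variable written to ~/.bashrc
# above and flag any that are missing. printenv only sees exported variables.
check_ck_env() {
  local missing=0 name value
  for name in CK_PYTHON CK_WORKSPACE CK_TOOLS CK_REPOS \
              CK_EXPERIMENT_REPO CK_EXPERIMENT_DIR RESOURCES_DIR; do
    value=$(printenv "${name}")
    if [ -n "${value}" ]; then
      echo "${name}=${value}"
    else
      echo "${name} is not set" >&2
      missing=1
    fi
  done
  return "${missing}"
}

check_ck_env || echo "Check the ~/.bashrc entries above, then source it again."
```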
sudo mkdir -p $CK_WORKSPACE/$USER && sudo chown $USER:qaic $CK_WORKSPACE/$USER
$CK_PYTHON -m pip install --ignore-installed pip setuptools testresources ck==2.6.1 --user --upgrade
ck version
V2.6.1
Pull the ck-qaic repository (and, recursively, its dependent repositories):
ck pull repo --url=https://github.com/krai/ck-qaic
ck add repo:$CK_EXPERIMENT_REPO --quiet
ck add $CK_EXPERIMENT_REPO:experiment:dummy --common_func
ck rm $CK_EXPERIMENT_REPO:experiment:dummy --force
sudo chgrp -R qaic $CK_EXPERIMENT_DIR
sudo chmod -R g+ws $CK_EXPERIMENT_DIR
setfacl -R -d -m group:qaic:rwx $CK_EXPERIMENT_DIR
touch $CK_EXPERIMENT_DIR/TEST && ls -Rla $CK_EXPERIMENT_DIR && rm $CK_EXPERIMENT_DIR/TEST
/local/mnt/workspace/auditor/CK-REPOS/mlperf_v2.0.dyson.auditor/experiment:
total 24
drwxrwsr-x+ 3 auditor qaic 4096 Mar 31 12:35 .
drwxrwxr-x. 4 auditor auditor 4096 Mar 31 12:34 ..
drwxrwsr-x+ 2 auditor qaic 4096 Mar 31 12:34 .cm
-rw-rw-r--+ 1 auditor qaic 0 Mar 31 12:35 TEST

/local/mnt/workspace/auditor/CK-REPOS/mlperf_v2.0.dyson.auditor/experiment/.cm:
total 16
drwxrwsr-x+ 2 auditor qaic 4096 Mar 31 12:34 .
drwxrwsr-x+ 3 auditor qaic 4096 Mar 31 12:35 ..
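The listing above shows the experiment directory owned by group qaic with mode drwxrwsr-x (group write plus setgid). Below is a hypothetical helper that checks the same two properties with stat, assuming a Linux host with GNU coreutils; it does not inspect the default ACL set by setfacl (getfacl can be used for that).

```shell
#!/bin/sh
# Hypothetical helper: check that a directory belongs to the expected group
# and has group write plus setgid (mode like drwxrwsr-x). Assumes GNU stat.
check_shared_dir() {
  local dir="$1" group="$2" mode owner_group
  mode=$(stat -c '%A' "${dir}") || return 2          # e.g. drwxrwsr-x
  owner_group=$(stat -c '%G' "${dir}") || return 2   # e.g. qaic
  if [ "${owner_group}" != "${group}" ]; then
    echo "${dir}: group is ${owner_group}, expected ${group}"
    return 1
  fi
  # Characters 6-7 of the mode string are the group write and execute/setgid
  # positions; "ws" means group-writable with setgid (and group execute).
  case "${mode}" in
    ?????ws*) echo "${dir}: OK (${mode}, group ${owner_group})" ;;
    *) echo "${dir}: mode is ${mode}, expected group 'ws'"; return 1 ;;
  esac
}

check_shared_dir "${CK_EXPERIMENT_DIR}" qaic || true
```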
We provide all build commands for completeness. However, all images can be built by running the SDK-dependent commands only; the SDK-independent images will be built automatically.
Build base images
To build a base OS image including Python and GCC:
$(ck find ck-qaic:docker:base)/build.base.sh
DOCKER_OS=centos7 (only CentOS 7 is currently supported)
PYTHON_VER=3.8.13
GCC_MAJOR_VER=11
TIMEZONE=US/Central (Austin)
docker run --rm krai/base.centos7
centos-release-7-9.2009.1.el7.centos.x86_64
To build a base image for CK packages common to all supported MLPerf Inference benchmarks:
$(ck find ck-qaic:docker:base)/build.ck.sh
DOCKER_OS=centos7
PYTHON_VER=3.8.13
GCC_MAJOR_VER=11
CK_VER=2.6.1
GROUP_ID=1500
USER_ID=2000
docker run --rm krai/ck.common.centos7
V2.6.1
To build a base image including the QAIC Platform and Apps SDKs:
SDK_VER=1.6.80 SDK_DIR=/local/mnt/workspace/sdks/ $(ck find ck-qaic:docker:base)/build.qaic.sh
DOCKER_OS=centos7
SDK_DIR=/local/mnt/workspace/sdks/
SDK_VER=1.6.80
PLATFORM_SDK
APPS_SDK
export SDK_VER=1.6.80 && docker run --privileged --rm krai/qaic.centos7:${SDK_VER}
Status:Ready
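To confirm that the three base images built above exist locally, each one can be queried with docker image inspect, which exits with a non-zero status when an image is not found. check_images is a hypothetical helper; the image names are the ones used in the docker run commands above.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given images exist locally.
# `docker image inspect` exits non-zero when an image is not found.
check_images() {
  local missing=0 image
  for image in "$@"; do
    if docker image inspect "${image}" >/dev/null 2>&1; then
      echo "present: ${image}"
    else
      echo "absent:  ${image}"
      missing=1
    fi
  done
  return "${missing}"
}

check_images krai/base.centos7 krai/ck.common.centos7 \
  "krai/qaic.centos7:${SDK_VER:-1.6.80}" || true
```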
To build the ImageNet dataset image:
DATASETS_DIR=/local/mnt/workspace/datasets $(ck find ck-qaic:docker:imagenet)/build.sh
Sending build context to Docker daemon 6.747GB
Step 1/2 : FROM centos:7
7: Pulling from library/centos
2d473b07cdd5: Pull complete
Digest: sha256:c73f515d06b0fa07bb18d8202035e739a494ce760aa73129f60f4bf2bd22b407
Status: Downloaded newer image for centos:7
 ---> eeb6ee3f44bd
Step 2/2 : ADD imagenet /imagenet
 ---> 8b50031cf317
Successfully built 8b50031cf317
Successfully tagged imagenet:latest

real 3m18.980s
user 0m16.976s
sys 0m11.106s
Done.
docker image ls imagenet
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
imagenet     latest   8b50031cf317   About a minute ago   6.91GB
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build_ck.sh resnet50
docker image ls 'krai/*resnet50*'
REPOSITORY                 TAG      IMAGE ID       CREATED         SIZE
krai/ck.resnet50.centos7   latest   11ee9bfb3c50   9 minutes ago   13.5GB
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build.sh resnet50
docker image ls 'krai/*resnet50*'
REPOSITORY                          TAG      IMAGE ID       CREATED          SIZE
krai/mlperf.resnet50.full.centos7   1.6.80   a33de9c692e9   59 minutes ago   12.3GB
krai/ck.resnet50.centos7            latest   6a9471f3a2ed   2 hours ago      13.5GB
CONTAINER_ID=$(ck run cmdgen:benchmark.image-classification.qaic-loadgen --docker=container_only --out=none --sdk=1.6.80 --model_name=resnet50 --experiment_dir)
ck run cmdgen:benchmark.image-classification.qaic-loadgen --sut=r282_z93_q1 --sdk=1.6.80 --model=resnet50 --mode=accuracy --scenario=offline --container=$CONTAINER_ID
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build_ck.sh ssd-resnet34
docker image ls 'krai/*ssd-resnet34*'
REPOSITORY                     TAG      IMAGE ID       CREATED         SIZE
krai/ck.ssd-resnet34.centos7   latest   bebaeb96fa93   5 minutes ago   27.5GB
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build.sh ssd-resnet34
docker image ls 'krai/*ssd-resnet34*'
REPOSITORY                         TAG      IMAGE ID       CREATED          SIZE
krai/mlperf.ssd-resnet34.centos7   1.6.80   4e31315c9cd2   2 minutes ago    25.2GB
krai/ck.ssd-resnet34.centos7       latest   bebaeb96fa93   26 minutes ago   27.5GB
CONTAINER_ID=$(ck run cmdgen:benchmark.object-detection-large.qaic-loadgen --docker=container_only --out=none --sdk=1.6.80 --model_name=ssd-resnet34 --experiment_dir)
ck run cmdgen:benchmark.object-detection-large.qaic-loadgen --sut=r282_z93_q1 --sdk=1.6.80 --model=ssd_resnet34 --mode=accuracy --scenario=offline --container=$CONTAINER_ID
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build_ck.sh ssd-mobilenet
docker image ls 'krai/*ssd-mobilenet*'
REPOSITORY                      TAG      IMAGE ID       CREATED       SIZE
krai/ck.ssd-mobilenet.centos7   latest   fdd48e3378de   2 hours ago   8.9GB
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build.sh ssd-mobilenet
docker image ls 'krai/*ssd-mobilenet*'
REPOSITORY                          TAG      IMAGE ID       CREATED         SIZE
krai/mlperf.ssd-mobilenet.centos7   1.6.80   9db636a770c1   6 minutes ago   6.35GB
krai/ck.ssd-mobilenet.centos7       latest   fdd48e3378de   2 hours ago     8.9GB
CONTAINER_ID=$(ck run cmdgen:benchmark.object-detection-small.qaic-loadgen --docker=container_only --out=none --sdk=1.6.80 --model_name=ssd-mobilenet --experiment_dir)
ck run cmdgen:benchmark.object-detection-small.qaic-loadgen --sut=r282_z93_q1 --sdk=1.6.80 --model=ssd_mobilenet --mode=accuracy --scenario=offline --container=$CONTAINER_ID
CK_QAIC_CHECKOUT=v2.0 $(ck find repo:ck-qaic)/docker/build_ck.sh bert
CK_QAIC_CHECKOUT=v2.0 CK_QAIC_PCV=9983 SDK_DIR=/local/mnt/workspace/sdks $(ck find repo:ck-qaic)/docker/build.sh bert
docker image ls 'krai/*bert*'
REPOSITORY                 TAG      IMAGE ID       CREATED          SIZE
krai/mlperf.bert.centos7   1.6.80   550bbbcf9f91   10 minutes ago   7GB
krai/ck.bert.centos7       latest   a89ab03b895b   45 minutes ago   13.1GB
CONTAINER_ID=$(ck run cmdgen:benchmark.packed-bert.qaic-loadgen --docker=container_only --out=none --sdk=1.6.80 --model_name=bert --experiment_dir)
ck run cmdgen:benchmark.packed-bert.qaic-loadgen --sut=r282_z93_q1 --sdk=1.6.80 --model=bert-99 --mode=accuracy --scenario=offline --container=$CONTAINER_ID
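The accuracy runs for all four workloads follow the same two-step pattern: create a container with --docker=container_only, then run an Offline accuracy test inside it. Below is a hypothetical wrapper, run_offline_accuracy, that reuses only the ck flags shown above; the SUT and SDK_VER defaults are assumptions taken from the commands in this section.

```shell
#!/bin/sh
# Hypothetical wrapper around the two ck commands used above for each
# workload: start a container, then run an Offline accuracy test inside it.
# Arguments: cmdgen benchmark name, --model_name value, --model value.
run_offline_accuracy() {
  local benchmark="$1" model_name="$2" model="$3"
  local sut="${SUT:-r282_z93_q1}" sdk="${SDK_VER:-1.6.80}" container_id
  # Step 1: create the container and capture its ID.
  container_id=$(ck run "cmdgen:${benchmark}" --docker=container_only --out=none \
    --sdk="${sdk}" --model_name="${model_name}" --experiment_dir) || return 1
  # Step 2: run the accuracy test in that container.
  ck run "cmdgen:${benchmark}" --sut="${sut}" --sdk="${sdk}" --model="${model}" \
    --mode=accuracy --scenario=offline --container="${container_id}"
}

# Usage, mirroring the four workloads above:
# run_offline_accuracy benchmark.image-classification.qaic-loadgen resnet50 resnet50
# run_offline_accuracy benchmark.object-detection-large.qaic-loadgen ssd-resnet34 ssd_resnet34
# run_offline_accuracy benchmark.object-detection-small.qaic-loadgen ssd-mobilenet ssd_mobilenet
# run_offline_accuracy benchmark.packed-bert.qaic-loadgen bert bert-99
```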
Benchmark (without compliance tests: DIVISION=open)
SUT=r282_z93_q2 SDK_VER=1.6.80 DIVISION=open DOCKER=yes ./run_edge.sh
ck list mlperf_v2.0.dyson.auditor:experiment:*
To remove any of the above experiments:
ck rm experiment:<experiment_folder_name>
export SUBMISSIONS_DIR=/local/mnt/workspace/mlperf-inference-submissions
mkdir -p $SUBMISSIONS_DIR
mkdir -p $SUBMISSIONS_DIR/scripts
git clone git@github.com:krai/mlperf-inference $SUBMISSIONS_DIR/scripts/krai-mlperf-inference
ck detect soft:compiler.python --full_path=$(which python3.8)
ck install package --tags=mlperf,inference,r2.0
ck install package --tags=dataset,coco,val,2017
ck install package --tags=dataset,imagenet,aux,from.berkeley
ck install package --tags=dataset,tokenization,vocab
ck install package --tags=dataset,squad,original
ck install package --tags=dataset,squad,tokenized,pickle
ck install package --tags=lib,python-package,absl
ck install package --tags=lib,python-package,transformers --force_version=2.4.0
export SUBMISSIONS_DIR=/local/mnt/workspace/mlperf-inference-submissions
export RESOURCES_DIR=$SUBMISSIONS_DIR/resources
mkdir -p $RESOURCES_DIR
cp -r $(ck locate env --tags=mlperf,inference,source,r2.0) $RESOURCES_DIR/
cp -r $(ck locate env --tags=coco,val) $RESOURCES_DIR/
cp $(ck locate env --tags=aux)/val.txt $RESOURCES_DIR/
cp $(ck locate env --tags=vocab,tokenization)/vocab.txt $RESOURCES_DIR/
cp $(ck locate env --tags=squad,original)/dev-v1.1.json $RESOURCES_DIR/
cp $(ck locate env --tags=squad,tokenized,pickle)/bert_tokenized_squad_v1_1.pickle $RESOURCES_DIR/
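A quick check that the individually named files copied above actually landed in RESOURCES_DIR. check_resources is a hypothetical helper; it skips the directories copied with cp -r, since their names depend on the local CK environment entries.

```shell
#!/bin/sh
# Hypothetical helper: check that each named resource file exists and is
# non-empty in the given directory. Directories copied with `cp -r` are
# not checked because their names depend on the local CK env entries.
check_resources() {
  local dir="$1" missing=0 f
  for f in val.txt vocab.txt dev-v1.1.json bert_tokenized_squad_v1_1.pickle; do
    if [ -s "${dir}/${f}" ]; then
      echo "ok:      ${f}"
    else
      echo "missing: ${f}"
      missing=1
    fi
  done
  return "${missing}"
}

check_resources "${RESOURCES_DIR:-/local/mnt/workspace/mlperf-inference-submissions/resources}" || true
```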
python3.8 -m pip install pandas tabulate pycocotools --user
sudo yum install mariadb-devel
python3.8 -m pip install sqlalchemy mysqlclient --user
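To confirm that the packages installed above are importable by python3.8, each module can be imported in a separate interpreter call. check_py_modules is a hypothetical helper; note that import names can differ from pip package names (mysqlclient is imported as MySQLdb).

```shell
#!/bin/sh
# Hypothetical helper: try to import each module with the given interpreter
# and report which imports fail.
check_py_modules() {
  local python_bin="$1" failed=0 module
  shift
  for module in "$@"; do
    if "${python_bin}" -c "import ${module}" >/dev/null 2>&1; then
      echo "ok:     ${module}"
    else
      echo "failed: ${module}"
      failed=1
    fi
  done
  return "${failed}"
}

check_py_modules python3.8 pandas tabulate pycocotools sqlalchemy MySQLdb || true
```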
Run from the dump-repo-to-submission directory:
cd /local/mnt/workspace/mlperf-inference-submissions/scripts/krai-mlperf-inference/dump-repo-to-submission
SUT=r282_z93_q2 SDK_VER=1.6.80 SUBMITTER=GIGABYTE PRECHECK=yes ./run.sh
(Alternatively, run the script from any directory by giving its absolute path.)
To use resources only from the user CK directories:
SUT=r282_z93_q2 SDK_VER=1.6.80 SUBMITTER=GIGABYTE PRECHECK=yes RESOURCES_DIR=no ./run.sh
To run on a custom experiment repository CK_REPO using a specified resources directory RESOURCES_DIR:
SUT=r282_z93_q2 SDK_VER=1.6.80 SUBMITTER=GIGABYTE PRECHECK=yes CK_REPO=mlperf_v2.0-closed-r282_z93_q8-qaic-v1.6.80 RESOURCES_DIR=/local/mnt/workspace/mlperf-inference-submissions/resources ./run.sh
To run a quick check:
SUT=r282_z93_q2 SDK_VER=1.6.80 PRECHECK=yes SKIP_CHECK=yes RESOURCES_DIR=dummy_dir SUBMITTER=GIGABYTE ./run.sh
Benchmark (with compliance tests: DIVISION=closed)
SUT=r282_z93_q2 SDK_VER=1.6.80 DIVISION=closed DOCKER=yes ./run_edge.sh