
Automation and reproducibility for MLPerf Inference v3.1 #1052

Closed · arjunsuresh opened this issue Jan 16, 2024 · 6 comments

arjunsuresh (Contributor) commented Jan 16, 2024

The MLCommons task force on automation and reproducibility is helping the community, vendors and submitters check whether it is possible to re-run MLPerf inference v3.1 submissions, fix any issues encountered, and add their implementations to the MLCommons CM automation so that all MLPerf benchmark implementations can be run in a unified way.

Note that CM is a collaborative project to run all MLPerf inference benchmarks on any platform, with any software/hardware stack, using a unified interface. The MLCommons CM interface and automations are being developed based on feedback from MLPerf users and submitters: if you encounter any issues or have suggestions and feature requests, please report them via GitHub issues, via our Discord channel, or by providing a patch to the CM automations. Thank you, and we look forward to collaborating with you!

Implementations tracked per model: Reference (Intel, Nvidia, AMD, ARM), Nvidia CUDA, Intel, QAIC, DeepSparse (Intel, ARM, AMD), Google TPU.

| Model | Status |
|-------|--------|
| ResNet50 [2] | ✅ via CM |
| RetinaNet [2] | ✅ via CM |
| Bert | |
| 3d-Unet | |
| RNNT | |
| DLRMv2 | |
| GPT-J | TBD |
| Stable Diffusion | |
| Llama2 | Added to CM - looking for volunteers to test it |

[1] The original Docker container fails because of an incompatibility with the latest pip packages: see the GitHub issue. We are collaborating with Intel to integrate their patch into the CM automation and re-run their submissions - this is mostly done (a generic sketch of this kind of dependency pin follows the footnotes).

[2] ❌ Not possible to re-run and reproduce the performance numbers due to missing configuration files: see the GitHub issue. After discussing this issue with the submitters, we helped them generate the missing configuration files for QAIC using the MLCommons CM automation and match the QAIC performance numbers from the v3.1 submission. It should be possible to use CM for QAIC MLPerf inference v4.0 submissions.
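
As context for footnote [1], the usual stopgap while such a patch is being upstreamed is to pin the Python dependencies inside the container to the versions the submission was originally tested with. The sketch below is generic: the package names, versions and file names are placeholders, not the contents of the actual Intel patch.

```bash
# Generic illustration only: the packages and versions below are placeholders,
# not the actual Intel patch.
cat > constraints.txt << 'EOF'
numpy==1.23.5
protobuf==3.20.3
EOF

# pip's constraints file pins (transitive) dependency versions without
# editing the original requirements file.
pip install -r requirements.txt -c constraints.txt
```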

MLCommons CM interface

You should be able to run MLPerf inference benchmarks via the unified CM interface and a portable workflow that can run natively or inside an automatically generated Docker container:

pip install cmind
cm pull repo mlcommons@ck
cmr "run common mlperf inference" --implementation=nvidia --model=bert-99

Prepare an official submission for the Edge category:

cmr "run common mlperf inference _submission _full" --implementation=nvidia --model=bert-99

Prepare an official submission for the Datacenter category:

cmr "run common mlperf inference _submission _full" --implementation=nvidia \
--model=bert-99 --category=datacenter --division=closed
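
Once a submission tree has been generated, it can be validated with the official submission checker from the mlcommons/inference repository before uploading. The sketch below assumes a local checkout of that repository and a submission directory named ./mlperf_submission produced by the commands above; adjust the path and the --version value to your setup:

```bash
# Validate the generated submission tree with the official MLPerf checker.
# The directory name ./mlperf_submission is a placeholder for illustration.
git clone https://github.com/mlcommons/inference.git
python3 inference/tools/submission/submission_checker.py \
    --input ./mlperf_submission \
    --version v3.1
```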
gfursin changed the title from "CM Reproducibility for MLPerf Inference" to "Reproducibility and Automation for MLPerf Inference v3.1 and v4.0" on Jan 23, 2024
gfursin changed the title from "Reproducibility and Automation for MLPerf Inference v3.1 and v4.0" to "Automation and reproducibility for MLPerf Inference v3.1 and v4.0" on Jan 24, 2024
gfursin (Contributor) commented Jan 27, 2024

We got feedback from submitters asking us to create a GUI that can generate CM commands. I opened a ticket: #1070

gfursin (Contributor) commented Jan 30, 2024

[20240130] We received a lot of great feedback and improved both the generic CM automation recipes and the CM workflows for MLPerf inference.

gfursin (Contributor) commented Feb 1, 2024

We started discussing a proposal for MLPerf reproducibility badges, similar to those used at ACM/IEEE/NeurIPS conferences: #1080 - feedback is welcome!

gfursin (Contributor) commented Feb 20, 2024

Following feedback from the MLPerf submitters, we have developed a prototype of a GUI that generates the command lines to run MLPerf inference benchmarks for all main implementations (reference, Intel, Nvidia, Qualcomm, MIL and DeepSparse) and to automate submissions. You can check it here. The long-term goal is to aggregate and encode all MLPerf submission rules and notes for all models, categories and divisions in this GUI.

We have also developed a prototype of a reproducibility infrastructure to keep track of successful MLPerf inference benchmark configurations across MLPerf versions, hardware, implementations, models and backends, based on the ACM/IEEE/cTuning reproducibility methodology and badging. You can see the latest results here - we will continue adding more tests based on your suggestions, including GPT-J, Llama2 and Stable Diffusion.

Our goal is to test as many v4.0 submissions as possible and add them to the above GUI to make it easier for the community to re-run experiments after the publication date. If some configurations do not work, we plan to help submitters fix the issues.

gfursin (Contributor) commented Feb 29, 2024

We improved the CM automation for Intel, Nvidia and Qualcomm and added it to the GUI: https://access.cknowledge.org/playground/?action=howtorun&bench_uid=39877bb63fb54725 . We can re-run most of these submissions now.

gfursin (Contributor) commented Mar 13, 2024

We now have a relatively stable common CM interface to re-run the above submissions and reproduce the key results. I am closing this ticket - we will open a similar one for inference v4.0 after the publication date. Huge thanks to our colleagues from Intel, Qualcomm and Nvidia for their help and suggestions!

gfursin closed this as completed on Mar 13, 2024
gfursin changed the title from "Automation and reproducibility for MLPerf Inference v3.1 and v4.0" to "Automation and reproducibility for MLPerf Inference v3.1" on Mar 13, 2024