
Automation and reproducibility for MLPerf Inference v3.1 #1052

Closed · arjunsuresh opened this issue Jan 16, 2024 · 6 comments

arjunsuresh (Contributor) commented Jan 16, 2024

The MLCommons task force on automation and reproducibility is helping the community, vendors and submitters check whether it is possible to re-run MLPerf inference v3.1 submissions, fix any issues encountered, and add their implementations to the MLCommons CM automation so that all MLPerf benchmark implementations can be run in a unified way.

Note that CM is a collaborative project to run all MLPerf inference benchmarks on any platform, with any software/hardware stack, using a unified interface. The MLCommons CM interface and automations are being developed based on feedback from MLPerf users and submitters: if you encounter any issues or have suggestions and feature requests, please report them via GitHub issues, via our Discord channel, or by providing a patch to the CM automations. Thank you, and we look forward to collaborating with you!

Implementations tracked per model: Reference (Intel, Nvidia, AMD, ARM), Nvidia CUDA, Intel, QAIC, DeepSparse (Intel, ARM, AMD), Google TPU.

| Model | Status |
|-------|--------|
| ResNet50 [2] | ✅ via CM |
| RetinaNet [2] | ✅ via CM |
| Bert | |
| 3d-Unet | |
| RNNT | |
| DLRMv2 | |
| GPT-J | TBD |
| Stable Diffusion | |
| Llama2 | Added to CM - looking for volunteers to test it |

[1] The original Docker container fails because of an incompatibility with the latest pip packages: see the GitHub issue. We are collaborating with Intel to integrate their patch into the CM automation and re-run their submissions - this is mostly done (a generic sketch of this kind of dependency pin follows the footnotes).

[2] ❌ Not possible to re-run and reproduce the performance numbers due to missing configuration files: see the GitHub issue. After discussing this issue with the submitters, we helped them generate the missing configuration files for QAIC using the MLCommons CM automation and match the QAIC performance numbers from the v3.1 submission. It should be possible to use CM for QAIC MLPerf inference v4.0 submissions.
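
As context for footnote [1], the usual stopgap while such a patch is being upstreamed is to pin the Python dependencies inside the container to the versions the submission was originally tested with. The sketch below is generic: the package names, versions and file names are placeholders, not the contents of the actual Intel patch.

```bash
# Generic illustration only: the packages and versions below are placeholders,
# not the actual Intel patch.
cat > constraints.txt << 'EOF'
numpy==1.23.5
protobuf==3.20.3
EOF

# pip's constraints file pins (transitive) dependency versions without
# editing the original requirements file.
pip install -r requirements.txt -c constraints.txt
```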

MLCommons CM interface

You should be able to run MLPerf inference benchmarks via the unified CM interface and a portable workflow that can run natively or inside an automatically generated Docker container:

pip install cmind
cm pull repo mlcommons@ck
cmr "run common mlperf inference" --implementation=nvidia --model=bert-99

Prepare an official submission for the Edge category:

cmr "run common mlperf inference _submission _full" --implementation=nvidia --model=bert-99

Prepare an official submission for the Datacenter category:

cmr "run common mlperf inference _submission _full" --implementation=nvidia \
--model=bert-99 --category=datacenter --division=closed
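
Once a submission tree has been generated, it can be validated with the official submission checker from the mlcommons/inference repository before uploading. The sketch below assumes a local checkout of that repository and a submission directory named ./mlperf_submission produced by the commands above; adjust the path and the --version value to your setup:

```bash
# Validate the generated submission tree with the official MLPerf checker.
# The directory name ./mlperf_submission is a placeholder for illustration.
git clone https://github.com/mlcommons/inference.git
python3 inference/tools/submission/submission_checker.py \
    --input ./mlperf_submission \
    --version v3.1
```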
gfursin changed the title from "CM Reproducibility for MLPerf Inference" to "Reproducibility and Automation for MLPerf Inference v3.1 and v4.0" on Jan 23, 2024
gfursin changed the title from "Reproducibility and Automation for MLPerf Inference v3.1 and v4.0" to "Automation and reproducibility for MLPerf Inference v3.1 and v4.0" on Jan 24, 2024
gfursin (Contributor) commented Jan 27, 2024

We got feedback from submitters asking us to create a GUI that can generate CM commands. I opened a ticket: #1070

gfursin (Contributor) commented Jan 30, 2024

[20240130] We received a lot of great feedback and improved both the generic CM automation recipes and the CM workflows for MLPerf inference.

gfursin (Contributor) commented Feb 1, 2024

We started discussing a proposal for MLPerf reproducibility badges, similar to those used at ACM/IEEE/NeurIPS conferences: #1080 - feedback is welcome!

gfursin (Contributor) commented Feb 20, 2024

Following feedback from the MLPerf submitters, we have developed a prototype of a GUI that generates the command lines to run MLPerf inference benchmarks for all main implementations (reference, Intel, Nvidia, Qualcomm, MIL and DeepSparse) and to automate submissions. You can check it here. The long-term goal is to aggregate and encode all MLPerf submission rules and notes for all models, categories and divisions in this GUI.

We have also developed a prototype of a reproducibility infrastructure to keep track of successful MLPerf inference benchmark configurations across MLPerf versions, hardware, implementations, models and backends, based on the ACM/IEEE/cTuning reproducibility methodology and badging. You can see the latest results here - we will continue adding more tests based on your suggestions, including GPT-J, Llama2 and Stable Diffusion.

Our goal is to test as many v4.0 submissions as possible and add them to the above GUI to make it easier for the community to re-run experiments after the publication date. If some configurations do not work, we plan to help submitters fix the issues.

gfursin (Contributor) commented Feb 29, 2024

We improved the CM automation for Intel, Nvidia and Qualcomm and added it to the GUI: https://access.cknowledge.org/playground/?action=howtorun&bench_uid=39877bb63fb54725 . We can re-run most of these submissions now.

gfursin (Contributor) commented Mar 13, 2024

We now have a relatively stable common CM interface to re-run the above submissions and reproduce the key results. I am closing this ticket - we will open a similar one for inference v4.0 after the publication date. Huge thanks to our colleagues from Intel, Qualcomm and Nvidia for their help and suggestions!

gfursin closed this as completed on Mar 13, 2024
gfursin changed the title from "Automation and reproducibility for MLPerf Inference v3.1 and v4.0" to "Automation and reproducibility for MLPerf Inference v3.1" on Mar 13, 2024