
Update the open dataset requirement #285

Merged 1 commit into mlcommons:master on Nov 28, 2023

Conversation

psyhtest (Contributor)

No description provided.

psyhtest requested a review from a team as a code owner on November 21, 2023 at 16:08

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

psyhtest (Contributor, Author) commented on Nov 21, 2023

Justifying the removed text

From v3.0, if a submitter provides any results with any models trained on a pre-approved dataset,
the submitter must also provide at least one result with the corresponding Closed model trained
(or finetuned) on the same pre-approved dataset, and instructions to reproduce the training (or finetuning) process.

I recall we introduced this requirement specifically for RetinaNet, just before it was added in v2.1. At the time, the RetinaNet dataset, an MLPerf subset of OpenImages, was used for benchmarking one and only one model, namely the MLPerf variant of RetinaNet. We would therefore miss out on objectively benchmarking other research Object Detection models, which are typically trained and validated on the COCO dataset. The idea was that a potential submitter would finetune RetinaNet on COCO too, and thus provide a useful baseline figure for any comparisons on the alternative dataset.

We at KRAI actually did this for v2.1, measuring mAP=35.293% and publishing the finetuned model. This accuracy is lower than that of the reference model on OpenImages (mAP=37.55%), but much higher than, say, that of the deprecated SSD-ResNet34 model (mAP=20.00%). So a submitter showcasing their highly optimized SSD-ResNet34 implementation could legitimately claim that it is faster than RetinaNet, albeit less accurate.
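For reference, a COCO mAP figure like the one above can be reproduced with pycocotools once a finetuned model's detections have been exported in the COCO results format. The sketch below is purely illustrative (file names are hypothetical), not the exact harness we used:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations for the COCO validation split.
coco_gt = COCO("annotations/instances_val2017.json")
# Detections exported in COCO results format; this file name is hypothetical.
coco_dt = coco_gt.loadRes("retinanet_coco_detections.json")

# Standard COCO bounding-box evaluation protocol.
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                            # prints the usual AP/AR table
print(f"mAP = {evaluator.stats[0] * 100:.3f}%")  # stats[0] is AP@[0.50:0.95]
```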

This is not fool-proof, however. A submitter could spend minimal effort on finetuning (or none at all) and present, for example, that RetinaNet achieves only mAP=10% on the COCO dataset. Then they could misleadingly claim that their optimized SSD-ResNet34 implementation is both faster and more accurate than RetinaNet.

Justifying the added text

When seeking such pre-approval, it is recommended that a potential submitter convincingly demonstrates the accuracy of the corresponding Closed model on the same validation dataset, which may involve retraining or finetuning the Closed model if required.

This is intended to avoid the above situation. At the very least, such a submitter would face scrutiny from the WG at the pre-approval stage :). They might still get away with hand-waving it through, though :).

nv-ananjappa (Contributor)

@psyhtest This is perfect. Covers everything we wanted to change. LGTM.

mrmhodak merged commit 42db091 into mlcommons:master on Nov 28, 2023
1 check passed
github-actions bot locked and limited conversation to collaborators on Nov 28, 2023