
MLPerf Inference 0.5 Self-Certification Checklist

Name of Certifying Engineer(s):

Email of Certifying Engineer(s):

Name of System(s) Under Test:

Division (check one):

  • Open
  • Closed

Category (check one):

  • Available
  • Preview
  • Research, Development, and Internal (RDI)

Benchmark (check one):

  • MobileNet
  • SSD-MobileNet
  • ResNet
  • SSD-1200
  • NMT
  • Other, please specify:

Please fill in the following tables, adding lines as necessary. 97th-percentile latency is required for NMT only; 99th-percentile latency is required for all other models.
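
As a point of reference, the sketch below shows a common nearest-rank percentile computation in plain Python. LoadGen's summary log reports these percentiles itself; this sketch only makes the requirement concrete, and the latency values in it are made up for illustration.

```python
import math

def percentile(latencies_ns, pct):
    """Nearest-rank percentile: smallest latency >= pct% of all samples."""
    ordered = sorted(latencies_ns)
    rank = math.ceil(pct / 100.0 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Illustrative per-query latencies in nanoseconds (not real measurements).
latencies_ns = [42_000_000, 47_000_000, 48_000_000, 51_000_000]
print(percentile(latencies_ns, 97))  # -> 51000000
print(percentile(latencies_ns, 99))  # -> 51000000
```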

Single-Stream Results Table

| SUT Name | Benchmark | Query Count | Accuracy |
| --- | --- | --- | --- |

Multi-Stream Results Table

| SUT Name | Benchmark | Query Count | Accuracy | 97th-percentile Latency | 99th-percentile Latency |
| --- | --- | --- | --- | --- | --- |

Server Results Table

| SUT Name | Benchmark | Query Count | Accuracy | 97th-percentile Latency | 99th-percentile Latency |
| --- | --- | --- | --- | --- | --- |

Offline Results Table

| SUT Name | Benchmark | Sample Count | Accuracy |
| --- | --- | --- | --- |

Scenario (check all that apply):

  • Single-Stream
  • Multi-Stream
  • Server
  • Offline

For each SUT, does the submission meet the latency target for each combination of benchmark and scenario? (check all that apply)

  • Yes (Single-Stream and Offline have no latency requirements)
  • Yes (MobileNet x Multi-Stream 50 ms @ 99%)
  • Yes (MobileNet x Server 10 ms @ 99%)
  • Yes (SSD-MobileNet x Multi-Stream 50 ms @ 99%)
  • Yes (SSD-MobileNet x Server 10 ms @ 99%)
  • Yes (ResNet x Multi-Stream 50 ms @ 99%)
  • Yes (ResNet x Server 15 ms @ 99%)
  • Yes (SSD-1200 x Multi-Stream 66 ms @ 99%)
  • Yes (SSD-1200 x Server 100 ms @ 99%)
  • Yes (NMT x Multi-Stream 100 ms @ 97%)
  • Yes (NMT x Server 250 ms @ 97%)
  • No

For each SUT, does the submission meet the appropriate minimum number of queries or samples for each Scenario x Benchmark combination? (check all that apply)

  • Yes (Single-Stream 1,024 queries)
  • Yes (Offline 24,576 samples)
  • Yes (NMT Server and Multi-Stream 90,112 queries)
  • Yes (Image Models Server and Multi-Stream 270,336 queries)
  • No

For each SUT and scenario, is the benchmark accuracy target met? (check all that apply)

  • Yes (MobileNet 71.68% x 98%)
  • Yes (SSD-MobileNet 0.22 mAP x 99%)
  • Yes (ResNet 76.46% x 99%)
  • Yes (SSD-1200 0.20 mAP x 99%)
  • Yes (NMT 23.9 BLEU x 99%)
  • No

For each SUT and scenario, did the submission run on the whole validation set in accuracy mode? (check one)

  • Yes
  • No

How many samples are loaded into the QSL in performance mode?

For each SUT and scenario, does the number of loaded samples in the QSL in performance mode meet the minimum requirement? (check all that apply)

  • Yes (ResNet and MobileNet 1,024 samples)
  • Yes (SSD-MobileNet 256 samples)
  • Yes (SSD-1200 64 samples)
  • Yes (NMT 3,903,900 samples)
  • No
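
For reference on the two QSL questions above, the sketch below shows where the performance-mode sample count enters a submission, assuming the mlperf_loadgen Python bindings. The counts and callbacks are illustrative placeholders, not a prescribed implementation.

```python
import mlperf_loadgen

TOTAL_SAMPLE_COUNT = 50_000        # illustrative: size of the validation set
PERFORMANCE_SAMPLE_COUNT = 1_024   # illustrative: the ResNet/MobileNet minimum

def load_query_samples(sample_indices):
    pass  # placeholder: load these validation samples into host memory

def unload_query_samples(sample_indices):
    pass  # placeholder: release the samples loaded above

# The second argument is the number of samples LoadGen keeps loaded in
# performance mode -- the quantity this checklist asks about.
qsl = mlperf_loadgen.ConstructQSL(
    TOTAL_SAMPLE_COUNT, PERFORMANCE_SAMPLE_COUNT,
    load_query_samples, unload_query_samples)
```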

For each SUT and scenario, is the experimental duration greater than or equal to 60 seconds? (check one)

  • Yes
  • No

Does the submission use LoadGen? (check one)

  • Yes
  • No

Is your LoadGen commit one of these allowed commit hashes?

  • 61220457dec221ed1984c62bd9d382698bd71bc6
  • 5684c11e3987b614aae830390fa0e92f56b7e800
  • 55c0ea4e772634107f3e67a6d0da61e6a2ca390d
  • d31c18fbd9854a4f1c489ca1bc4cd818e48f2bc5
  • 1d0e06e54a7d763cf228bdfd8b1e987976e4222f
  • Other, please specify:
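
One way to verify this is to compare the vendored LoadGen checkout's HEAD against the allowed list, as in the sketch below; the ./loadgen path is an assumption about where the checkout lives.

```python
import subprocess

ALLOWED = {
    "61220457dec221ed1984c62bd9d382698bd71bc6",
    "5684c11e3987b614aae830390fa0e92f56b7e800",
    "55c0ea4e772634107f3e67a6d0da61e6a2ca390d",
    "d31c18fbd9854a4f1c489ca1bc4cd818e48f2bc5",
    "1d0e06e54a7d763cf228bdfd8b1e987976e4222f",
}

# Assumes LoadGen is a git checkout at ./loadgen relative to this script.
head = subprocess.check_output(
    ["git", "-C", "loadgen", "rev-parse", "HEAD"], text=True).strip()
print("allowed" if head in ALLOWED else f"NOT an allowed commit: {head}")
```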

Do you have any additional changes to LoadGen? (check one)

  • Yes, please specify:
  • No

Does the submission run the same code in accuracy and performance modes? (check one)

  • Yes
  • No

Where is the LoadGen trace stored? (check one)

  • Host DRAM
  • Other, please specify:

For the submitted results, what is the QSL random number generator seed?

  • 0x2b7e151628aed2a6ULL (3133965575612453542)
  • Other, please specify:

For the submitted results, what is the sample index random number generator seed?

  • 0x093c467e37db0c7aULL (665484352860916858)
  • Other, please specify:

For the submitted results, what is the schedule random number generator seed?

  • 0x3243f6a8885a308dULL (3622009729038561421)
  • Other, please specify:
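
A minimal sketch of pinning these three seeds, together with the minimum-duration and minimum-query-count requirements from earlier in this checklist, assuming the mlperf_loadgen Python bindings; the scenario choice and query count are illustrative.

```python
import mlperf_loadgen

settings = mlperf_loadgen.TestSettings()
settings.scenario = mlperf_loadgen.TestScenario.SingleStream  # illustrative
settings.mode = mlperf_loadgen.TestMode.PerformanceOnly

# The official seeds listed above.
settings.qsl_rng_seed = 0x2b7e151628aed2a6
settings.sample_index_rng_seed = 0x093c467e37db0c7a
settings.schedule_rng_seed = 0x3243f6a8885a308d

# Minimum-run requirements from earlier in the checklist.
settings.min_duration_ms = 60 * 1000  # >= 60-second experiment
settings.min_query_count = 1024       # illustrative: Single-Stream minimum
```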

For each SUT, was the submission run the correct number of times for the relevant scenario? (check one)

  • Yes (Accuracy 1x Performance 1x Single-Stream, Multi-Stream, Offline)
  • Yes (Accuracy 1x Performance 5x Server)
  • No

Are the weights calibrated using data outside of the calibration set? (check one)

  • Yes
  • No

What untimed pre-processing does the submission use? (check all that apply)

  • Resize
  • Reorder channels or transpose
  • Pad
  • A single crop
  • Mean subtraction and normalization
  • Convert to whitelisted format
  • No pre-processing
  • Other, please specify:

What numerics does the submission use? (check all that apply)

  • INT4
  • INT8
  • INT16
  • UINT8
  • UINT16
  • FP11
  • FP16
  • BF16
  • FP32
  • Other, please specify:

Which of the following techniques does the submission use? (check all that apply)

  • Wholesale weight replacement
  • Weight supplements
  • Discarding non-zero weight elements
  • Pruning
  • Caching queries
  • Caching responses
  • Caching intermediate computations
  • Modifying weights during the timed portion of an inference run
  • Weight quantization algorithms that are similar in size to the non-zero weights they produce
  • Hard coding the total number of queries
  • Techniques that boost performance for fixed length experiments but are inapplicable to long-running services except in the offline scenario
  • Using knowledge of the LoadGen implementation to predict upcoming lulls or spikes in the server scenario
  • Treating beams in a beam search differently (for example, employing different precision for different beams)
  • Changing the number of beams per beam search relative to the reference
  • Incorporating explicit statistical information about the performance or accuracy sets
  • Techniques that take advantage of upsampled images
  • Techniques that only improve performance when there are identical samples in a query
  • None of the above

Is the submission congruent with all relevant MLPerf rules?

  • Yes
  • No

For each SUT, does the submission accurately reflect the real-world performance of the SUT?

  • Yes
  • No