Commit 07072be

Results from GH action on NVIDIA_RTX4090x1
1 parent 34cee9e commit 07072be

49 files changed: +1315 -1315 lines changed
@@ -1,2 +1,2 @@
 
-hash=182b35bd45a00bbb889a0adc85afda3a54cb7773f18afbcd72008faf01150c1e
+hash=f77b73c361844de012c0dce885370820d5ccae64260df910ac9384b532b9f28b

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/baseline_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 26.18732261116367, "f1": 28.449692786466613}
+{"exact_match": 25.799432355723745, "f1": 28.172649034606692}
 Reading examples...
 No cached features at 'eval_features.pickle'... converting from examples...
 Creating tokenizer...

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/compliance_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 26.18732261116367, "f1": 28.449692786466613}
+{"exact_match": 25.789971617786186, "f1": 28.170756887019184}
 Reading examples...
 Loading cached features from 'eval_features.pickle'...
 Loading LoadGen logs...

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/mlperf_log_accuracy.json

+13 -13
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_detail.txt

+88 -88
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_summary.txt

+12 -12
@@ -4,7 +4,7 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 1672.43
+Samples per second: 4120.16
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -13,21 +13,21 @@ Result is : VALID
 ================================================
 Additional Stats
 ================================================
-Min latency (ns) : 822817546
-Max latency (ns) : 666130717761
-Mean latency (ns) : 402968489190
-50.00 percentile latency (ns) : 429025220682
-90.00 percentile latency (ns) : 635209685133
-95.00 percentile latency (ns) : 653646093200
-97.00 percentile latency (ns) : 659606890333
-99.00 percentile latency (ns) : 664432670132
-99.90 percentile latency (ns) : 666000837204
+Min latency (ns) : 708599146
+Max latency (ns) : 666754266053
+Mean latency (ns) : 404151575983
+50.00 percentile latency (ns) : 430375053769
+90.00 percentile latency (ns) : 635778038883
+95.00 percentile latency (ns) : 654116521706
+97.00 percentile latency (ns) : 660124427593
+99.00 percentile latency (ns) : 664937581583
+99.90 percentile latency (ns) : 666620759680
 
 ================================================
 Test Parameters Used
 ================================================
-samples_per_query : 1114055
-target_qps : 1687.96
+samples_per_query : 2747131
+target_qps : 4162.32
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
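
The test parameters in this hunk fit a simple sizing rule for the single Offline query: `samples_per_query` is about `target_qps` times `min_duration` in seconds, times a slack factor of 1.1. The sketch below checks that arithmetic against the logged values. The function name is hypothetical, and the 1.1 factor is an assumption inferred from the numbers here, not read from the LoadGen source.

```python
def offline_samples_per_query(target_qps: float, min_duration_ms: int,
                              slack: float = 1.1) -> int:
    """Estimate the Offline query size (hypothetical helper).

    Assumes samples_per_query ~= target_qps * duration_s * slack,
    where slack = 1.1 is inferred from the logged values.
    """
    duration_s = min_duration_ms / 1000.0
    return int(target_qps * duration_s * slack)


# Values taken from the "Test Parameters Used" section of the summary.
new_estimate = offline_samples_per_query(4162.32, 600000)
old_estimate = offline_samples_per_query(1687.96, 600000)

print(new_estimate)  # logged value: 2747131
print(old_estimate)  # logged value: 1114055 (target_qps is printed rounded)
```

The new run reproduces the logged `samples_per_query` exactly. The old run lands within a few samples of it, which is what rounding `target_qps` to two decimals in the log would explain.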

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/verify_accuracy.txt

+3 -3
@@ -4,9 +4,9 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 4110
-num_perf_log_qsl_idx_match = 4110
-num_perf_log_data_mismatch = 48
+num_perf_log_entries = 4096
+num_perf_log_qsl_idx_match = 4096
+num_perf_log_data_mismatch = 25
 num_missing_qsl_idxs = 0
 TEST FAIL
 

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 1671.25
-test score = 1672.43
+reference score = 4121.11
+test score = 4120.16
 TEST PASS
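
The performance check compares a reference score with the score from the compliance run. A minimal sketch of such a comparison follows. The function names and the 10% tolerance are assumptions made for illustration, not values taken from the compliance script.

```python
def relative_delta(reference: float, test: float) -> float:
    """Signed relative difference of the test score from the reference."""
    return (test - reference) / reference


def within_tolerance(reference: float, test: float,
                     tolerance: float = 0.10) -> bool:
    """True if the two scores differ by at most `tolerance` (assumed 10%)."""
    return abs(relative_delta(reference, test)) <= tolerance


# Offline scores from the hunk above (samples per second).
delta = relative_delta(4121.11, 4120.16)
print(f"{delta:+.4%}", within_tolerance(4121.11, 4120.16))
```

For the scores in this hunk the two runs differ by well under a tenth of a percent, so any reasonable tolerance would report a pass.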

@@ -1,2 +1,2 @@
 
-hash=d2c37222acfeea679dec6f7bd86f5c9b63a0bd7c958924cad8514cab68deb5c6
+hash=382cfb5879ab79c72dee498e0548e998fe96136cffa8fb5d96685e3ea0a5b0d8

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/accuracy/mlperf_log_accuracy.json

+6 -6
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_detail.txt

+92 -92
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_summary.txt

+17 -17
@@ -4,38 +4,38 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : SingleStream
 Mode : PerformanceOnly
-90th percentile latency (ns) : 2165646
+90th percentile latency (ns) : 1009856
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
 Early stopping satisfied: Yes
 Early Stopping Result:
-* Processed at least 64 queries (391306).
-* Would discard 38693 highest latency queries.
-* Early stopping 90th percentile estimate: 2166096
-* Early stopping 99th percentile estimate: 2625785
+* Processed at least 64 queries (645364).
+* Would discard 63974 highest latency queries.
+* Early stopping 90th percentile estimate: 1010076
+* Early stopping 99th percentile estimate: 1183040
 
 ================================================
 Additional Stats
 ================================================
-QPS w/ loadgen overhead : 652.17
-QPS w/o loadgen overhead : 656.82
+QPS w/ loadgen overhead : 1075.60
+QPS w/o loadgen overhead : 1080.65
 
-Min latency (ns) : 1164215
-Max latency (ns) : 9621612
-Mean latency (ns) : 1522486
-50.00 percentile latency (ns) : 1435792
-90.00 percentile latency (ns) : 2165646
-95.00 percentile latency (ns) : 2359727
-97.00 percentile latency (ns) : 2604855
-99.00 percentile latency (ns) : 2625684
-99.90 percentile latency (ns) : 2667282
+Min latency (ns) : 853634
+Max latency (ns) : 1457922
+Mean latency (ns) : 925365
+50.00 percentile latency (ns) : 903487
+90.00 percentile latency (ns) : 1009856
+95.00 percentile latency (ns) : 1075739
+97.00 percentile latency (ns) : 1168011
+99.00 percentile latency (ns) : 1182999
+99.90 percentile latency (ns) : 1187137
 
 ================================================
 Test Parameters Used
 ================================================
 samples_per_query : 1
-target_qps : 1643.87
+target_qps : 2702.33
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000

closed/MLCommons/compliance/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/verify_accuracy.txt

+2 -2
@@ -4,8 +4,8 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 1620
-num_perf_log_qsl_idx_match = 1620
+num_perf_log_entries = 1663
+num_perf_log_qsl_idx_match = 1663
 num_perf_log_data_mismatch = 0
 num_missing_qsl_idxs = 0
 TEST PASS

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 2165616
-test score = 2166096
+reference score = 1009657
+test score = 1010076
 TEST PASS

@@ -1,4 +1,4 @@
-| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
-|--------------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
-| 3d-unet-99.9 | offline | 0.86236 | 4.157 | - | | passed |
-| 3d-unet-99.9 | singlestream | 0.86236 | 2.309 | 433.121 | | passed |
+| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
+|---------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
+| bert-99 | offline | 90.1528 | 4121.11 | - | | passed |
+| bert-99 | singlestream | 90.2668 | 990.099 | 1.01 | | passed |
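
In the singlestream row, the throughput column looks like it is derived from the rounded latency rather than measured separately: 1000 / 1.01 ms gives 990.099 samples per second. The sketch below reproduces that arithmetic. The function name is hypothetical, and the derivation itself is an inference from the table values, not a documented rule.

```python
def throughput_from_latency(latency_ms: float) -> float:
    """Samples per second for one-at-a-time queries (hypothetical helper).

    Assumes each query completes before the next is issued, so
    throughput is simply the reciprocal of the per-query latency.
    """
    return 1000.0 / latency_ms


# Latency (in ms) as reported in the singlestream row of the table.
throughput = round(throughput_from_latency(1.01), 3)
print(throughput)  # 990.099
```

The old 3d-unet-99.9 row follows the same pattern: 1000 / 433.121 ms is about 2.309.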

closed/MLCommons/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/README.md

+4 -4
@@ -17,7 +17,7 @@ pip install -U mlcflow
 
 mlc rm cache -f
 
-mlc pull repo mlcommons@mlperf-automations --checkout=03d9201c1c9305c7c3eaa0262984af76c7f2287f
+mlc pull repo mlcommons@mlperf-automations --checkout=6a917925e946fcf6a1511578ba101067d4a88532
 
 
 ```
@@ -35,10 +35,10 @@ mlc rm cache -f
 
 Platform: RTX4090x1-nvidia-gpu-TensorRT-default_config
 
-Model Precision: fp16
+Model Precision: int8
 
 ### Accuracy Results
-`F1`: `90.88324`, Required accuracy for closed division `>= 90.78313`
+`F1`: `90.15279`, Required accuracy for closed division `>= 89.96526`
 
 ### Performance Results
-`Samples per second`: `1671.25`
+`Samples per second`: `4121.11`
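
The two accuracy thresholds in this hunk are consistent with a single reference F1 of 90.874 scaled by the model's accuracy target: 99% gives 89.96526 and 99.9% gives 90.78313. A minimal sketch of that calculation follows. The reference value is an assumption inferred from the two thresholds, and the names are hypothetical.

```python
# Assumed reference F1, inferred from the thresholds in the README diff.
REFERENCE_F1 = 90.874


def required_f1(target_fraction: float) -> float:
    """Minimum F1 for a given accuracy target, e.g. 0.99 for bert-99."""
    return round(REFERENCE_F1 * target_fraction, 5)


def meets_target(measured_f1: float, target_fraction: float) -> bool:
    """True if the measured F1 reaches the required threshold."""
    return measured_f1 >= required_f1(target_fraction)


print(required_f1(0.99), meets_target(90.15279, 0.99))
print(required_f1(0.999), meets_target(90.88324, 0.999))
```

Under this reading, the int8 result (90.15279) clears the 99% threshold but would not clear the 99.9% threshold that the earlier fp16 result (90.88324) was measured against.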

closed/MLCommons/measurements/RTX4090x1-nvidia-gpu-TensorRT-default_config/bert-99/offline/RTX4090x1-nvidia-gpu-TensorRT-default_config.json

+1 -1
@@ -2,6 +2,6 @@
 "starting_weights_filename": "https://armi.in/files/bert_large_v1_1_fake_quant.onnx",
 "retraining": "no",
 "input_data_types": "int32",
-"weight_data_types": "fp16",
+"weight_data_types": "int8",
 "weight_transformations": "quantization, affine fusion"
 }
