Commit 1d8e7e6

Results from GH action on NVIDIA_RTX4090x2
1 parent ff4dcb3 commit 1d8e7e6

46 files changed (+1397 -1400 lines changed)
@@ -1,2 +1,2 @@
 
-hash=35db7a03c67c29becaa3263e5d287b1031864f223d313450b22fb18cf48a84a8
+hash=289414dd9663892ffd8b764efe21ff437be503c40aa575ec1cd79dc4d167cbed

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/baseline_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 25.960264900662253, "f1": 28.345945349642122}
+{"exact_match": 25.808893093661304, "f1": 28.044760862343207}
 Reading examples...
 No cached features at 'eval_features.pickle'... converting from examples...
 Creating tokenizer...

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/compliance_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 25.960264900662253, "f1": 28.345945349642122}
+{"exact_match": 25.799432355723745, "f1": 28.042868714755695}
 Reading examples...
 Loading cached features from 'eval_features.pickle'...
 Loading LoadGen logs...

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/mlperf_log_accuracy.json

+13 -13
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_detail.txt

+88 -88
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_summary.txt

+12 -12
@@ -4,7 +4,7 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 3329.31
+Samples per second: 8270.02
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -13,21 +13,21 @@ Result is : VALID
 ================================================
 Additional Stats
 ================================================
-Min latency (ns) : 1149390628
-Max latency (ns) : 667141156558
-Mean latency (ns) : 403325233877
-50.00 percentile latency (ns) : 429455065877
-90.00 percentile latency (ns) : 636160704566
-95.00 percentile latency (ns) : 654564440126
-97.00 percentile latency (ns) : 660582964130
-99.00 percentile latency (ns) : 665358746838
-99.90 percentile latency (ns) : 667001952672
+Min latency (ns) : 1463771512
+Max latency (ns) : 662617361493
+Mean latency (ns) : 401715159178
+50.00 percentile latency (ns) : 427664906632
+90.00 percentile latency (ns) : 631834726524
+95.00 percentile latency (ns) : 649993042297
+97.00 percentile latency (ns) : 656003034356
+99.00 percentile latency (ns) : 660773702230
+99.90 percentile latency (ns) : 662470889227
 
 ================================================
 Test Parameters Used
 ================================================
-samples_per_query : 2221117
-target_qps : 3365.33
+samples_per_query : 5479858
+target_qps : 8302.82
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
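
To compare the two summaries without reading the diff by eye, the `key : value` lines can be parsed into a dictionary. The sketch below is only an illustration: `parse_summary` is a hypothetical helper, not part of LoadGen or the MLPerf tooling, and it assumes every field of interest sits on one line with the value after the last colon.

```python
def parse_summary(text):
    """Parse 'key : value' lines from an MLPerf-style summary into a dict.

    Hypothetical helper for illustration; separator lines ('====') and
    headings with no value after the colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the LAST colon so keys such as 'Min latency (ns)' stay whole.
        key, sep, value = line.rpartition(":")
        if not sep or not key.strip() or not value.strip():
            continue
        fields[key.strip()] = value.strip()
    return fields


# Values copied from the removed (-) and added (+) lines of the hunk above.
old = parse_summary("Samples per second: 3329.31\ntarget_qps : 3365.33")
new = parse_summary("Samples per second: 8270.02\ntarget_qps : 8302.82")

speedup = float(new["Samples per second"]) / float(old["Samples per second"])
print(f"Offline throughput ratio (new / old): {speedup:.3f}")
```

The same function can be pointed at the full contents of both `mlperf_log_summary.txt` versions to list every field that changed.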

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/verify_accuracy.txt

+3 -3
@@ -4,9 +4,9 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 4085
-num_perf_log_qsl_idx_match = 4085
-num_perf_log_data_mismatch = 51
+num_perf_log_entries = 4026
+num_perf_log_qsl_idx_match = 4026
+num_perf_log_data_mismatch = 20
 num_missing_qsl_idxs = 0
 TEST FAIL

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 3332.01
-test score = 3329.31
+reference score = 8220.61
+test score = 8270.02
 TEST PASS
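
The performance check compares a reference score with a test score. As a rough illustration of how close the scores in this commit are, the sketch below computes their relative difference. The tolerance value is a placeholder chosen for the example, not the threshold the compliance script actually applies, and both function names are hypothetical.

```python
def relative_difference(reference, test):
    """Signed relative difference of the test score with respect to the reference."""
    return (test - reference) / reference


def within_tolerance(reference, test, tolerance):
    """True when the scores differ by at most `tolerance` (a fraction, e.g. 0.10)."""
    return abs(relative_difference(reference, test)) <= tolerance


# ASSUMPTION: placeholder tolerance for illustration only; consult the
# MLPerf compliance documentation for the real per-scenario limit.
HYPOTHETICAL_TOLERANCE = 0.10

# Scores copied from the added (+) lines in this commit.
offline = relative_difference(8220.61, 8270.02)        # samples per second
singlestream = relative_difference(1030560, 1029130)   # latency in ns

print(f"offline: {offline:+.3%}, singlestream: {singlestream:+.3%}")
print(within_tolerance(8220.61, 8270.02, HYPOTHETICAL_TOLERANCE))
```

Both differences are well under one percent, which is consistent with the `TEST PASS` lines in the two performance verification files.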

@@ -1,2 +1,2 @@
 
-hash=8fc90737f9ae1234daaa9f03db7b400fae351bef18b74274522bf220323ed737
+hash=53ca0549508a27669a1500e5285aaef99354b5521e1382beac20dc0e89c1aa58

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/accuracy/mlperf_log_accuracy.json

+5 -6
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_detail.txt

+92 -92
Large diffs are not rendered by default.

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_summary.txt

+17 -17
@@ -4,38 +4,38 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : SingleStream
 Mode : PerformanceOnly
-90th percentile latency (ns) : 2168651
+90th percentile latency (ns) : 1028640
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
 Early stopping satisfied: Yes
 Early Stopping Result:
-* Processed at least 64 queries (392387).
-* Would discard 38800 highest latency queries.
-* Early stopping 90th percentile estimate: 2169583
-* Early stopping 99th percentile estimate: 2637852
+* Processed at least 64 queries (636833).
+* Would discard 63125 highest latency queries.
+* Early stopping 90th percentile estimate: 1029130
+* Early stopping 99th percentile estimate: 1204811
 
 ================================================
 Additional Stats
 ================================================
-QPS w/ loadgen overhead : 653.98
-QPS w/o loadgen overhead : 656.68
+QPS w/ loadgen overhead : 1061.39
+QPS w/o loadgen overhead : 1068.05
 
-Min latency (ns) : 1166039
-Max latency (ns) : 3143287
-Mean latency (ns) : 1522817
-50.00 percentile latency (ns) : 1437568
-90.00 percentile latency (ns) : 2168651
-95.00 percentile latency (ns) : 2321822
-97.00 percentile latency (ns) : 2609757
-99.00 percentile latency (ns) : 2637407
-99.90 percentile latency (ns) : 2661419
+Min latency (ns) : 840385
+Max latency (ns) : 1404272
+Mean latency (ns) : 936287
+50.00 percentile latency (ns) : 915873
+90.00 percentile latency (ns) : 1028640
+95.00 percentile latency (ns) : 1123549
+97.00 percentile latency (ns) : 1184400
+99.00 percentile latency (ns) : 1204405
+99.90 percentile latency (ns) : 1219655
 
 ================================================
 Test Parameters Used
 ================================================
 samples_per_query : 1
-target_qps : 1643.27
+target_qps : 2667.11
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/verify_accuracy.txt

+2 -2
@@ -4,8 +4,8 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 1625
-num_perf_log_qsl_idx_match = 1625
+num_perf_log_entries = 1666
+num_perf_log_qsl_idx_match = 1666
 num_perf_log_data_mismatch = 0
 num_missing_qsl_idxs = 0
 TEST PASS

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 2167999
-test score = 2169583
+reference score = 1030560
+test score = 1029130
 TEST PASS

@@ -1,4 +1,4 @@
-| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
-|--------------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
-| 3d-unet-99.9 | singlestream | 0.86236 | 2.326 | 429.856 | | passed |
-| 3d-unet-99.9 | offline | 0.86236 | 8.321 | - | | passed |
+| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
+|---------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
+| bert-99 | singlestream | 90.2668 | 969.932 | 1.031 | | passed |
+| bert-99 | offline | 90.1528 | 8220.61 | - | | passed |

closed/MLCommons/measurements/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/README.md

+2 -2
@@ -38,7 +38,7 @@ Platform: RTX4090x2-nvidia-gpu-TensorRT-default_config
 Model Precision: fp16
 
 ### Accuracy Results
-`F1`: `90.88324`, Required accuracy for closed division `>= 90.78313`
+`F1`: `90.15279`, Required accuracy for closed division `>= 89.96526`
 
 ### Performance Results
-`Samples per second`: `3332.01`
+`Samples per second`: `8220.61`
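
The two accuracy thresholds in this README diff are numerically consistent with 99.9% and 99% of a single reference F1 score. The sketch below checks that arithmetic. The reference value `90.874` is an assumption inferred from the thresholds themselves, not something stated in this commit, and `required_f1` is a hypothetical helper.

```python
# ASSUMPTION: reference F1 inferred from the two thresholds in the diff;
# it does not appear anywhere in this commit.
ASSUMED_REFERENCE_F1 = 90.874


def required_f1(target_fraction, reference_f1=ASSUMED_REFERENCE_F1):
    """Minimum F1 needed to reach `target_fraction` of the reference score."""
    return reference_f1 * target_fraction


old_threshold = required_f1(0.999)  # README before the commit: >= 90.78313
new_threshold = required_f1(0.99)   # README after the commit:  >= 89.96526

measured_f1 = 90.15279  # F1 reported on the added (+) line
print(f"99.9% target: {old_threshold:.5f}, 99% target: {new_threshold:.5f}")
print("meets 99% target:", measured_f1 >= new_threshold)
print("meets 99.9% target:", measured_f1 >= old_threshold)
```

Under that assumption, the new F1 of `90.15279` clears the 99% threshold but would not have cleared the 99.9% threshold shown on the removed line.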
