Commit 37f9603

Results from GH action on NVIDIA_RTX4090x2
1 parent e904988 commit 37f9603

46 files changed: +1389 -1389 lines changed
@@ -1,2 +1,2 @@

-hash=ea3f356fca2cfea2b93131598739033cf0afa4f030ff951822a60acb2c787d7c
+hash=d6d57dfc881e436890fbe7b248bba63650116117012f5b4a0628346481f4ebdb

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/baseline_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 25.97918637653737, "f1": 28.36486682551724}
+{"exact_match": 25.76158940397351, "f1": 27.99745717265541}
 Reading examples...
 No cached features at 'eval_features.pickle'... converting from examples...
 Creating tokenizer...

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/compliance_accuracy.txt

+1 -1
@@ -1,4 +1,4 @@
-{"exact_match": 25.97918637653737, "f1": 28.36486682551724}
+{"exact_match": 25.75212866603595, "f1": 27.995565025067897}
 Reading examples...
 Loading cached features from 'eval_features.pickle'...
 Loading LoadGen logs...
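
The two accuracy files above report slightly different scores for the baseline and compliance runs. A minimal sketch for quantifying that gap, assuming the first line of each file is the JSON object shown in the hunks; the helper name `accuracy_delta` is hypothetical and not part of any MLPerf script.

```python
import json


def accuracy_delta(baseline_line: str, compliance_line: str) -> dict:
    """Return (compliance - baseline) for every metric present in both JSON lines."""
    baseline = json.loads(baseline_line)
    compliance = json.loads(compliance_line)
    return {key: compliance[key] - baseline[key] for key in baseline if key in compliance}


# Values copied from the '+' lines of the two hunks above.
baseline = '{"exact_match": 25.76158940397351, "f1": 27.99745717265541}'
compliance = '{"exact_match": 25.75212866603595, "f1": 27.995565025067897}'

delta = accuracy_delta(baseline, compliance)
for metric, value in delta.items():
    print(f"{metric}: {value:+.6f}")
```

Both metrics move by less than 0.01 points between the two runs.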

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/accuracy/mlperf_log_accuracy.json

+13 -13 (large diff, not rendered)

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_detail.txt

+88 -88 (large diff, not rendered)

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/performance/run_1/mlperf_log_summary.txt

+12 -12
@@ -4,7 +4,7 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 3336.41
+Samples per second: 8249
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -13,21 +13,21 @@ Result is : VALID
 ================================================
 Additional Stats
 ================================================
-Min latency (ns) : 1106035681
-Max latency (ns) : 666942028563
-Mean latency (ns) : 403174469057
-50.00 percentile latency (ns) : 429281320374
-90.00 percentile latency (ns) : 635878977169
-95.00 percentile latency (ns) : 654301430601
-97.00 percentile latency (ns) : 660400377264
-99.00 percentile latency (ns) : 665211612256
-99.90 percentile latency (ns) : 666796267079
+Min latency (ns) : 1470245857
+Max latency (ns) : 667411550599
+Mean latency (ns) : 404693085129
+50.00 percentile latency (ns) : 430908471157
+90.00 percentile latency (ns) : 636414697498
+95.00 percentile latency (ns) : 654750714519
+97.00 percentile latency (ns) : 660774538506
+99.00 percentile latency (ns) : 665604516280
+99.90 percentile latency (ns) : 667266850144

 ================================================
 Test Parameters Used
 ================================================
-samples_per_query : 2225190
-target_qps : 3371.5
+samples_per_query : 5505476
+target_qps : 8341.63
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
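
The Offline numbers in this hunk are internally consistent, which can be checked with a little arithmetic. The sketch below assumes that `samples_per_query` equals `target_qps` times the minimum duration in seconds times a slack factor of 1.1, and that the reported throughput is the sample count divided by the maximum latency. Both relations are inferred from the values in the log, not taken from the LoadGen source, and the function names are hypothetical.

```python
def offline_samples_per_query(target_qps: float, min_duration_ms: int, slack: float = 1.1) -> int:
    """Sample count of the single Offline query, assuming qps * seconds * slack."""
    return round(target_qps * (min_duration_ms / 1000) * slack)


def offline_throughput(samples: int, max_latency_ns: int) -> float:
    """Samples per second, assuming the whole query finishes at the max latency."""
    return samples / (max_latency_ns / 1e9)


# '-' side of the hunk (previous run) and '+' side (this commit).
old_samples = offline_samples_per_query(3371.5, 600000)
new_samples = offline_samples_per_query(8341.63, 600000)
old_rate = offline_throughput(2225190, 666942028563)
new_rate = offline_throughput(5505476, 667411550599)

print(old_samples, new_samples)
print(f"{old_rate:.2f} {new_rate:.2f}")
```

Under these assumptions the computed sample counts reproduce the logged `samples_per_query` values, and the computed rates land within 0.01 of the logged `Samples per second` lines.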

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/TEST01/verify_accuracy.txt

+3 -3
@@ -4,9 +4,9 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 4085
-num_perf_log_qsl_idx_match = 4085
-num_perf_log_data_mismatch = 51
+num_perf_log_entries = 4019
+num_perf_log_qsl_idx_match = 4019
+num_perf_log_data_mismatch = 22
 num_missing_qsl_idxs = 0
 TEST FAIL
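
Both versions of this file end in `TEST FAIL`, although the mismatch count drops from 51 to 22. A small sketch for expressing the mismatches as a share of the sampled performance-log entries; `mismatch_rate` is a hypothetical helper, and the pass criterion the real script applies is not shown in this diff.

```python
def mismatch_rate(data_mismatch: int, perf_entries: int) -> float:
    """Fraction of sampled performance-log entries whose data differs from the accuracy log."""
    if perf_entries <= 0:
        raise ValueError("perf_entries must be positive")
    return data_mismatch / perf_entries


old_rate = mismatch_rate(51, 4085)   # '-' lines of the hunk
new_rate = mismatch_rate(22, 4019)   # '+' lines of the hunk
print(f"{old_rate:.2%} -> {new_rate:.2%}")
```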

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 3338.12
-test score = 3336.41
+reference score = 8259.04
+test score = 8249
 TEST PASS
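
The performance check compares the compliance run against the reference run. A minimal sketch of such a comparison follows. The 10% tolerance is an assumption made for illustration only, since the threshold the compliance script actually uses does not appear in this diff, and `relative_gap` is a hypothetical helper.

```python
ASSUMED_TOLERANCE = 0.10  # hypothetical threshold, not taken from the diff


def relative_gap(reference: float, test: float) -> float:
    """Absolute difference between the two scores as a fraction of the reference."""
    return abs(test - reference) / reference


old_gap = relative_gap(3338.12, 3336.41)  # '-' lines of the hunk
new_gap = relative_gap(8259.04, 8249)     # '+' lines of the hunk

for label, gap in (("old", old_gap), ("new", new_gap)):
    verdict = "PASS" if gap <= ASSUMED_TOLERANCE else "FAIL"
    print(f"{label}: gap {gap:.4%} -> {verdict}")
```

Both gaps are well under 1%, so the verdict would be the same for any tolerance of that order.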

@@ -1,2 +1,2 @@

-hash=66666b875ec9add3cf67a97dcd7ec698343c18f39b71534a0bbbb93a5d1fbea5
+hash=aded1916a0bd347aa82fd4cb88eeda19fcca051529aa6f5667e44d1fa35ec704

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/accuracy/mlperf_log_accuracy.json

+6 -6 (large diff, not rendered)

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_detail.txt

+92 -92 (large diff, not rendered)

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/performance/run_1/mlperf_log_summary.txt

+17 -17
@@ -4,38 +4,38 @@ MLPerf Results Summary
 SUT name : BERT SERVER
 Scenario : SingleStream
 Mode : PerformanceOnly
-90th percentile latency (ns) : 2175885
+90th percentile latency (ns) : 1032715
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
 Early stopping satisfied: Yes
 Early Stopping Result:
-* Processed at least 64 queries (390974).
-* Would discard 38660 highest latency queries.
-* Early stopping 90th percentile estimate: 2176629
-* Early stopping 99th percentile estimate: 2639196
+* Processed at least 64 queries (633200).
+* Would discard 62763 highest latency queries.
+* Early stopping 90th percentile estimate: 1033384
+* Early stopping 99th percentile estimate: 1212429

 ================================================
 Additional Stats
 ================================================
-QPS w/ loadgen overhead : 651.62
-QPS w/o loadgen overhead : 654.05
+QPS w/ loadgen overhead : 1055.33
+QPS w/o loadgen overhead : 1061.80

-Min latency (ns) : 1166035
-Max latency (ns) : 8220655
-Mean latency (ns) : 1528935
-50.00 percentile latency (ns) : 1444284
-90.00 percentile latency (ns) : 2175885
-95.00 percentile latency (ns) : 2364222
-97.00 percentile latency (ns) : 2611718
-99.00 percentile latency (ns) : 2638638
-99.90 percentile latency (ns) : 2662118
+Min latency (ns) : 852823
+Max latency (ns) : 6205619
+Mean latency (ns) : 941799
+50.00 percentile latency (ns) : 923597
+90.00 percentile latency (ns) : 1032715
+95.00 percentile latency (ns) : 1140001
+97.00 percentile latency (ns) : 1190451
+99.00 percentile latency (ns) : 1212173
+99.90 percentile latency (ns) : 1223493

 ================================================
 Test Parameters Used
 ================================================
 samples_per_query : 1
-target_qps : 1635.21
+target_qps : 2664.61
 target_latency (ns): 0
 max_async_queries : 1
 min_duration (ms): 600000
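
The two QPS figures in this summary can be reproduced from other lines of the same log. The sketch assumes that "QPS w/o loadgen overhead" is the reciprocal of the mean latency and that "QPS w/ loadgen overhead" is the processed query count divided by a run length of almost exactly `min_duration`. Both relations are inferred from the values shown, and the function names are hypothetical.

```python
def qps_without_overhead(mean_latency_ns: int) -> float:
    """Queries per second if each query took exactly the mean latency."""
    return 1e9 / mean_latency_ns


def qps_with_overhead(queries: int, duration_ms: int) -> float:
    """Queries per second over the wall-clock run, assumed equal to min_duration."""
    return queries / (duration_ms / 1000)


old_pure = qps_without_overhead(1528935)
new_pure = qps_without_overhead(941799)
old_wall = qps_with_overhead(390974, 600000)
new_wall = qps_with_overhead(633200, 600000)

print(f"w/o overhead: {old_pure:.2f} -> {new_pure:.2f}")
print(f"w/  overhead: {old_wall:.2f} -> {new_wall:.2f}")
```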

closed/MLCommons/compliance/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/singlestream/TEST01/verify_accuracy.txt

+2 -2
@@ -4,8 +4,8 @@ Reading performance mode results...
 num_acc_log_entries = 10833
 num_acc_log_duplicate_keys = 0
 num_acc_log_data_mismatch = 0
-num_perf_log_entries = 1626
-num_perf_log_qsl_idx_match = 1626
+num_perf_log_entries = 1656
+num_perf_log_qsl_idx_match = 1656
 num_perf_log_data_mismatch = 0
 num_missing_qsl_idxs = 0
 TEST PASS

@@ -1,4 +1,4 @@
 Verifying performance.
-reference score = 2176777
-test score = 2176629
+reference score = 1030312
+test score = 1033384
 TEST PASS

@@ -1,4 +1,4 @@
-| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
-|--------------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
-| 3d-unet-99.9 | singlestream | 0.86236 | 2.32 | 431.066 | | passed |
-| 3d-unet-99.9 | offline | 0.86236 | 8.318 | - | | passed |
+| Model | Scenario | Accuracy | Throughput | Latency (in ms) | Power Efficiency (in samples/J) | TEST01 |
+|---------|--------------|------------|--------------|-------------------|-----------------------------------|----------|
+| bert-99 | singlestream | 90.2668 | 970.874 | 1.03 | | passed |
+| bert-99 | offline | 90.1528 | 8259.04 | - | | passed |
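
In the singlestream rows of this table, the Throughput column appears to be derived from the Latency column rather than measured separately. A short sketch of that relation, assuming throughput is simply 1000 divided by the latency in milliseconds; the helper name is hypothetical.

```python
def singlestream_throughput(latency_ms: float) -> float:
    """Samples per second implied by a per-sample latency given in milliseconds."""
    return 1000.0 / latency_ms


old_row = singlestream_throughput(431.066)  # 3d-unet-99.9 row on the '-' side
new_row = singlestream_throughput(1.03)     # bert-99 row on the '+' side
print(f"{old_row:.3f} {new_row:.3f}")
```

Rounded to three decimals, the results agree with the 2.32 and 970.874 entries in the table.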

closed/MLCommons/measurements/RTX4090x2-nvidia-gpu-TensorRT-default_config/bert-99/offline/README.md

+2 -2
@@ -38,7 +38,7 @@ Platform: RTX4090x2-nvidia-gpu-TensorRT-default_config
 Model Precision: fp16

 ### Accuracy Results
-`F1`: `90.88324`, Required accuracy for closed division `>= 90.78313`
+`F1`: `90.15279`, Required accuracy for closed division `>= 89.96526`

 ### Performance Results
-`Samples per second`: `3338.12`
+`Samples per second`: `8259.04`
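
The two "Required accuracy" thresholds in this hunk are consistent with 99.9% and 99% of one reference F1 score. The sketch below assumes a reference F1 of 90.874, a value inferred from the thresholds rather than stated anywhere in this diff; `required_f1` is a hypothetical helper.

```python
REFERENCE_F1 = 90.874  # assumed reference score, not shown in the diff


def required_f1(fraction: float) -> float:
    """Minimum F1 when the target is the given fraction of the assumed reference score."""
    return round(REFERENCE_F1 * fraction, 5)


old_threshold = required_f1(0.999)  # matches the '-' line
new_threshold = required_f1(0.99)   # matches the '+' line

print(old_threshold, 90.88324 >= old_threshold)
print(new_threshold, 90.15279 >= new_threshold)
```

Under that assumption, the new README states the 99% target that the `bert-99` directory name suggests, where the old text carried the 99.9% target.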
