Commit 73648aa (parent: bb466a3)

Results from GH action on NVIDIA_RTX4090x2


61 files changed (+20502, -21411 lines)

open/MLCommons/measurements/RTX4090x2-nvidia-gpu-TensorRT-default_config/retinanet/multistream/README.md (+3, -3)
@@ -2,7 +2,7 @@
 
 ## Host platform
 
-* OS version: Linux-6.8.0-51-generic-x86_64-with-glibc2.29
+* OS version: Linux-6.8.0-52-generic-x86_64-with-glibc2.29
 * CPU version: x86_64
 * Python version: 3.8.10 (default, Jan 17 2025, 14:40:23)
 [GCC 9.4.0]
@@ -17,7 +17,7 @@ pip install -U mlcflow
 
 mlc rm cache -f
 
-mlc pull repo mlcommons@mlperf-automations --checkout=7f1550ac1c2f254c951802093923a3c1423f7b86
+mlc pull repo mlcommons@mlperf-automations --checkout=03d9201c1c9305c7c3eaa0262984af76c7f2287f
 
 
 ```
@@ -40,4 +40,4 @@ Model Precision: int8
 ### Accuracy Results
 
 ### Performance Results
-`Samples per query`: `5656717.0`
+`Samples per query`: `5646056.0`
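
The two performance figures above differ only slightly between runs. As a quick check (my own arithmetic, using only the two numbers reported in the diff):

```python
# Samples-per-query figures from the old and new runs in the README diff.
old_spq = 5656717.0
new_spq = 5646056.0

# Relative change between runs; a negative value means the new run is lower.
rel_change = (new_spq - old_spq) / old_spq
print(f"{rel_change:+.4%}")
```

The delta is well under one percent, i.e. ordinary run-to-run variation rather than a regression.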
(next file; name hidden in the large-commit view)
@@ -1,8 +1,8 @@
-[2025-01-29 23:50:31,220 main.py:229 INFO] Detected system ID: KnownSystem.e7bff0656085
-[2025-01-29 23:50:31,303 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
-[2025-01-29 23:50:31,304 generate_conf_files.py:107 INFO] Generated measurements/ entries for e7bff0656085_TRT/retinanet/MultiStream
-[2025-01-29 23:50:31,304 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_4cd85a18/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/22a091d0057b427a93d508033599dd79.conf" --gpu_engines="./build/engines/e7bff0656085/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
-[2025-01-29 23:50:31,304 __init__.py:53 INFO] Overriding Environment
+[2025-02-02 02:07:19,844 main.py:229 INFO] Detected system ID: KnownSystem.dd805e2fec5f
+[2025-02-02 02:07:19,926 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
+[2025-02-02 02:07:19,927 generate_conf_files.py:107 INFO] Generated measurements/ entries for dd805e2fec5f_TRT/retinanet/MultiStream
+[2025-02-02 02:07:19,927 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf" --gpu_engines="./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
+[2025-02-02 02:07:19,927 __init__.py:53 INFO] Overriding Environment
 benchmark : Benchmark.Retinanet
 buffer_manager_thread_count : 0
 data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/data
@@ -12,23 +12,23 @@ gpu_copy_streams : 1
 gpu_inference_streams : 1
 input_dtype : int8
 input_format : linear
-log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_8953de2a/repo/closed/NVIDIA/build/logs/2025.01.29-23.50.29
+log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/logs/2025.02.02-02.07.18
 map_path : data_maps/open-images-v6-mlperf/val_map.txt
-mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_4cd85a18/inference/mlperf.conf
+mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf
 multi_stream_expected_latency_ns : 0
 multi_stream_samples_per_query : 8
 multi_stream_target_latency_percentile : 99
 precision : int8
 preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data
 scenario : Scenario.MultiStream
-system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.33452799999998, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334528000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='e7bff0656085')
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.33452799999998, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334528000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='dd805e2fec5f')
 tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
 test_mode : AccuracyOnly
 use_deque_limit : True
 use_graphs : True
-user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/22a091d0057b427a93d508033599dd79.conf
-system_id : e7bff0656085
-config_name : e7bff0656085_retinanet_MultiStream
+user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf
+system_id : dd805e2fec5f
+config_name : dd805e2fec5f_retinanet_MultiStream
 workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
 optimization_level : plugin-enabled
 num_profiles : 1
@@ -39,82 +39,82 @@ skip_file_checks : False
 power_limit : None
 cpu_freq : None
 &&&& RUNNING Default_Harness # ./build/bin/harness_default
-[I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_4cd85a18/inference/mlperf.conf
-[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/22a091d0057b427a93d508033599dd79.conf
+[I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf
+[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf
 Creating QSL.
 Finished Creating QSL.
 Setting up SUT.
 [I] [TRT] Loaded engine size: 73 MiB
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +10, now: CPU 126, GPU 881 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 127, GPU 891 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 125, GPU 881 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 127, GPU 891 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
-[I] Device:0.GPU: [0] ./build/engines/e7bff0656085/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
+[I] Device:0.GPU: [0] ./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [I] [TRT] Loaded engine size: 73 MiB
 [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 160, GPU 625 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 162, GPU 635 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 160, GPU 624 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 161, GPU 634 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +69, now: CPU 0, GPU 137 (MiB)
-[I] Device:1.GPU: [0] ./build/engines/e7bff0656085/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
+[I] Device:1.GPU: [0] ./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 893 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 89, GPU 893 (MiB)
 [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 89, GPU 901 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 1665 (MiB)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 90, GPU 637 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 90, GPU 645 (MiB)
-[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 1, GPU 3193 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 636 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 90, GPU 644 (MiB)
+[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1527, now: CPU 1, GPU 3192 (MiB)
 [I] Start creating CUDA graphs
 [I] Capture 2 CUDA graphs
 [I] Capture 2 CUDA graphs
 [I] Finish creating CUDA graphs
 [I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
 Finished setting up SUT.
 Starting warmup. Running for a minimum of 5 seconds.
-Finished warmup. Ran for 5.14365s.
+Finished warmup. Ran for 5.14387s.
 Starting running actual test.
 
 No warnings encountered during test.
 
 No errors encountered during test.
 Finished running actual test.
 Device Device:0.GPU processed:
-6196 batches of size 2
+6204 batches of size 2
 Memcpy Calls: 0
 PerSampleCudaMemcpy Calls: 0
-BatchedCudaMemcpy Calls: 6196
+BatchedCudaMemcpy Calls: 6204
 Device Device:1.GPU processed:
-6196 batches of size 2
+6188 batches of size 2
 Memcpy Calls: 0
 PerSampleCudaMemcpy Calls: 0
-BatchedCudaMemcpy Calls: 6196
+BatchedCudaMemcpy Calls: 6188
 &&&& PASSED Default_Harness # ./build/bin/harness_default
-[2025-01-29 23:51:08,313 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2025-01-29 23:51:08,313 __init__.py:46 INFO] Running command: python3 /home/mlcuser/MLC/repos/local/cache/get-git-repo_8953de2a/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
+[2025-02-02 02:07:58,440 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2025-02-02 02:07:58,440 __init__.py:46 INFO] Running command: python3 /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
 loading annotations into memory...
-Done (t=0.50s)
+Done (t=0.44s)
 creating index...
 index created!
 Loading and preparing results...
-DONE (t=17.48s)
+DONE (t=17.83s)
 creating index...
 index created!
 Running per image evaluation...
 Evaluate annotation type *bbox*
-DONE (t=132.60s).
+DONE (t=132.09s).
 Accumulating evaluation results...
-DONE (t=32.29s).
+DONE (t=31.97s).
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.373
 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.522
-Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.404
+Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.403
 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.125
 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.413
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.419
-Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.598
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.599
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
-Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
-Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
-Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
-mAP=37.340%
+Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
+Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.344
+Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.678
+mAP=37.330%
 
 ======================== Result summaries: ========================
 
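Between the two accuracy runs the per-device batch counts shifted (6196/6196 before vs. 6204/6188 after) while the totals stayed equal. A quick consistency check over the figures in the diff (my own arithmetic, using the batch size and multi_stream_samples_per_query values from the harness output above):

```python
# Per-device "batches of size 2" counts reported by the two accuracy runs.
old_run_batches = [6196, 6196]   # Device:0.GPU, Device:1.GPU (old run)
new_run_batches = [6204, 6188]   # Device:0.GPU, Device:1.GPU (new run)
batch_size = 2                   # --gpu_batch_size from the harness command
samples_per_query = 8            # multi_stream_samples_per_query from the config

old_samples = sum(old_run_batches) * batch_size
new_samples = sum(new_run_batches) * batch_size

# Both runs process the same total number of samples; only the split
# across the two GPUs differs between runs.
assert old_samples == new_samples

# Implied MultiStream query count, assuming every query carries 8 samples.
queries = new_samples // samples_per_query
print(old_samples, new_samples, queries)
```

The equal totals are consistent with the same accuracy workload merely being redistributed across the two GPUs run-to-run.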