Commit 15c2c1a

Auto-merge updates from auto-update branch
2 parents ca65ab3 + c03944e commit 15c2c1a

55 files changed, +4166 -4153 lines changed

open/MLCommons/measurements/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/README.md

+3-3
@@ -19,7 +19,7 @@ pip install -U cmind
 
 cm rm cache -f
 
-cm pull repo mlcommons@mlperf-automations --checkout=467517e4a572872046058e394a0d83512cfff38b
+cm pull repo mlcommons@mlperf-automations --checkout=c52956b27fa8d06ec8db53f885e1f05021e379e9
 
 cm run script \
 --tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_valid,_r4.1-dev_default,_multistream \
@@ -71,7 +71,7 @@ cm run script \
 --env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
 --env.CM_DOCKER_DETACHED_MODE=yes \
 --env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
---env.CM_DOCKER_CONTAINER_ID=a8f4d29481f7 \
+--env.CM_DOCKER_CONTAINER_ID=60e80d607e09 \
 --env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST04 \
 --add_deps_recursive.compiler.tags=gcc \
 --add_deps_recursive.coco2014-original.tags=_full \
@@ -130,4 +130,4 @@ Model Precision: int8
 `acc`: `76.064`, Required accuracy for closed division `>= 75.6954`
 
 ### Performance Results
-`Samples per query`: `501344.0`
+`Samples per query`: `502725.0`

open/MLCommons/measurements/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy_console.out

+18-18
@@ -1,7 +1,7 @@
-[2024-12-27 23:09:57,077 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
-[2024-12-27 23:09:57,245 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
-[2024-12-27 23:09:57,245 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/254e1a0508754bfaa44358cba8270233.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
-[2024-12-27 23:09:57,245 __init__.py:53 INFO] Overriding Environment
+[2024-12-28 23:25:07,112 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
+[2024-12-28 23:25:07,282 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
+[2024-12-28 23:25:07,282 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/1bb3dbed444d4434830a68607ad9af2f.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
+[2024-12-28 23:25:07,282 __init__.py:53 INFO] Overriding Environment
 benchmark : Benchmark.ResNet50
 buffer_manager_thread_count : 0
 data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
@@ -11,7 +11,7 @@ gpu_copy_streams : 1
 gpu_inference_streams : 1
 input_dtype : int8
 input_format : linear
-log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.27-23.09.55
+log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.28-23.25.05
 map_path : data_maps/imagenet/val_map.txt
 mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
 multi_stream_expected_latency_ns : 0
@@ -25,7 +25,7 @@ tensor_path : build/preprocessed_data/imagenet/ResNet50/int8_linear
 test_mode : AccuracyOnly
 use_deque_limit : True
 use_graphs : True
-user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/254e1a0508754bfaa44358cba8270233.conf
+user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/1bb3dbed444d4434830a68607ad9af2f.conf
 system_id : RTX4090x2
 config_name : RTX4090x2_resnet50_MultiStream
 workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
@@ -39,27 +39,27 @@ power_limit : None
 cpu_freq : None
 &&&& RUNNING Default_Harness # ./build/bin/harness_default
 [I] mlperf.conf path: /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
-[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/254e1a0508754bfaa44358cba8270233.conf
+[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/1bb3dbed444d4434830a68607ad9af2f.conf
 Creating QSL.
 Finished Creating QSL.
 Setting up SUT.
 [I] [TRT] Loaded engine size: 26 MiB
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 77, GPU 837 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 79, GPU 847 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 78, GPU 837 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 80, GPU 847 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
 [I] Device:0.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [I] [TRT] Loaded engine size: 26 MiB
 [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 580 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 109, GPU 590 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 109, GPU 580 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 110, GPU 590 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +25, now: CPU 0, GPU 49 (MiB)
 [I] Device:1.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 82, GPU 839 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 82, GPU 847 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 839 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 83, GPU 847 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 66 (MiB)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 582 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 84, GPU 590 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 84, GPU 582 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 85, GPU 590 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 83 (MiB)
 [I] Start creating CUDA graphs
 [I] Capture 8 CUDA graphs
@@ -68,7 +68,7 @@ Setting up SUT.
 [I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
 Finished setting up SUT.
 Starting warmup. Running for a minimum of 5 seconds.
-Finished warmup. Ran for 5.02407s.
+Finished warmup. Ran for 5.02451s.
 Starting running actual test.
 
 No warnings encountered during test.
@@ -86,8 +86,8 @@ Device Device:1.GPU processed:
 PerSampleCudaMemcpy Calls: 0
 BatchedCudaMemcpy Calls: 3125
 &&&& PASSED Default_Harness # ./build/bin/harness_default
-[2024-12-27 23:10:12,779 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2024-12-27 23:10:12,779 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-imagenet.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy/mlperf_log_accuracy.json --imagenet-val-file data_maps/imagenet/val_map.txt --dtype int32
+[2024-12-28 23:25:21,871 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2024-12-28 23:25:21,871 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-imagenet.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy/mlperf_log_accuracy.json --imagenet-val-file data_maps/imagenet/val_map.txt --dtype int32
 accuracy=76.064%, good=38032, total=50000
 
 ======================== Result summaries: ========================
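
Note on the accuracy figures: the unchanged `accuracy=76.064%, good=38032, total=50000` line and the README's `>= 75.6954` threshold can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the benchmark tooling. The function name is hypothetical, and the 76.46% reference value is an assumption that does not appear in this diff; it is chosen because 99% of it reproduces the stated threshold, which would be consistent with the `AccuracyTarget.k_99` workload setting.

```python
def top1_accuracy_pct(good: int, total: int) -> float:
    """Return top-1 accuracy as a percentage of correctly classified samples."""
    return 100.0 * good / total


# Values copied from the accuracy_console.out line in the diff above.
accuracy = top1_accuracy_pct(good=38032, total=50000)

# Threshold copied from the README ("Required accuracy for closed division").
REQUIRED_ACCURACY_PCT = 75.6954

# Assumption (not stated in this diff): the threshold is 99% of a 76.46%
# reference top-1 accuracy.
ASSUMED_REFERENCE_PCT = 76.46

print(f"accuracy = {accuracy:.3f}%")
print(f"meets closed-division threshold: {accuracy >= REQUIRED_ACCURACY_PCT}")
print(f"0.99 * assumed reference = {0.99 * ASSUMED_REFERENCE_PCT:.4f}%")
```

Because the accuracy line is a context line in the diff, the check gives the same result for both the 2024-12-27 and 2024-12-28 runs.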
