Results from self hosted Github actions - NVIDIARTX4090

GATEOverflow · Nov 16, 2024 · da20e16
1 parent 6a2ee78
commit da20e16

Showing 28 changed files with 716 additions and 527 deletions.
diff --git a/...asurements/arjun_spr-nvidia_original-gpu-tensorrt-vdefault-scc24-base/README.md b/...asurements/arjun_spr-nvidia_original-gpu-tensorrt-vdefault-scc24-base/README.md
@@ -1,3 +1,3 @@
 | Model               | Scenario   | Accuracy              |   Throughput | Latency (in ms)   |
 |---------------------|------------|-----------------------|--------------|-------------------|
-| stable-diffusion-xl | offline    | (16.68664, 233.38096) |        1.139 | -                 |
+| stable-diffusion-xl | offline    | (16.61569, 233.52877) |        1.313 | -                 |
diff --git a/...original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/README.md b/...original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/README.md
@@ -19,7 +19,7 @@ pip install -U cmind
 
 cm rm cache -f
 
-cm pull repo gateoverflow@cm4mlops --checkout=0c68370bc2eabd88241b561bf663d9c166eccd20
+cm pull repo gateoverflow@cm4mlops --checkout=4b3d2ee9a56d2f88885ace86b1c8f2e94f91ef41
 
 cm run script \
 	--tags=app,mlperf,inference,generic,_nvidia,_sdxl,_tensorrt,_test,_r4.1-dev_default,_float16,_offline \
@@ -30,6 +30,7 @@ cm run script \
 	--env.CM_MLPERF_MODEL=sdxl \
 	--env.CM_MLPERF_RUN_STYLE=test \
 	--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \
+	--env.CM_DOCKER_PRIVILEGED_MODE=True \
 	--env.CM_MLPERF_BACKEND=tensorrt \
 	--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter \
 	--env.CM_MLPERF_CLEAN_ALL=True \
@@ -44,7 +45,9 @@ cm run script \
 	--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=short \
 	--env.CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX4=scc24-base \
 	--env.CM_DOCKER_IMAGE_NAME=scc24-nvidia \
+	--env.CM_MLPERF_INFERENCE_MIN_QUERY_COUNT=50 \
 	--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \
+	--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=4.1.23 \
 	--env.CM_MLPERF_LAST_RELEASE=v4.1 \
 	--env.CM_TMP_CURRENT_PATH=/home/arjun/actions-runner/_work/cm4mlops/cm4mlops \
 	--env.CM_TMP_PIP_VERSION_STRING= \
@@ -92,8 +95,8 @@ Platform: arjun_spr-nvidia_original-gpu-tensorrt-vdefault-scc24-base
 Model Precision: int8
 
 ### Accuracy Results 
-`CLIP_SCORE`: `16.68664`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
-`FID_SCORE`: `233.38096`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
+`CLIP_SCORE`: `16.61569`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
+`FID_SCORE`: `233.52877`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
 
 ### Performance Results 
-`Samples per second`: `1.13881`
+`Samples per second`: `1.31313`
diff --git a/...riginal-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy_console.out b/...riginal-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy_console.out
@@ -0,0 +1,73 @@
+[2024-11-15 19:46:08,019 main.py:229 INFO] Detected system ID: KnownSystem.c6ef2e4d491e
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-11-15 19:46:09,682 generate_conf_files.py:107 INFO] Generated measurements/ entries for c6ef2e4d491e_TRT/stable-diffusion-xl/Offline
+[2024-11-15 19:46:09,683 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/cm-mount/home/arjun/scc_gh_action_results/test_results/c6ef2e4d491e-nvidia_original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=2 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/0667860eeb4d4b87/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/d134cc547a2a40329987a4e605038f41.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
+[2024-11-15 19:46:09,683 __init__.py:53 INFO] Overriding Environment
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-11-15 19:46:11,597 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:11,743 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:12,466 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:13,873 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:15,272 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:15,403 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:16,063 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:17,423 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-11-15 19:46:18,629 harness.py:207 INFO] Start Warm Up!
+[2024-11-15 19:46:30,486 harness.py:209 INFO] Warm Up Done!
+[2024-11-15 19:46:30,486 harness.py:211 INFO] Start Test!
+[2024-11-15 19:47:08,680 backend.py:801 INFO] [Server] Received 50 total samples
+[2024-11-15 19:47:08,681 backend.py:809 INFO] [Device 0] Reported 26 samples
+[2024-11-15 19:47:08,681 backend.py:809 INFO] [Device 1] Reported 24 samples
+[2024-11-15 19:47:08,681 harness.py:214 INFO] Test Done!
+[2024-11-15 19:47:08,681 harness.py:216 INFO] Destroying SUT...
+[2024-11-15 19:47:08,681 harness.py:219 INFO] Destroying QSL...
+benchmark : Benchmark.SDXL
+buffer_manager_thread_count : 0
+data_dir : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/data
+gpu_batch_size : 2
+gpu_copy_streams : 1
+gpu_inference_streams : 1
+input_dtype : int32
+input_format : linear
+log_dir : /home/cmuser/CM/repos/local/cache/e0e53f17cf2744e0/repo/closed/NVIDIA/build/logs/2024.11.15-19.46.06
+mlperf_conf_path : /home/cmuser/CM/repos/local/cache/0667860eeb4d4b87/inference/mlperf.conf
+model_path : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/models/SDXL/
+offline_expected_qps : 0.0
+precision : int8
+preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/preprocessed_data
+scenario : Scenario.Offline
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.330052, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197330052000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='c6ef2e4d491e')
+tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
+test_mode : AccuracyOnly
+use_graphs : False
+user_conf_path : /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/d134cc547a2a40329987a4e605038f41.conf
+system_id : c6ef2e4d491e
+config_name : c6ef2e4d491e_stable-diffusion-xl_Offline
+workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
+optimization_level : plugin-enabled
+num_profiles : 1
+config_ver : custom_k_99_MaxP
+accuracy_level : 99%
+inference_server : custom
+skip_file_checks : False
+power_limit : None
+cpu_freq : None
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/c6ef2e4d491e/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[2024-11-15 19:47:09,195 run_harness.py:166 INFO] Result: Accuracy run detected.
+
+======================== Result summaries: ========================
+
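
A quick way to read the README hunks above is to compare the reported scores with the closed-division ranges and to compute the relative throughput change. The sketch below is illustrative only: the numbers are copied from the diff, while the helper names (`in_closed_range`, `relative_change`) are hypothetical and are not part of the cm4mlops or MLPerf tooling. Note that the command line sets `CM_MLPERF_RUN_STYLE=test` and the log reports 50 total samples, so this is a short test run rather than a full submission run.

```python
# Values copied from the README diff; helper names are hypothetical.
CLOSED_DIVISION_RANGES = {
    "CLIP_SCORE": (31.68632, 31.81332),
    "FID_SCORE": (23.01086, 23.95008),
}


def in_closed_range(metric: str, value: float) -> bool:
    """Return True if value lies inside the closed-division range for metric."""
    low, high = CLOSED_DIVISION_RANGES[metric]
    return low <= value <= high


def relative_change(old: float, new: float) -> float:
    """Return the fractional change from old to new (0.10 means +10%)."""
    return (new - old) / old


if __name__ == "__main__":
    # Scores reported after this commit.
    reported = {"CLIP_SCORE": 16.61569, "FID_SCORE": 233.52877}
    for metric, value in reported.items():
        status = "inside" if in_closed_range(metric, value) else "outside"
        print(f"{metric}: {value} is {status} the closed-division range")

    # Samples per second before and after this commit.
    change = relative_change(1.13881, 1.31313)
    print(f"Throughput change: {change:+.1%}")
```

Both reported scores fall outside the required ranges, and the offline throughput rises by roughly 15% between the two runs.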