Results from self-hosted GitHub Actions - NVIDIA RTX 4090

GATEOverflow · Nov 29, 2024 · 4d5765a
1 parent e70b749
Showing 35 changed files with 13,705 additions and 0 deletions.
diff --git a/...rements/3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base/README.md b/...rements/3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base/README.md
@@ -0,0 +1,3 @@
+| Model               | Scenario   | Accuracy             |   Throughput | Latency (in ms)   |
+|---------------------|------------|----------------------|--------------|-------------------|
+| stable-diffusion-xl | offline    | (31.26424, 23.38871) |        1.314 | -                 |
diff --git a/...e-diffusion-xl/offline/3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base.json b/...e-diffusion-xl/offline/3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base.json
@@ -0,0 +1,7 @@
+{
+  "starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174",
+  "retraining": "no",
+  "input_data_types": "int32",
+  "weight_data_types": "int8",
+  "weight_transformations": "quantization, affine fusion"
+}
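The JSON above records `weight_data_types` as `int8`, while the run command in this commit requests `float16`. The engine plan file names in `accuracy_console.out` suggest that precision differs per pipeline component. The following is a minimal Python sketch that extracts the precision from those names; the regular expression and the helper `component_precisions` are assumptions inferred only from the four file names in the log, not part of the NVIDIA harness or CM.

```python
import re

# Engine plan file names as they appear in the harness command
# in accuracy_console.out (directory prefixes dropped).
ENGINE_PLANS = [
    "stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan",
    "stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan",
    "stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan",
    "stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan",
]

# Assumed naming pattern, inferred only from the four names above:
# stable-diffusion-xl-<component>-<scenario>-gpu-b<batch>-<precision>.<config>.plan
PLAN_RE = re.compile(
    r"^stable-diffusion-xl-(?P<component>\w+)-(?P<scenario>\w+)-gpu-"
    r"b(?P<batch>\d+)-(?P<precision>\w+)\.(?P<config>\w+)\.plan$"
)


def component_precisions(plan_names):
    """Map each pipeline component to the precision encoded in its plan name."""
    precisions = {}
    for name in plan_names:
        match = PLAN_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected plan file name: {name}")
        precisions[match.group("component")] = match.group("precision")
    return precisions


if __name__ == "__main__":
    for component, precision in component_precisions(ENGINE_PLANS).items():
        print(f"{component}: {precision}")
```

Read this way, only the UNetXL engine is built in int8; the two CLIP engines are fp16 and the VAE engine is fp32.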
diff --git a/...original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/README.md b/...original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/README.md
@@ -0,0 +1,104 @@
+This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).
+
+*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*
+
+## Host platform
+
+* OS version: Linux-6.2.0-39-generic-x86_64-with-glibc2.29
+* CPU version: x86_64
+* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) 
+[GCC 9.4.0]
+* MLCommons CM version: 3.3.4
+
+## CM Run Command
+
+See [CM installation guide](https://docs.mlcommons.org/inference/install/).
+
+```bash
+pip install -U cmind
+
+cm rm cache -f
+
+cm pull repo gateoverflow@cm4mlops --checkout=bdeb9213aefedf4ed6c4ab6140466abbb9c2ae4b
+
+cm run script \
+	--tags=app,mlperf,inference,generic,_nvidia,_sdxl,_tensorrt,_test,_r4.1-dev_default,_float16,_offline \
+	--quiet=true \
+	--env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes \
+	--env.CM_QUIET=yes \
+	--env.CM_MLPERF_IMPLEMENTATION=nvidia \
+	--env.CM_MLPERF_MODEL=sdxl \
+	--env.CM_MLPERF_RUN_STYLE=test \
+	--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \
+	--env.CM_DOCKER_PRIVILEGED_MODE=True \
+	--env.CM_MLPERF_BACKEND=tensorrt \
+	--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter \
+	--env.CM_MLPERF_CLEAN_ALL=True \
+	--env.CM_MLPERF_DEVICE= \
+	--env.CM_MLPERF_USE_DOCKER=True \
+	--env.CM_MLPERF_MODEL_PRECISION=float16 \
+	--env.OUTPUT_BASE_DIR=/home/arjun/scc_gh_action_results \
+	--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \
+	--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/scc_gh_action_submissions \
+	--env.CM_MLPERF_INFERENCE_VERSION=4.1-dev \
+	--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \
+	--env.CM_MLPERF_SUBMISSION_DIVISION=open \
+	--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=False \
+	--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=short \
+	--env.CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX4=scc24-base \
+	--env.CM_DOCKER_IMAGE_NAME=scc24-nvidia \
+	--env.CM_MLPERF_INFERENCE_MIN_QUERY_COUNT=50 \
+	--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \
+	--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=4.1.23 \
+	--env.CM_MLPERF_LAST_RELEASE=v4.1 \
+	--env.CM_TMP_CURRENT_PATH=/home/arjun/actions-runner/_work/cm4mlops/cm4mlops \
+	--env.CM_TMP_PIP_VERSION_STRING= \
+	--env.CM_MODEL=sdxl \
+	--env.CM_MLPERF_LOADGEN_COMPLIANCE=no \
+	--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \
+	--env.CM_RERUN=yes \
+	--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \
+	--env.CM_MLPERF_LOADGEN_MODE=performance \
+	--env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline \
+	--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \
+	--env.CM_OUTPUT_FOLDER_NAME=test_results \
+	--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=no \
+	--env.CM_DOCKER_DETACHED_MODE=yes \
+	--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \
+	--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \
+	--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \
+	--add_deps_recursive.submission-checker.tags=_short-run \
+	--add_deps_recursive.coco2014-preprocessed.tags=_size.50,_with-sample-ids \
+	--add_deps_recursive.coco2014-dataset.tags=_size.50,_with-sample-ids \
+	--add_deps_recursive.nvidia-preprocess-data.extra_cache_tags=scc24-base \
+	--v=False \
+	--print_env=False \
+	--print_deps=False \
+	--dump_version_info=True \
+	--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/scc_gh_action_results \
+	--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/scc_gh_action_submissions \
+	--env.SDXL_CHECKPOINT_PATH=/home/cmuser/CM/repos/local/cache/6be1f30ecbde4c4e/stable_diffusion_fp16 \
+	--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/e066920512fd47b7
+```
+*To use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
+ pull gateoverflow@cm4mlops again without `--checkout` and clean the CM cache as follows:*
+
+```bash
+cm rm repo gateoverflow@cm4mlops
+cm pull repo gateoverflow@cm4mlops
+cm rm cache -f
+
+```
+
+## Results
+
+Platform: 3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base
+
+Model Precision: int8
+
+### Accuracy Results 
+* `CLIP_SCORE`: `31.26424`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
+* `FID_SCORE`: `23.38871`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
+
+### Performance Results 
+`Samples per second`: `1.31352`
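The accuracy figures in this README can be compared against the closed-division ranges it quotes. The following is a minimal Python sketch of that comparison; the values are copied from the README, while the dictionary layout and the helper `check_closed_division` are illustrative assumptions and not part of CM or the MLPerf submission checker.

```python
# Reported scores and closed-division ranges, copied from the README above.
# The data layout and helper name are illustrative, not CM/MLPerf APIs.
RESULTS = {
    "CLIP_SCORE": {"value": 31.26424, "low": 31.68632, "high": 31.81332},
    "FID_SCORE": {"value": 23.38871, "low": 23.01086, "high": 23.95008},
}


def check_closed_division(results):
    """Return {metric: bool}, True when the value lies within [low, high]."""
    return {
        name: entry["low"] <= entry["value"] <= entry["high"]
        for name, entry in results.items()
    }


if __name__ == "__main__":
    for metric, ok in check_closed_division(RESULTS).items():
        state = "within" if ok else "outside"
        print(f"{metric}: {state} the closed-division range")
```

With these numbers, `FID_SCORE` lies within its range, while `CLIP_SCORE` (31.26424) falls below the lower bound of 31.68632. The run command sets `CM_MLPERF_SUBMISSION_DIVISION=open` and `CM_MLPERF_RUN_STYLE=test`, so the closed-division ranges serve here as a reference.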
diff --git a/...riginal-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy_console.out b/...riginal-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy_console.out
@@ -0,0 +1,73 @@
+[2024-11-28 19:57:08,530 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_3d619f8173c8
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-11-28 19:57:10,113 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_3d619f8173c8_TRT/stable-diffusion-xl/Offline
+[2024-11-28 19:57:10,113 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/cm-mount/home/arjun/scc_gh_action_results/test_results/3d619f8173c8-nvidia_original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=2 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/7a9450659dc74b38/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/4794a67e747b4966ae496ab0bdc2850d.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
+[2024-11-28 19:57:10,113 __init__.py:53 INFO] Overriding Environment
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-11-28 19:57:12,166 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:12,310 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:13,002 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:14,436 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:15,840 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:15,971 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:16,652 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:18,044 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-11-28 19:57:19,263 harness.py:207 INFO] Start Warm Up!
+[2024-11-28 19:57:31,101 harness.py:209 INFO] Warm Up Done!
+[2024-11-28 19:57:31,101 harness.py:211 INFO] Start Test!
+[2024-11-28 20:59:35,861 backend.py:801 INFO] [Server] Received 5000 total samples
+[2024-11-28 20:59:35,862 backend.py:809 INFO] [Device 0] Reported 2496 samples
+[2024-11-28 20:59:35,862 backend.py:809 INFO] [Device 1] Reported 2504 samples
+[2024-11-28 20:59:35,862 harness.py:214 INFO] Test Done!
+[2024-11-28 20:59:35,862 harness.py:216 INFO] Destroying SUT...
+[2024-11-28 20:59:35,862 harness.py:219 INFO] Destroying QSL...
+benchmark : Benchmark.SDXL
+buffer_manager_thread_count : 0
+data_dir : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/data
+gpu_batch_size : 2
+gpu_copy_streams : 1
+gpu_inference_streams : 1
+input_dtype : int32
+input_format : linear
+log_dir : /home/cmuser/CM/repos/local/cache/e0e53f17cf2744e0/repo/closed/NVIDIA/build/logs/2024.11.28-19.57.07
+mlperf_conf_path : /home/cmuser/CM/repos/local/cache/7a9450659dc74b38/inference/mlperf.conf
+model_path : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/models/SDXL/
+offline_expected_qps : 0.0
+precision : int8
+preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/e066920512fd47b7/preprocessed_data
+scenario : Scenario.Offline
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.330052, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197330052000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='Nvidia_3d619f8173c8')
+tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
+test_mode : AccuracyOnly
+use_graphs : False
+user_conf_path : /home/cmuser/CM/repos/gateoverflow@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/4794a67e747b4966ae496ab0bdc2850d.conf
+system_id : Nvidia_3d619f8173c8
+config_name : Nvidia_3d619f8173c8_stable-diffusion-xl_Offline
+workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
+optimization_level : plugin-enabled
+num_profiles : 1
+config_ver : custom_k_99_MaxP
+accuracy_level : 99%
+inference_server : custom
+skip_file_checks : False
+power_limit : None
+cpu_freq : None
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/Nvidia_3d619f8173c8/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[2024-11-28 20:59:36,415 run_harness.py:166 INFO] Result: Accuracy run detected.
+
+======================== Result summaries: ========================
+
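The timestamps in the console log above allow a rough cross-check of the accuracy run: the time between `Start Test!` and `Test Done!`, and the per-device sample counts. The following is a minimal Python sketch under the assumption that every harness log line starts with `[YYYY-MM-DD HH:MM:SS,mmm`; the helper `log_timestamp` is hypothetical, and the derived rate is a wall-clock estimate for the accuracy run, not the LoadGen performance result.

```python
from datetime import datetime

# Lines copied from accuracy_console.out above.
START_LINE = "[2024-11-28 19:57:31,101 harness.py:211 INFO] Start Test!"
END_LINE = "[2024-11-28 20:59:35,862 harness.py:214 INFO] Test Done!"

# Per-device sample counts reported by backend.py in the same log.
DEVICE_SAMPLES = [2496, 2504]


def log_timestamp(line):
    """Parse the leading timestamp of a harness log line (assumed fixed width)."""
    stamp = line[1:24]  # e.g. "2024-11-28 19:57:31,101"
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(start_line, end_line):
    """Wall-clock seconds between two log lines."""
    return (log_timestamp(end_line) - log_timestamp(start_line)).total_seconds()


if __name__ == "__main__":
    total = sum(DEVICE_SAMPLES)
    seconds = elapsed_seconds(START_LINE, END_LINE)
    print(f"samples: {total}")
    print(f"elapsed: {seconds:.3f} s")
    print(f"approx. samples/s (accuracy run): {total / seconds:.3f}")
```

The two device counts add up to the 5000 samples the server reports. The estimated rate is close to, but not the same measurement as, the `1.31352` samples per second reported for the performance run.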