Build for gptj docker fails #14

Closed
ChristinaHsu0115 opened this issue Jan 22, 2024 · 4 comments

ChristinaHsu0115 commented Jan 22, 2024

I previously ran inference v3.0 with two A100 PCIe GPU cards, and the gptj model is new in inference v3.1.
I followed the README at this link:
https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA#readme

Here is the procedure, for your reference:
1. make prebuild to enter the container environment
2. make build
3. Download the gptj dataset
4. Download the gptj model
5. Preprocess the gptj data
6. Create a custom config file and set the correct parameters
7. Run the gptj benchmark in the Offline scenario

Step 7 produced the error message shown in the log below. Does anyone know how to fix the problem?
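For reference, the commands behind these steps are roughly the following (the make run line is the one from my log; the other targets follow the linked README and may differ between releases):

```bash
# Run from closed/NVIDIA in the inference_results_v3.1 repo (see the README linked above)
make prebuild                                   # 1. enter the container environment
make build                                      # 2. build the harnesses and TRT-LLM
make download_data BENCHMARKS="gptj"            # 3. download the cnn_dailymail dataset
make download_model BENCHMARKS="gptj"           # 4. download the GPT-J 6B checkpoint
make preprocess_data BENCHMARKS="gptj"          # 5. preprocess / tokenize the data
# 6. add a custom config for your system ID, then:
make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"   # 7. Offline scenario
```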

(mlperf) test@mlperf-inference-test-x86-64-7440:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-22 10:34:01,320 main.py:230 INFO] Detected system ID: KnownSystem.K905_A100X2
[2024-01-22 10:34:02,953 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:09,676 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:10] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.71s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.71s/it]
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in init
super().init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
if f.read(7) == "version":
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
[2024-01-22 10:34:40,406 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:46] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:47,175 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:48] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...

Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.90s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.90s/it]
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in init
super().init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
if f.read(7) == "version":
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/work/code/main.py", line 232, in
main(main_args, DETECTED_SYSTEM)
File "/work/code/main.py", line 145, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/work/code/main.py", line 203, in dispatch_action
handler.run()
File "/work/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/work/code/actionhandler/base.py", line 186, in handle_failure
self.action_handler.handle_failure()
File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$

lapp0 commented Jan 22, 2024

Try downgrading transformers to 4.36.2
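For example, inside the MLPerf container:

```bash
# Pin transformers to the suggested version
pip install transformers==4.36.2
```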

@psyhtest

@ChristinaHsu0115 Please consider renaming the issue. AMD did not submit to v3.1. You are using NVIDIA's code.

/cc @nv-ananjappa @mrmhodak

@ChristinaHsu0115

@lapp0 Thanks for the help.
I don't know how to downgrade transformers to 4.36.2 exactly; it has a lot of dependencies (fsspec, tqdm, huggingface, ...). So I made two changes instead:

1. I downloaded the pytorch_model .bin files from another site.
   (Note: with make download_model BENCHMARKS="gpt", the PyTorch checkpoint is split into 3 .bin shards, and the 2nd shard was broken; a quick way to check the shards is sketched below.)
2. I set ignore_mismatched_sizes=True on the from_pretrained() call (the one in build/TRTLLM/examples/gptj/build.py shown in the traceback above).
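In case it helps, each shard can be sanity-checked like this (paths taken from the traceback above; this check is only a sketch, not part of the original steps):

```bash
# Try to load every checkpoint shard; a corrupt one reproduces the
# "PytorchStreamReader failed reading zip archive" error from the log.
cd build/models/GPTJ-6B/checkpoint-final
ls -lh pytorch_model-*-of-00003.bin
for f in pytorch_model-*-of-00003.bin; do
  python3 -c "import torch; torch.load('$f', map_location='cpu'); print('$f OK')"
done
```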
With these two changes the engine builds and the gptj benchmark starts, but I then hit another problem, shown in the log below. Does anyone know how to fix it?

(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-24 12:17:41,391 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:17:43,151 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/24/2024-12:17:43] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 44, GPU 942 (MiB)
[01/24/2024-12:17:50] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +4333, GPU +1150, now: CPU 4482, GPU 2094 (MiB)
[2024-01-24 12:17:51,765 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/k905_h100_x2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/k905_h100_x2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
[2024-01-24 12:20:20,141 gptj6b.py:122 INFO] Engine built complete and took 148.37598872184753s. Stored at ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[2024-01-24 12:20:20,141 generate_engines.py:176 INFO] Finished building engines for gptj benchmark in Offline scenario.
Time taken to generate engines: 156.99001169204712 seconds
make[1]: Leaving directory '/work'
make[1]: Entering directory '/work'
[2024-01-24 12:20:25,648 main.py:230 INFO] Detected system ID: KnownSystem.k905_h100_x2
[2024-01-24 12:20:25,751 harness.py:236 INFO] The harness will load 1 plugins: ['build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so']
[2024-01-24 12:20:25,751 generate_conf_files.py:107 INFO] Generated measurements/ entries for k905_h100_x2_TRT/gptj-99/Offline
[2024-01-24 12:20:25,752 init.py:46 INFO] Running command: ./build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj
[2024-01-24 12:20:25,752 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.GPTJ
buffer_manager_thread_count : 0
coalesced_tensor : True
data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//data
enable_sort : True
gpu_batch_size : 32
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /work/build/logs/2024.01.24-12.17.38
num_sort_segments : 2
offline_expected_qps : 76
precision : fp16
preprocessed_data_dir : /home/jay/inference_results_v3.1/closed/NVIDIA/scratch//preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.5849335560000002, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584933556000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 2})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='k905_h100_x2')
tensor_parallelism : 1
tensor_path : build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy
use_graphs : False
system_id : k905_h100_x2
config_name : k905_h100_x2_gptj_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
use_cpu : False
use_inferentia : False
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING GPT_HARNESS # ./build/bin/harness_gpt
[I] Loading plugin: build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so
I0124 12:20:26.327747 13788 main_gpt.cc:122] Found 2 GPUs
I0124 12:20:27.282594 13788 gpt_server.cc:215] Loading 1 engine(s)
I0124 12:20:27.282637 13788 gpt_server.cc:218] Engine Path: ./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +66, now: CPU 35086, GPU 12554 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +72, now: CPU 35088, GPU 12626 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 11541 (MiB)
[I] [TRT] Loaded engine size: 11546 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +66, now: CPU 23982, GPU 12093 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +72, now: CPU 23983, GPU 12165 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11541, now: CPU 0, GPU 23082 (MiB)
I0124 12:20:40.118860 13788 gpt_server.cc:290] Engines Deserialization Completed
I0124 12:20:40.366228 13788 gpt_core.cc:64] GPTCore 0: MPI Rank - 0 at Device Id - 0
I0124 12:20:40.366343 13788 gpt_core.cc:262] Engine - Vocab size: 50401 Padded vocab size: 50401 Beam width: 4
I0124 12:20:40.369578 13788 gpt_core.cc:90] Engine - Device Memory requirements: 6539709440
I0124 12:20:40.369586 13788 gpt_core.cc:99] Engine - Total Number of Optimization Profiles: 2
I0124 12:20:40.369588 13788 gpt_core.cc:100] Engine - Number of Optimization Profiles Per Core: 2
I0124 12:20:40.369591 13788 gpt_core.cc:101] Engine - Start Index of Optimization Profiles: 0
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 893, GPU 18868 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +64, now: CPU 893, GPU 18932 (MiB)
I0124 12:20:40.602331 13788 gpt_core.cc:115] Setting Opt.Prof. to 0
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +64, now: CPU 930, GPU 19032 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +72, now: CPU 930, GPU 19104 (MiB)
I0124 12:20:40.817628 13788 gpt_core.cc:115] Setting Opt.Prof. to 1
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 23082 (MiB)
[I] [TRT] Switching optimization profile from: 0 to 1. Please ensure there are no enqueued operations pending in this context prior to switching profiles
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
[mlperf-inference-jay-x86-64-19218:13788] *** Process received signal ***
[mlperf-inference-jay-x86-64-19218:13788] Signal: Aborted (6)
[mlperf-inference-jay-x86-64-19218:13788] Signal code: (-6)
[mlperf-inference-jay-x86-64-19218:13788] [ 0] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f0c5c775420]
[mlperf-inference-jay-x86-64-19218:13788] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f0c5c26400b]
[mlperf-inference-jay-x86-64-19218:13788] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f0c5c243859]
[mlperf-inference-jay-x86-64-19218:13788] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e8d1)[0x7f0c5c61b8d1]
[mlperf-inference-jay-x86-64-19218:13788] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c)[0x7f0c5c62737c]
[mlperf-inference-jay-x86-64-19218:13788] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7)[0x7f0c5c6273e7]
[mlperf-inference-jay-x86-64-19218:13788] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_rethrow+0x4d)[0x7f0c5c6276ed]
[mlperf-inference-jay-x86-64-19218:13788] [ 7] ./build/bin/harness_gpt(+0x715c1)[0x564f8dfb35c1]
[mlperf-inference-jay-x86-64-19218:13788] [ 8] ./build/bin/harness_gpt(+0x6b45b)[0x564f8dfad45b]
[mlperf-inference-jay-x86-64-19218:13788] [ 9] ./build/bin/harness_gpt(+0x5d0fe)[0x564f8df9f0fe]
[mlperf-inference-jay-x86-64-19218:13788] [10] ./build/bin/harness_gpt(+0x2fc84)[0x564f8df71c84]
[mlperf-inference-jay-x86-64-19218:13788] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f0c5c245083]
[mlperf-inference-jay-x86-64-19218:13788] [12] ./build/bin/harness_gpt(+0x3074e)[0x564f8df7274e]
[mlperf-inference-jay-x86-64-19218:13788] *** End of error message ***
Aborted (core dumped)
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/work/code/main.py", line 232, in
main(main_args, DETECTED_SYSTEM)
File "/work/code/main.py", line 145, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/work/code/main.py", line 203, in dispatch_action
handler.run()
File "/work/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/work/code/actionhandler/run_harness.py", line 193, in handle_failure
raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
File "/work/code/actionhandler/run_harness.py", line 162, in handle
result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
File "/work/code/common/harness.py", line 339, in run_harness
output = run_command(self.construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
File "/work/code/common/init.py", line 67, in run_command
raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_gpt --plugins="build/plugins/../TRTLLM/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin.so" --logfile_outdir="/work/build/logs/2024.01.24-12.17.38/k905_h100_x2_TRT/gptj-99/Offline" --logfile_prefix="mlperf_log_" --performance_sample_count=13368 --gpu_batch_size=32 --tensor_path="build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_ids_padded.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/masked_tokens.npy,build/preprocessed_data/cnn_dailymail_tokenized_gptj/input_lengths.npy" --use_graphs=false --gpu_inference_streams=1 --gpu_copy_streams=1 --tensor_parallelism=1 --enable_sort=true --num_sort_segments=2 --gpu_engines="./build/engines/k905_h100_x2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.plan" --mlperf_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/mlperf.conf" --user_conf_path="build/loadgen-configs/k905_h100_x2_TRT/gptj-99/Offline/user.conf" --scenario Offline --model gptj' returned non-zero exit status 134.
make[1]: *** [Makefile:45: run_harness] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:32: run] Error 2
(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$

ChristinaHsu0115 changed the title from "Build for AMD gptj docker fails" to "Build for gptj docker fails" on Jan 24, 2024
@ChristinaHsu0115

The issue has been solved by modifying the gpu_batch_size parameter in custom.py; the gptj benchmark now runs. Thanks to all.
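For anyone hitting the same std::bad_alloc, the change and rerun look roughly like this (the config path follows NVIDIA's repo layout and the value 16 is only illustrative; pick a batch size that fits your GPUs):

```bash
# In configs/gptj/Offline/custom.py (path per NVIDIA's layout), lower the batch size,
# e.g. gpu_batch_size = 32  ->  gpu_batch_size = 16   (16 is an illustrative value)
# then rebuild the engines and rerun:
make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
```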
[Screenshot attached: chrome_2024-01-25_14-56-35]
