GPT-J Huggingface validation error #2010

Open

Xi0131 opened this issue Jan 2, 2025 · 1 comment

Xi0131 commented Jan 2, 2025

Hi, in the final step of this benchmark, the following error occurs.

Finished downloading all the datasets!
[2025-01-02 11:19:46,917 preprocess_data.py:73 INFO] Creating GPT tokenizer...
[2025-01-02 11:19:46,917 preprocess_data.py:39 INFO] Initializing tokenizer from build/models/GPTJ-6B/checkpoint-final
Traceback (most recent call last):
  File "/home/cmuser/.local/lib/python3.8/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/home/cmuser/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/cmuser/.local/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'build/models/GPTJ-6B/checkpoint-final'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cmuser/CM/repos/local/cache/b1932adfb3014ecd/repo/closed/NVIDIA/code/gptj/tensorrt/preprocess_data.py", line 138, in <module>
    main()
  File "/home/cmuser/CM/repos/local/cache/b1932adfb3014ecd/repo/closed/NVIDIA/code/gptj/tensorrt/preprocess_data.py", line 132, in main
    preprocess_cnndailymail_gptj6b(data_dir, model_dir, preprocessed_data_dir)
  File "/home/cmuser/CM/repos/local/cache/b1932adfb3014ecd/repo/closed/NVIDIA/code/gptj/tensorrt/preprocess_data.py", line 74, in preprocess_cnndailymail_gptj6b
    tokenizer = prepare_tokenizer(ckpt_path, padding_side="right")
  File "/home/cmuser/CM/repos/local/cache/b1932adfb3014ecd/repo/closed/NVIDIA/code/gptj/tensorrt/preprocess_data.py", line 40, in prepare_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/cmuser/.local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/cmuser/.local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/cmuser/.local/lib/python3.8/site-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'build/models/GPTJ-6B/checkpoint-final'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
make: *** [/home/cmuser/CM/repos/local/cache/b1932adfb3014ecd/repo/closed/NVIDIA/Makefile.data:36: preprocess_data] Error 1

CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)
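
For context on what the traceback shows: transformers resolves the argument of AutoTokenizer.from_pretrained() as a local folder only when that directory actually exists relative to the current working directory; anything else is validated as a Hub repo id, and a relative path containing more than one '/' fails that validation. A minimal sketch of the check (the path and the padding_side argument are taken from the traceback above; the rest is illustrative):

import os
from transformers import AutoTokenizer

ckpt_path = "build/models/GPTJ-6B/checkpoint-final"  # same path as in the log

# from_pretrained() treats the argument as a local checkpoint only when
# the directory exists; otherwise it is validated as a Hub repo id,
# which may contain at most one '/', so this relative path raises
# HFValidationError whenever the folder is missing.
if os.path.isdir(ckpt_path):
    tokenizer = AutoTokenizer.from_pretrained(ckpt_path, padding_side="right")
else:
    print(f"'{ckpt_path}' not found from cwd '{os.getcwd()}'; "
          "transformers would treat it as a repo id and fail validation")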

Here is the command used for the benchmark:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=gptj-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet \
   --test_query_count=50

After the error, I stayed inside the Docker container.
How do I solve this issue? If I rerun the same command, will the whole compilation process start over, or can I resume from where it was interrupted after fixing the error?
Thank you.

@arjunsuresh
Contributor

Hi @Xi0131 Does the folder build/models/GPTJ-6B/checkpoint-final contain the GPT-J checkpoint inside the container?
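
A quick way to verify this from inside the container, run from the same working directory the preprocessing script uses (the file names below are what a typical Hugging Face GPT-J checkpoint contains; treat the list as an assumption, since the benchmark's converted checkpoint may differ):

import os

ckpt = "build/models/GPTJ-6B/checkpoint-final"
# Files usually present in a Hugging Face GPT-J checkpoint; this list is
# an assumption, not the benchmark's exact layout.
expected = ["config.json", "tokenizer.json", "tokenizer_config.json",
            "vocab.json", "merges.txt"]

print("directory exists:", os.path.isdir(ckpt))
for name in expected:
    path = os.path.join(ckpt, name)
    print(f"{name}: {'present' if os.path.exists(path) else 'MISSING'}")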
