-
Notifications
You must be signed in to change notification settings - Fork 17
Open
Description
Deploying the inference-benchmarker on an OpenShift 4.18 cluster fails with OSError: PermissionError.
I am using the helm chart with the following extra/k8s/inference-benchmarker/values.yaml
:
...
model_id: "mistralai/Mistral-7B-Instruct-v0.3"
server: vllm
....
vllm:
enabled: true
$ helm install inference-benchmarker ./extra/k8s/inference-benchmarker
NAME: inference-benchmarker
LAST DEPLOYED: Mon Jul 7 09:12:21 2025
NAMESPACE: blue
STATUS: deployed
REVISION: 1
TEST SUITE: None
I enabled my HF token to access the gated model 'mistralai/Mistral-7B-Instruct-v0.3'.
Pod: inference-benchmarker-5dcf598989-k2bhw - CrashLoopBackOff
INFO 07-07 07:35:34 [__init__.py:244] Automatically detected platform cuda.
INFO 07-07 07:35:37 [api_server.py:1287] vLLM API server version 0.9.1
INFO 07-07 07:35:37 [cli_args.py:309] non-default args: {'port': 8080, 'model': 'mistralai/Mistral-7B-Instruct-v0.3'}
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 470, in cached_files
hf_hub_download(
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1008, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/huggingface_hub/file_download.py", line 1125, in _hf_hub_download_to_cache_dir
os.makedirs(os.path.dirname(blob_path), exist_ok=True)
File "<frozen os>", line 215, in makedirs
File "<frozen os>", line 215, in makedirs
File "<frozen os>", line 215, in makedirs
[Previous line repeated 1 more time]
File "<frozen os>", line 225, in makedirs
PermissionError: [Errno 13] Permission denied: '/.cache'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1387, in <module>
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1323, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1343, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 155, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 177, in build_async_engine_client_from_engine_args
vllm_config = engine_args.create_engine_config(usage_context=usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1018, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 910, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 120, in __init__
s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 528, in __post_init__
hf_config = get_config(self.hf_config_path or self.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py", line 321, in get_config
config_dict, _ = PretrainedConfig.get_config_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 595, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 654, in _get_config_dict
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 312, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/utils/hub.py", line 515, in cached_files
raise OSError(
OSError: PermissionError at /.cache when downloading mistralai/Mistral-7B-Instruct-v0.3. Check cache directory permissions. Common causes: 1) another user is downloading the same model (please wait); 2) a previous download was canceled and the lock file needs manual removal.
Metadata
Metadata
Assignees
Labels
No labels