I am on a cluster where I want to use vLLM to serve a model. My issue is that I want to set a cache directory where the model weights get downloaded when hosting with `vllm.entrypoints.openai.api_server`, but I don't see any CLI argument that supports this.

For context, I want something similar to the `--huggingface-hub-cache` option of `text-generation-launcher` on HF's TGI.

I have seen mixed comments in vLLM's issues about vLLM not respecting the `HF_HOME` set in the environment. Any pointers?
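A sketch of one possible approach: export `HF_HOME` so `huggingface_hub` resolves its cache under a chosen directory, and pass vLLM's `--download-dir` flag (available in recent vLLM versions) to control where vLLM stores weights. The model name and paths below are placeholders, not anything from this thread:

```python
import os

# Hypothetical cache location; point this at your cluster's shared storage.
cache_dir = "/data/hf-cache"

# Environment for the server process: HF_HOME makes huggingface_hub
# place its cache under cache_dir instead of ~/.cache/huggingface.
env = dict(os.environ, HF_HOME=cache_dir)

# Command line for the OpenAI-compatible server; --download-dir tells
# vLLM where to store downloaded model weights.
cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "mistralai/Mistral-7B-Instruct-v0.2",
    "--download-dir", cache_dir,
]
print(" ".join(cmd))
```

Launching with something like `subprocess.Popen(cmd, env=env)` (or exporting `HF_HOME` in the shell before starting the server) should keep both the hub cache and the weight downloads off the default home directory, assuming your vLLM version supports `--download-dir`.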