vLLM Windows CUDA support [tested] #2158
base: main
Conversation
Change the vLLM installed check to use the transformers utils function
Tested on Windows 11, working as intended.
Oh ok! Great that vLLM works on Windows - it's maybe best we also add a print statement showing you can use https://github.com/SystemPanic/vllm-windows! Maybe we should add it in the readme!
unsloth/models/loader.py
Outdated
from transformers.utils.import_utils import _is_package_available
_vllm_available = _is_package_available("vllm")
if _vllm_available == False:
    print("Unsloth: vLLM is not installed! Will use Unsloth inference!")
Minor suggestion: can we have this check at the top of loader.py (like where we check transformers versions for model support)?
And maybe set it as a constant that we can later reuse in llama.py as well, instead of duplicating it?
Yes I think so.
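For illustration, a minimal sketch of the suggested layout (module paths taken from the thread; the constant name VLLM_AVAILABLE is my own, not from the PR):

# Sketch only - not the actual PR code. At the top of unsloth/models/loader.py,
# next to the existing transformers version checks:
from transformers.utils.import_utils import _is_package_available

# Module-level constant so llama.py could import it instead of repeating the check.
VLLM_AVAILABLE = _is_package_available("vllm")

if not VLLM_AVAILABLE:
    print("Unsloth: vLLM is not installed! Will use Unsloth inference!")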
Now a new function is_vLLM_available has been added to utils.
- from transformers.utils.import_utils import _is_package_available
- _vllm_available = _is_package_available("vllm")
- if _vllm_available == False:
+ if is_vLLM_available() == False:
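The definition of is_vLLM_available is not shown in this diff; assuming it simply wraps the transformers helper used before, it would look roughly like this (a sketch, not the actual utils code):

# Assumed implementation of the new helper in unsloth utils (not shown in the PR):
from transformers.utils.import_utils import _is_package_available

def is_vLLM_available() -> bool:
    # True if the vllm package can be imported in the current environment.
    return _is_package_available("vllm")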
NIT: The general Python way is to use if not is_vLLM_available().
But this is fine as well...
vLLM Windows CUDA support
Uses the vllm-windows Windows wheels by SystemPanic (https://github.com/SystemPanic/vllm-windows).
Install
conda create -n vllm python=3.12
conda activate vllm
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install https://github.com/SystemPanic/vllm-windows/releases/download/v0.8.1/vllm-0.8.1+cu124-cp312-cp312-win_amd64.whl
pip install https://github.com/SystemPanic/flashinfer-windows/releases/download/v0.2.3/flashinfer_python-0.2.3+cu124torch2.6-cp312-cp312-win_amd64.whl
pip install --upgrade pillow
pip install --upgrade pandas
pip install --upgrade triton-windows
pip install grpcio==1.71.0
pip install "unsloth[windows] @ git+https://github.com/fenglui/unsloth.git"
pip install --no-deps git+https://github.com/huggingface/transformers.git
pip install trl==0.15.2
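Before moving on to training, a quick check like the following (my own addition, not part of the original instructions) can confirm the wheels installed correctly and CUDA is visible:

# Quick post-install sanity check (illustrative; not from the PR).
import torch
import vllm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)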
Training test
Download https://github.com/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb
Remove the installation code block.
Add a code block at the top (a sketch of what this might contain follows these steps).
Then you can run the rest of the training code.
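The PR does not spell out the contents of that top code block; a plausible sketch (my assumption) is to apply the USE_LIBUV workaround mentioned below before anything imports vLLM, then continue with the notebook's own imports:

# Hypothetical top-of-notebook cell - the PR does not show the actual block.
import os

# Workaround for the Windows libuv issue mentioned below; the same effect can be
# achieved by setting USE_LIBUV=0 as a Windows system environment variable.
os.environ["USE_LIBUV"] = "0"

from unsloth import FastLanguageModel  # the notebook's remaining imports follow as usual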
If the vLLM serve fails, add the environment variable "USE_LIBUV" with the value "0" to your Windows system variables.
And that's it - we can run Unsloth training with vLLM support on Windows.