vLLM Windows CUDA support [tested] #2158

Open: wants to merge 8 commits into main

Conversation

@fenglui fenglui commented Mar 23, 2025

vLLM Windows CUDA support

Uses the vllm-windows Windows wheels by SystemPanic: https://github.com/SystemPanic/vllm-windows

Install:

conda create -n vllm python=3.12
conda activate vllm
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install https://github.com/SystemPanic/vllm-windows/releases/download/v0.8.1/vllm-0.8.1+cu124-cp312-cp312-win_amd64.whl
pip install https://github.com/SystemPanic/flashinfer-windows/releases/download/v0.2.3/flashinfer_python-0.2.3+cu124torch2.6-cp312-cp312-win_amd64.whl
pip install --upgrade pillow
pip install --upgrade pandas
pip install --upgrade triton-windows
pip install grpcio==1.71.0
pip install "unsloth[windows] @ git+https://github.com/fenglui/unsloth.git"
pip install --no-deps git+https://github.com/huggingface/transformers.git
pip install trl==0.15.2
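
A quick way to confirm the wheels landed correctly (an optional sanity check, not part of the original steps):

import importlib.util
import torch
import vllm

# Verify that CUDA-enabled torch and the Windows vLLM wheel import,
# and that flashinfer is discoverable in this environment.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)
print("flashinfer importable:", importlib.util.find_spec("flashinfer") is not None)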

Training test

Download https://github.com/unslothai/notebooks/blob/main/nb/Qwen2.5_(3B)-GRPO.ipynb

Remove the installation code block.

Add this code block at the top:

import os
os.environ["UNSLOTH_DISABLE_AUTO_UPDATES"] = "1"     # skip Unsloth's auto-update check
os.environ["VLLM_USE_V1"] = "0"                      # use the vLLM V0 engine
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # use the FlashInfer attention backend

# Disable libuv on Windows by default
os.environ["USE_LIBUV"] = os.environ.get("USE_LIBUV", "0")

Then you can run the rest of the training code.
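
For orientation, the notebook's model setup reduces to something like the sketch below (values are illustrative and taken loosely from the standard Unsloth GRPO example; the key part for this PR is fast_inference=True, which routes Unsloth generation through vLLM):

from unsloth import FastLanguageModel

# Illustrative sketch only; see the linked Qwen2.5_(3B)-GRPO.ipynb for the exact
# parameters and the GRPO trainer setup that follows.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-3B-Instruct",
    max_seq_length = 1024,
    load_in_4bit = True,
    fast_inference = True,          # enable vLLM-backed fast inference
    gpu_memory_utilization = 0.6,   # illustrative value
)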

If vllm serve fails, add the environment variable USE_LIBUV with the value 0 to your Windows system environment variables.

And that's it: we can run Unsloth training with vLLM support on Windows.

fenglui added 2 commits March 23, 2025 10:04
change vllm installed check by transformers utils function
change vllm installed check by transformers utils function
@void-mckenzie
Contributor

Tested on Windows 11, working as intended.

@danielhanchen
Contributor

Oh ok! Great that vLLM works on Windows - it's maybe best we also add a print statement showing you can use https://github.com/SystemPanic/vllm-windows! Maybe we should add it to the README!

Comment on lines 342 to 345
from transformers.utils.import_utils import _is_package_available
_vllm_available = _is_package_available("vllm")
if _vllm_available == False:
    print("Unsloth: vLLM is not installed! Will use Unsloth inference!")

Contributor

Minor suggestion: can we have this check at the top of loader.py (like where we check transformers versions for model support)?
And maybe set it as a constant that we can later reuse in llama.py as well, instead of duplicating it?
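
Something along these lines (a sketch of the suggestion only; the constant name HAS_VLLM is hypothetical, not from this PR):

# Compute the flag once near the top of loader.py and reuse it (e.g. in llama.py)
# instead of duplicating the check.
from transformers.utils.import_utils import _is_package_available

HAS_VLLM = _is_package_available("vllm")  # hypothetical constant name

if not HAS_VLLM:
    print("Unsloth: vLLM is not installed! Will use Unsloth inference!")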

Author

Yes I think so.

Author

A new function is_vLLM_available has now been added to utils.

- from transformers.utils.import_utils import _is_package_available
- _vllm_available = _is_package_available("vllm")
- if _vllm_available == False:
+ if is_vLLM_available() == False:
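
A plausible shape for that helper in utils (a sketch; the actual implementation in the PR may differ):

from transformers.utils.import_utils import _is_package_available

def is_vLLM_available() -> bool:
    # True if the vllm package can be imported.
    return _is_package_available("vllm")
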
Contributor

NIT: The general Python way is to use if not is_vllm_available().
But this is fine as well...
