[Bug] AssertionError: compatibility of lora and cuda graph and radix attention is in progress #1921

LIUKAI0815 · 2024-11-05T06:54:15Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

Traceback (most recent call last):
  File "/root/miniconda3/envs/sglang/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/sglang/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/launch_server.py", line 16, in <module>
    raise e
  File "/root/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/root/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/server.py", line 436, in launch_server
    launch_engine(server_args=server_args)
  File "/root/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/server.py", line 349, in launch_engine
    server_args.check_server_args()
  File "/root/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/server_args.py", line 698, in check_server_args
    and (self.lora_paths is None or self.disable_radix_cache)
AssertionError: compatibility of lora and cuda graph and radix attention is in progress
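
Judging from the assertion message and the one line of the condition visible in the traceback, the check seems to reject `--lora-paths` unless both CUDA graph and the radix cache are disabled. Below is a minimal sketch of that reading. The function name `lora_args_ok` and the `disable_cuda_graph` clause are assumptions inferred from the message; only the `disable_radix_cache` clause is copied from the traceback, and this is not the actual sglang source.

```python
from typing import Optional, Sequence


def lora_args_ok(
    lora_paths: Optional[Sequence[str]],
    disable_cuda_graph: bool,
    disable_radix_cache: bool,
) -> bool:
    """Hypothetical stand-in for the condition asserted in check_server_args."""
    # First clause: assumed from the "cuda graph" part of the error message.
    # Second clause: copied from the line shown in the traceback.
    return (lora_paths is None or disable_cuda_graph) and (
        lora_paths is None or disable_radix_cache
    )


adapter = ["role=/path/to/adapter"]  # placeholder path, not the real checkpoint

# LoRA adapter given, both features left enabled (the reproduction in this issue).
print(lora_args_ok(adapter, disable_cuda_graph=False, disable_radix_cache=False))  # False

# LoRA adapter given, both features disabled.
print(lora_args_ok(adapter, disable_cuda_graph=True, disable_radix_cache=True))  # True

# No LoRA adapter: the condition holds regardless of the other two settings.
print(lora_args_ok(None, disable_cuda_graph=False, disable_radix_cache=False))  # True
```

If this reading is right, adding `--disable-cuda-graph` and `--disable-radix-cache` to the launch command should get past the assertion in 0.3.5, at the cost of turning off both optimizations.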

Reproduction

export CUDA_VISIBLE_DEVICES=2
export VLLM_USE_MODELSCOPE=False
python -m sglang.launch_server \
  --model-path ./Qwen2_5-14B-Instruct-AWQ \
  --port 2015 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --quantization awq \
  --attention-backend flashinfer \
  --lora-paths role=/workspace/output/role/qwen/qwen2_5-14b-instruct-awq/v1-20241101-133149/checkpoint-1550

Environment

sglang 0.3.5
