[Do Not Merge] - LoRA V1 Reference PR #11613

varun-sundar-rabindranath · 2024-12-30T04:51:33Z

LoRA works end-to-end with this PR. However, it introduces a lot of changes (some of them unnecessary) that need to be split up into smaller PRs. I am putting up this PR as a reference for the sub-PRs.

Benchmarks:
Machine - 1xA100

Command:

VLLM_USE_V1="1" python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1000 --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test" -O 3

Throughput: 9.27 requests/s, 4462.35 total tokens/s, 2181.49 output tokens/s

python3 benchmarks/benchmark_throughput.py --model meta-llama/Llama-2-7b-hf --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 1000 --max-loras 4 --max-lora-rank 8 --enable-lora --lora-path "yard1/llama-2-7b-sql-lora-test" --num-scheduler-steps 8

Throughput: 9.60 requests/s, 4617.92 total tokens/s, 2257.54 output tokens/s
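
For a quick comparison of the two runs above, here is a small sketch that computes the relative gap between the run with VLLM_USE_V1="1" and the run with --num-scheduler-steps 8. The numbers are copied from the reported throughput lines; the dictionary keys and the relative_gap helper are illustrative names, not part of the benchmark script.

```python
# Throughput figures copied from the two benchmark runs reported above.
v1_run = {
    "requests_per_s": 9.27,
    "total_tokens_per_s": 4462.35,
    "output_tokens_per_s": 2181.49,
}
multi_step_run = {
    "requests_per_s": 9.60,
    "total_tokens_per_s": 4617.92,
    "output_tokens_per_s": 2257.54,
}


def relative_gap(candidate, baseline):
    """Percent difference of candidate relative to baseline, per metric.

    Negative values mean the candidate is slower than the baseline.
    """
    return {
        metric: 100.0 * (candidate[metric] - baseline[metric]) / baseline[metric]
        for metric in baseline
    }


gaps = relative_gap(v1_run, multi_step_run)
for metric, pct in gaps.items():
    print(f"{metric}: {pct:+.2f}%")
```

On these figures, the VLLM_USE_V1="1" run comes out roughly 3.4% below the --num-scheduler-steps 8 run on all three metrics.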

Plan for PR split:

Base PR (changes to gpu_model_runner.py, v1 scheduler, and prefix caching) - [V1] LoRA Support #10957
Changes to support torch.compile for LoRA
Add LoRA kernels for V1
Add and test the add/remove/pin LoRA functions for runtime loading/unloading of LoRAs.
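
As background for the prefix-caching item in the base PR: cached blocks computed under one adapter presumably cannot be reused for a request that runs under a different adapter, so the cache key has to account for the LoRA ID. The following is a minimal sketch of that idea only, not the code in this PR. The block_hash function, its parameters, and the convention that lora_id 0 means "no adapter" are all hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def block_hash(
    parent_hash: Optional[str],
    token_ids: Sequence[int],
    lora_id: int = 0,  # assumption: 0 stands for "no adapter"
) -> str:
    """Hypothetical cache key for one block of tokens.

    The key chains the parent block's hash with this block's tokens and
    the adapter ID, so identical prompts under different adapters map to
    different cache entries.
    """
    payload = repr((parent_hash, tuple(token_ids), lora_id)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


tokens = [101, 2023, 2003, 1037]
base_key = block_hash(None, tokens)
adapter_1_key = block_hash(None, tokens, lora_id=1)
adapter_2_key = block_hash(None, tokens, lora_id=2)

# Same tokens, different adapters: three distinct keys.
print(len({base_key, adapter_1_key, adapter_2_key}))
# Same tokens, same adapter: the key is reproducible, so the block can be reused.
print(adapter_1_key == block_hash(None, tokens, lora_id=1))
```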

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

github-actions · 2024-12-30T04:51:43Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mergify · 2024-12-30T04:52:11Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @varun-sundar-rabindranath.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Varun Sundar Rabindranath added 12 commits December 29, 2024 15:29

Add lora support

78737aa

Signed-off-by: Varun Sundar Rabindranath <[email protected]>

lora id for prefix caching

e5b4087

remove comment

4a5b550

limit cudagraph capture size to max_num_seqs

80ff344

remove torch compile comment

dee4001

format

dc11242

lora_expand opt changes

83339bd

lora_shrink opt changes

bfc51e6

v1_gpu changes to pass in lora ids

9b643c6

fix tests

ac6e926

format

97f2134

fix fake functions

8ff67c5

varun-sundar-rabindranath requested review from WoosukKwon, robertgshaw2-neuralmagic, njhill, ywang96, comaniac and alexm-neuralmagic as code owners December 30, 2024 04:51

varun-sundar-rabindranath marked this pull request as draft December 30, 2024 04:51

mergify bot added the needs-rebase label Dec 30, 2024

varun-sundar-rabindranath mentioned this pull request Dec 31, 2024

[V1] LoRA Support #10957

Open
