
add Sequence Parallelism#6506

Closed
HaoshengZou wants to merge 2437 commits into hiyouga:main from Qihoo360:sp-pr

Conversation

@HaoshengZou

@HaoshengZou HaoshengZou commented Jan 2, 2025

What does this PR do?

add Sequence Parallelism (#4733 #5024 #5207 #5815 #5841 etc.)
direct plug&play use at https://github.com/Qihoo360/360-LLaMA-Factory

We have a separate README and chat group at https://github.com/Qihoo360/360-LLaMA-Factory, covering only the Sequence Parallelism part; they are not to be merged.
We developed this based on LLaMA-Factory's latest release v0.9.1, and also built on https://github.com/zhuzilin/ring-flash-attention. The original repos are fully acknowledged.
We developed this at 360. I am a PhD from Prof. Jun Zhu's group at Tsinghua CS.
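For readers unfamiliar with ring-flash-attention's approach, the core idea of sequence parallelism is to shard each long sequence across GPUs. A minimal sketch of the "zigzag" load-balancing split used in that line of work is below; the function name and shapes are illustrative, not the library's actual API:

```python
# Sketch of a zigzag sequence-parallel split: each rank gets one "early"
# and one "late" chunk, so causal-attention work is balanced across ranks.
# Illustrative only -- not the ring-flash-attention API.

def zigzag_split(seq_len: int, sp_size: int) -> list[list[int]]:
    """Assign token indices [0, seq_len) to sp_size sequence-parallel ranks.

    The sequence is cut into 2 * sp_size equal chunks; rank i receives
    chunk i and chunk (2 * sp_size - 1 - i).
    """
    assert seq_len % (2 * sp_size) == 0, "seq_len must divide evenly"
    chunk = seq_len // (2 * sp_size)
    tokens = list(range(seq_len))
    chunks = [tokens[j * chunk:(j + 1) * chunk] for j in range(2 * sp_size)]
    return [chunks[i] + chunks[2 * sp_size - 1 - i] for i in range(sp_size)]

if __name__ == "__main__":
    for rank, toks in enumerate(zigzag_split(16, 2)):
        print(rank, toks)
```

With causal attention, early tokens attend to few keys and late tokens to many, so pairing an early chunk with a late chunk on each rank evens out the compute.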

Feel free to review and comment on changes as you see fit. We'll make it better.
Thank you!

Before submitting

@hiyouga
Owner

hiyouga commented Jan 17, 2025

Hi Haosheng, sorry for the delay in processing this. We've been busy with work recently, so merging it has been difficult. We expect to finish before Feb 10th.

@mi-iro

mi-iro commented Jan 23, 2025

Hi Haosheng, Sequence Parallelism for LoRA is also important. Have you implemented this, or do you have plans to?

@HaoshengZou
Author

@mi-iro SP with LoRA is already supported for SFT and DPO.
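As a sketch, an SP + LoRA SFT run in 360-LLaMA-Factory might be configured like this. The `sequence_parallel_size` key is taken from the 360-LLaMA-Factory README; the other keys follow standard LLaMA-Factory YAML, and the model/dataset names are placeholders, so verify everything against the repo's examples:

```yaml
### model (placeholder)
model_name_or_path: Qwen/Qwen2.5-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
sequence_parallel_size: 4   # assumed 360-LLaMA-Factory option: shard each sequence across 4 GPUs

### dataset (placeholder)
dataset: my_long_context_data
template: qwen
cutoff_len: 32768

### output
output_dir: saves/qwen2.5-7b-sp-lora
```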

* Update loader.py to pass in `preprocessing_num_workers`
@shiningliang

Hi @hiyouga Are there any merge blockers on this PR? I'm running SFT on Qwen2.5 for a long-context task, and I think sequence parallelism would help a lot in accelerating it.
If I use this PR directly before it's merged, will some newer models fail, since I notice this PR is behind some of the PRs adding new model support?

@HaoshengZou
Author

@shiningliang This PR diverges from LLaMA-Factory's last release v0.9.1.
For now, the known errors with SP involve multi-modal data and models; pure-text models should work well.

@shiningliang

@shiningliang This PR diverges from LLaMA-Factory's last release v0.9.1 For now, known errors with SP are with multi-modal data & models. Pure text models should work well.

Hi @HaoshengZou Thanks for your reply. Do you have plans to support ORPO, KTO, etc.? In our work, we found scenarios where ORPO performs better than DPO and saves a lot of GPU memory.

@HaoshengZou
Author

@shiningliang In (360-)LLaMA-Factory, ORPO uses the same trainer as DPO, so ORPO should be supported directly; you only need to configure it.
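As a sketch of what "only need to configure it" means: in LLaMA-Factory's YAML, ORPO is selected through the DPO stage's preference-loss option. Key names here are assumed from LLaMA-Factory's example configs, so verify them against the repo:

```yaml
### ORPO reuses the DPO/pairwise trainer; only the loss selection changes
stage: dpo
pref_loss: orpo        # assumed key: switches the pairwise loss to ORPO
pref_beta: 0.1         # assumed key: the loss weighting coefficient
finetuning_type: lora
```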

@hiyouga hiyouga mentioned this pull request Feb 12, 2025
@liuqianchao

liuqianchao commented Feb 22, 2025

Do we have an ETA for finishing this PR? Sequence parallelism is quite important for many long-context LLM training tasks.

@hiyouga
Owner

hiyouga commented Feb 22, 2025

@liuqianchao Sorry, we are struggling with refactoring the trainers in LLaMA-Factory to support RL training. You can use 360-LLaMA-Factory for long-sequence training for now.

@githisw

githisw commented Mar 5, 2025

+1

@Kaimar666

Is the HUAWEI Ascend 910B supported? Or is there another way?
I want to fine-tune the Qwen2.5-14B-Instruct model in SP mode and set up the environment following the 360-LLaMA-Factory guide, but `pip install flash-attn` reports an error.

@hiyouga
Owner

hiyouga commented Mar 11, 2025

Hello, we just used BFG repo cleaner to remove large files in this repo. Unfortunately, this operation accidentally made all PRs invalid. Could you please recreate the same PRs using the latest main branch at your convenience? Thank you so much for your understanding, and we sincerely apologize for any inconvenience this has brought to you.

P.S. You can set https://github.com/hiyouga/LLaMA-Factory-backup as the upstream to find the changes back.

@Eisenhower

@shiningliang This PR diverges from LLaMA-Factory's last release v0.9.1 For now, known errors with SP are with multi-modal data & models. Pure text models should work well.
Hi @shiningliang, thanks for the update! I see that SP is stable for pure-text models but still has known issues with multi-modal data and models. Could you let me know whether VLM training is supported now? Thanks!


Labels

invalid (This doesn't seem right)
