Conversation
support llava-next(video)/video-llava
Hi Haosheng, sorry for the delay in our processing. We have been busy with work recently, and it is difficult to find time to merge it. We expect to finish before Feb 10th.
Hi Haosheng, Sequence Parallelism for LoRA is also important. Have you implemented it, or do you have plans to?
@mi-iro SP with LoRA is already supported for SFT and DPO.
* Update loader.py to pass in `preprocessing_num_workers`
Hi @hiyouga, are there any merge blockers on this PR? I'm running SFT on Qwen2.5 for a long-context task, and I think sequence parallelism would help accelerate it a lot.
@shiningliang This PR diverges from LLaMA-Factory's last release v0.9.1.

Hi @HaoshengZou, thanks for your reply. Do you have plans to support ORPO, KTO, etc.? In our work, we found scenarios where ORPO performs better than DPO and saves a lot of GPU memory.
@shiningliang In (360-)LLaMA-Factory, ORPO uses the same trainer as DPO, so ORPO should be supported directly; you only need to configure it.
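For readers wondering what "you only need to configure it" might look like: a hypothetical YAML sketch in LLaMA-Factory's config style, where ORPO is selected as the preference loss within the DPO stage (key names such as `pref_loss` may differ between versions, so check your release's docs):

```yaml
### method
stage: dpo          # ORPO reuses the DPO trainer
pref_loss: orpo     # switch the preference loss from DPO to ORPO
pref_beta: 0.1      # weight of the odds-ratio term

### train
finetuning_type: lora
```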
Is there an ETA for finishing this PR? Sequence parallelism is quite important for many long-context LLM training tasks.
@liuqianchao Sorry, we are struggling with the refactoring of trainers in LLaMA-Factory to support RL training. You can use 360-LLaMA-Factory for long-sequence training in the meantime.
+1
Is the HUAWEI Ascend 910B supported? Or is there another way?
Hello, we just used the BFG Repo-Cleaner to remove large files from this repo. Unfortunately, this operation accidentally invalidated all PRs. Could you please recreate the same PRs from the latest main branch at your convenience? Thank you so much for your understanding, and we sincerely apologize for any inconvenience this has caused. P.S. You can set https://github.com/hiyouga/LLaMA-Factory-backup as the upstream to recover the changes.
What does this PR do?
Add Sequence Parallelism (#4733 #5024 #5207 #5815 #5841 etc.)
Direct plug-and-play use is available at https://github.com/Qihoo360/360-LLaMA-Factory
We maintain a separate README and chat group at https://github.com/Qihoo360/360-LLaMA-Factory for the Sequence Parallelism part only; these are not to be merged.
We developed this based on LLaMA-Factory's latest release v0.9.1, building on https://github.com/zhuzilin/ring-flash-attention. The original repos are fully acknowledged.
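As background on the ring-flash-attention approach, here is a minimal sketch (not this PR's actual code) of the "zigzag" split that ring-attention implementations commonly use: the sequence is cut into `2 * world_size` chunks, and rank `i` keeps chunks `i` and `2 * world_size - 1 - i`, so every rank gets an equal share of the cheap (early) and expensive (late) positions under a causal mask.

```python
def zigzag_split(input_ids: list, rank: int, world_size: int) -> list:
    """Return the token slice held by `rank` under a zigzag sequence split."""
    assert len(input_ids) % (2 * world_size) == 0, "pad the sequence first"
    size = len(input_ids) // (2 * world_size)
    # Cut the sequence into 2 * world_size equal chunks.
    chunks = [input_ids[i * size:(i + 1) * size] for i in range(2 * world_size)]
    # Pair an early chunk with its mirror-image late chunk for load balance.
    return chunks[rank] + chunks[2 * world_size - 1 - rank]

# Example: 8 tokens on 2 sequence-parallel ranks.
tokens = list(range(8))
print(zigzag_split(tokens, 0, 2))  # [0, 1, 6, 7]
print(zigzag_split(tokens, 1, 2))  # [2, 3, 4, 5]
```

Each rank then runs attention locally while passing key/value blocks around the ring, which is what keeps activation memory per GPU flat as the total sequence length grows.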
We developed this at 360. I hold a PhD from Prof. Jun Zhu's group at Tsinghua CS.
Feel free to review and comment on changes as you see fit. We'll keep improving it.
Thank you!
Before submitting