This issue was moved to a discussion.
Minimum GPU requirements for full-parameter fine-tuning of Qwen2-72B #4141
Comments
Is the batch size 1? Try cutting cutoff_len in half (or more) and see whether it runs.
Yes, per_device_train_batch_size=1, but I specifically need packing up to 32768, the maximum length Qwen supports. Is it just that I don't have enough GPUs (a budget problem?), or is there a configuration that uses less GPU memory? This is my ZeRO-3 + offload config:
Fine-tuning with sequences this long is probably not well supported at the moment; we will add methods for it later. We suggest training at 8k length for now.
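The "train at 8k first" suggestion maps to the `cutoff_len` option in a LLaMA-Factory training config. A minimal sketch, assuming a YAML config of the kind passed to `llamafactory-cli train` (surrounding option values are illustrative, not taken from this thread):

```yaml
### illustrative fragment of a LLaMA-Factory full-parameter SFT config
model_name_or_path: Qwen/Qwen2-72B
stage: sft
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_offload_config.json  # ZeRO-3 + CPU offload
cutoff_len: 8192            # reduced from 32768 per the advice above
packing: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
bf16: true
```

Halving (or quartering) `cutoff_len` shrinks activation memory roughly in proportion to sequence length, which is usually the fastest way to check whether length is the cause of the OOM.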
@hiyouga Could this repo perhaps be used as a reference to improve long-sequence fine-tuning later? https://github.com/jzhang38/EasyContext
When I run full-parameter SFT on qwen2-72b with only 1,000 samples, it trains fine; but with the 2M + 1.1M datasets, the same configuration runs out of GPU memory. Have you run into this problem?
@silvercherry Most likely some very long training samples slipped into the 2M dataset. Compare the token-length statistics of the 1,000-sample set and the 2M set, and filter out the long samples.
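The filtering step above can be sketched as follows. The whitespace tokenizer is a toy stand-in so the example is self-contained; in practice you would use something like `transformers.AutoTokenizer.from_pretrained("Qwen/Qwen2-72B")` (an assumption, not shown in this thread):

```python
# Sketch: split a dataset into kept/dropped by token count, since a handful of
# very long sequences in a 2M-sample dataset can blow up memory even under ZeRO-3.

class WhitespaceTokenizer:
    """Toy tokenizer stand-in; replace with a real HF tokenizer in practice."""
    def encode(self, text):
        return text.split()

def filter_long_samples(samples, tokenizer, max_len):
    """Partition samples into (kept, dropped) by token count of their 'text' field."""
    kept, dropped = [], []
    for sample in samples:
        n_tokens = len(tokenizer.encode(sample["text"]))
        (kept if n_tokens <= max_len else dropped).append(sample)
    return kept, dropped

# Usage: the 40000-token sample exceeds the 32768 limit and gets dropped.
tokenizer = WhitespaceTokenizer()
data = [{"text": "short example"}, {"text": " ".join(["tok"] * 40000)}]
kept, dropped = filter_long_samples(data, tokenizer, max_len=32768)
print(len(kept), len(dropped))  # → 1 1
```

Running this separately on the 1,000-sample set and the 2M set also gives the length comparison suggested above.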
Reminder
System Info
Reproduction
As the title asks:
For full-parameter fine-tuning of qwen2-72B with a training max length of 32768 on 128 A800 GPUs (80 GB each), using ZeRO-3 + offload, is this configuration inherently going to run out of GPU memory, or is there a way to make it run?
Thanks!
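For reference, a representative DeepSpeed ZeRO-3 + CPU-offload configuration of the kind described above might look like the following. This is an illustrative sketch, not the poster's actual file; all values are assumptions:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

Note that ZeRO-3 with offload shards parameters, gradients, and optimizer states across GPUs and CPU memory, but activation memory still grows with sequence length, which is typically why 32k-length training can OOM even under this setup.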
Expected behavior
No response
Others
No response