The model files saved after parallel-sft training are broken: the configuration file is missing
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory
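To see which of these files actually ended up in the output directory, a small check like the one below may help. It is a minimal sketch that only uses the standard library. The weight-file names come from the OSError message above, and `config.json` stands in for the configuration file the issue title reports as missing. `model.safetensors` is added as an assumption, because newer transformers versions also load it. The `check_output_dir` helper is hypothetical and not part of this project.

```python
from pathlib import Path

# Weight file names listed in the OSError message; model.safetensors is an
# added assumption for newer transformers versions.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
    "model.safetensors",
)
# Configuration file that from_pretrained also expects (assumed name).
CONFIG_FILE = "config.json"


def check_output_dir(output_dir: str) -> dict:
    """Report whether a saved model directory has weights and a config file."""
    root = Path(output_dir)
    present = {name: (root / name).is_file() for name in WEIGHT_FILES + (CONFIG_FILE,)}
    return {
        "has_weights": any(present[name] for name in WEIGHT_FILES),
        "has_config": present[CONFIG_FILE],
        # Everything actually in the directory, useful for spotting checkpoint subfolders.
        "files": sorted(p.name for p in root.iterdir()) if root.is_dir() else [],
    }
```

For example, `print(check_output_dir("path/to/output_dir"))` (placeholder path). If it reports `has_weights: False`, the save step wrote something other than a loadable model.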
Could you please share your environment config and training command? I re-ran the script and do not have this issue.
Using DeepSpeed ZeRO-3 together with CPU offload breaks the final model save. After I switched the CPO stage to ZeRO-2, the model saved normally.
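For reference, here is a minimal sketch of the two setups as DeepSpeed config dicts. It assumes the Hugging Face Trainer's DeepSpeed integration, which fills in the `"auto"` values. ZeRO-2 with optimizer offload to CPU corresponds to the workaround described above. For ZeRO-3, the sketch sets `stage3_gather_16bit_weights_on_model_save`. The assumption is that, without this setting, ZeRO-3 leaves the weights partitioned at save time and no consolidated weight file is written. `make_zero_config` and `save_config` are hypothetical helpers, not part of this repository, so check the key names against your DeepSpeed version.

```python
import json


def make_zero_config(stage: int, offload: bool = True) -> dict:
    """Build a minimal DeepSpeed ZeRO config (hypothetical helper)."""
    zero = {"stage": stage}
    if offload:
        # Optimizer-state offload to CPU works with both ZeRO-2 and ZeRO-3.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    if stage == 3:
        # Assumption: gather the partitioned 16-bit weights when the model is
        # saved, so the output directory gets a loadable weight file.
        zero["stage3_gather_16bit_weights_on_model_save"] = True
    return {
        # "auto" values are resolved by the transformers DeepSpeed integration.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "zero_optimization": zero,
    }


def save_config(config: dict, path: str) -> None:
    """Write the config as JSON so it can be passed to the training script."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

For example, `save_config(make_zero_config(2), "ds_zero2.json")` writes a ZeRO-2 config (the file name is a placeholder). ZeRO-2 still shards optimizer states but keeps a full copy of the parameters on each GPU, so saving does not need a gather step. It costs more GPU memory than ZeRO-3.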