
Why is there this difference between the provided pretrained models? #135

Open
GELIELEO opened this issue May 12, 2024 · 1 comment

Comments

@GELIELEO

GELIELEO commented May 12, 2024

Hello, I looked at the pretrained models you provide and noticed that each method has two trained models. One appears to have been trained on a 2080 Ti and the other on a V100, but the former converges well after only 200 epochs, while the latter needs 2000 epochs to reach the same level of convergence. Why is that?

@oduinihao

Hello, I'm also using this project. I believe the reason is that it uses a learning-rate scheduler. The OneCycle scheduler, for example, adjusts the learning rate according to the relative progress through the total epoch budget: the overall trend goes from small to large and back to small. With a maximum learning rate of 1e-3 and a 200-epoch budget, the rate approaches its 1e-3 peak around epoch 100. With a 2000-epoch budget, the rate is still very small for the first 200 epochs and only reaches roughly 1e-3 around epoch 1000. The 2000-epoch setting is presumably meant to train more thoroughly, not to compare training efficiency. Happy to discuss anytime!
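A minimal sketch of the scheduling effect described above (not this project's actual scheduler configuration; `peak_frac` and `div_factor` are illustrative assumptions, and PyTorch's `OneCycleLR` defaults to a 30% warm-up phase rather than 50%):

```python
import math

def onecycle_lr(epoch, total_epochs, max_lr=1e-3, peak_frac=0.5, div_factor=25.0):
    """Simplified OneCycle schedule: cosine ramp from max_lr/div_factor up to
    max_lr at peak_frac * total_epochs, then cosine anneal back down."""
    initial_lr = max_lr / div_factor
    progress = epoch / total_epochs
    if progress <= peak_frac:
        # warm-up phase: rise toward max_lr
        t = progress / peak_frac
        return initial_lr + (max_lr - initial_lr) * (1 - math.cos(math.pi * t)) / 2
    # annealing phase: decay back toward initial_lr
    t = (progress - peak_frac) / (1 - peak_frac)
    return max_lr - (max_lr - initial_lr) * (1 - math.cos(math.pi * t)) / 2

# At epoch 100 of a 200-epoch budget the LR is at its 1e-3 peak,
# while at epoch 100 of a 2000-epoch budget it is still far below it.
print(onecycle_lr(100, 200))
print(onecycle_lr(100, 2000))
```

Because the schedule is defined in terms of *relative* progress, the same epoch index sees a very different learning rate under the two budgets, which matches the convergence gap observed between the two sets of checkpoints.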
