I was trying to replicate the fine-tuning of the Qwen2.5-3B-Instruct model. After training, I evaluated on the MGSM dataset, but I found a performance gap of 14 points between the released model and the model that I trained.
I used the same setup as the one shared in the sft.sh script.
I was wondering whether you used a different hyperparameter setup to train the smaller released models. If so, I'd appreciate it if you could share it with me.