Performance Drop on MGSM After Finetuning - QWEN2.5 - 3B #122

@baselmousi

Hi,

I am trying to replicate the finetuning of the QWEN2.5-3B-Instruct model. When I evaluated my trained model on the MGSM dataset, it scored 14 points lower than the released model.

I used the same setup as the one shared in the sft.sh script.

Did you use a different hyperparameter setup to train the smaller released models? If so, I'd appreciate it if you could share it with me.

Thanks
