Implement new translation tasks for google WMT24++ datasets #3480
I ran it, but I'm seeing some warnings:
@djstrong I don't think it's an issue, but I fixed it by adding an explicit aggregation metric to the config file. With this change, TER is correctly reported in the output table as ↓ (lower is better).
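A minimal sketch of the fix, written as a Python mirror of the task's `metric_list` (the actual task config is YAML). Assumptions, not taken from the PR: BLEU and chrF are listed alongside TER, and `direction_arrow` and `missing_keys` are hypothetical helpers. They only illustrate how explicit `aggregation` and `higher_is_better` keys determine the ↑/↓ shown in the results table.

```python
# Sketch only: a Python mirror of a task's `metric_list`.
# The equivalent YAML entry for TER would be:
#   - metric: ter
#     aggregation: ter
#     higher_is_better: false
# Assumption: BLEU and chrF are listed alongside TER; the PR only mentions TER.
METRIC_LIST = [
    {"metric": "bleu", "aggregation": "bleu", "higher_is_better": True},
    {"metric": "ter", "aggregation": "ter", "higher_is_better": False},
    {"metric": "chrf", "aggregation": "chrf", "higher_is_better": True},
]


def direction_arrow(entry: dict) -> str:
    """Hypothetical helper: the arrow a results table shows for this metric."""
    return "↑" if entry["higher_is_better"] else "↓"


def missing_keys(entry: dict) -> list:
    """Hypothetical helper: keys left implicit, i.e. filled in by defaults."""
    return [key for key in ("aggregation", "higher_is_better") if key not in entry]


for entry in METRIC_LIST:
    # Prints e.g. "ter ↓ []" when both keys are declared explicitly.
    print(entry["metric"], direction_arrow(entry), missing_keys(entry))
```

Declaring `higher_is_better: false` for TER is what marks it as lower-is-better. Spelling out `aggregation` as well means nothing about that metric is left to defaults.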
@baberabb May I draw your attention to this PR? I added a new set of tasks for Google's WMT24++ translation datasets. Is anything missing, or can it be merged? Thank you.
Maybe we should also run some tests without chat templates, so that it works well with base models too?
OK, I will need to adjust the prompt a bit so the tasks work well even without a chat template. @baberabb I will get back to you after verifying with a few models. Thanks.
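One way to make a translation prompt work without a chat template is to phrase it as plain text whose natural continuation is the translation itself. The sketch below is hypothetical and is not the PR's prompt. It assumes each WMT24++ record exposes a `source` field, that language names come from the task config, and that segments are single-line. `build_prompt` and `extract_translation` are illustrative names.

```python
# Hypothetical completion-style prompt for base models (no chat template).
# Assumptions: records have a `source` field, language names are supplied
# by the task config, and segments fit on a single line.


def build_prompt(source: str, src_lang: str, tgt_lang: str) -> str:
    """Plain-text prompt that ends right where the translation should begin."""
    return (
        f"Translate the following text from {src_lang} to {tgt_lang}.\n"
        f"{src_lang}: {source}\n"
        f"{tgt_lang}:"
    )


def extract_translation(generation: str) -> str:
    """Keep only the first line; base models often continue past the answer."""
    return generation.strip().split("\n", 1)[0].strip()


record = {"source": "Good morning!"}
prompt = build_prompt(record["source"], "English", "German")
print(prompt)
print(extract_translation(" Guten Morgen!\nEnglish: next example"))
```

Ending the prompt with the target-language label lets a base model continue naturally. Cutting at the first newline plays the same role as a stop sequence. Multi-line segments would need a different delimiter.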
Dataset reference: https://huggingface.co/datasets/google/wmt24pp
Paper: https://arxiv.org/abs/2502.12404