Add WMT19 dataset configuration and inference code #1503
Motivation
This pull request aims to introduce support for the WMT19 dataset within the opencompass framework. The WMT19 dataset is widely used for machine translation tasks, and by adding it to opencompass, we extend the framework’s coverage for benchmarking large models on translation tasks.
Modification
In this PR, I have added:
- wmt19.py, which defines the dataset loading and pre-processing for the WMT19 dataset.
- wmt19_gen.py, which handles the evaluation of models on the WMT19 dataset, ensuring smooth integration with the opencompass benchmarking framework.
- __init__.py, updated to include the wmt19 configs.

No major changes have been made to the existing framework structure. The new code strictly extends the dataset and inference capabilities without modifying any core components.
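As a rough illustration of the kind of pre-processing described above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name build_records, the LANG_NAMES table, the prompt wording, and the input shape (one dict per sentence pair, keyed by language code) are all assumptions made for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical language-name table; the PR does not list the pairs it covers.
LANG_NAMES = {"de": "German", "en": "English", "zh": "Chinese"}


def build_records(
    pairs: Iterable[Dict[str, str]], src: str, tgt: str
) -> List[Dict[str, str]]:
    """Turn raw translation pairs into prompt/reference records.

    Each input item is assumed to map language codes to sentences,
    e.g. {"de": "...", "en": "..."}.
    """
    records = []
    for pair in pairs:
        source_text = pair[src].strip()
        target_text = pair[tgt].strip()
        if not source_text or not target_text:
            continue  # drop rows where either side is empty
        # Assumed prompt wording, for illustration only.
        prompt = (
            f"Translate the following {LANG_NAMES[src]} text "
            f"to {LANG_NAMES[tgt]}:\n{source_text}\n"
        )
        records.append({"prompt": prompt, "reference": target_text})
    return records


raw = [
    {"de": "Guten Morgen.", "en": "Good morning."},
    {"de": "  ", "en": "Empty source, dropped."},
]
records = build_records(raw, "de", "en")
```

In this sketch the model would be shown the prompt field, and its output would be scored against the reference field by whatever translation metric the evaluation config selects.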
BC-breaking (Optional)
This PR does not introduce any backward compatibility issues. The changes are self-contained within the new dataset and inference files and do not affect the existing functionality of the opencompass framework.
Use cases (Optional)
This PR adds the WMT19 dataset to the opencompass framework. It can be used to evaluate large models on machine translation tasks across the language pairs covered by the WMT19 benchmark.
Checklist
Before PR:
After PR: