[Feature] Support MMLU-CF Benchmark #1775
base: main
@liushz Hi, can MMLU-CF be merged into the main branch? Are any further modifications required?
Please fix the lint issues.
README.md
@@ -57,6 +57,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.27\]** We now support Microsoft's Contamination-Free Multi-task Language Understanding benchmark [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF). Feel free to give it a try! 🔥🔥🔥
We'd rather not add this information to the README yet; we will consider adding it in the next version of OpenCompass.
README_zh-CN.md
@@ -57,6 +57,7 @@

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.27\]** Now supports Microsoft's contamination-free multi-task language understanding dataset [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF). Feel free to give it a try! 🔥🔥🔥
Same as the comment above.
ok
class MMLUCFDataset(BaseDataset):

    @staticmethod
    def load(path: str, name: str):
You can make Hugging Face the default loading method and remove the corresponding entry from `datasets_info.py`.
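One way to picture the suggested change: `load` pulls raw rows from the Hugging Face Hub by default and turns each one into a prompt/answer record. The sketch below covers only that conversion step. It assumes hypothetical column names `Question`, `A`, `B`, `C`, `D` and `Answer` (check them against the dataset card), and `build_records` is a hypothetical helper, not part of the OpenCompass API. The rows are passed in directly so the example runs offline; in practice they could come from `datasets.load_dataset`.

```python
from typing import Iterable, Mapping

# Assumed option columns for a four-choice question (hypothetical; verify on the dataset card).
CHOICE_KEYS = ("A", "B", "C", "D")


def build_records(rows: Iterable[Mapping[str, str]]) -> list:
    """Convert raw multiple-choice rows into prompt/answer records.

    `rows` is any iterable of dicts, e.g. a split fetched from the Hub.
    """
    records = []
    for row in rows:
        # Render the options as "A. ...", one per line, under the question.
        options = "\n".join(f"{key}. {row[key]}" for key in CHOICE_KEYS)
        prompt = f"{row['Question']}\n{options}\nAnswer:"
        # Normalize the gold label and reject anything outside A-D.
        answer = row["Answer"].strip().upper()
        if answer not in CHOICE_KEYS:
            raise ValueError(f"unexpected answer label: {row['Answer']!r}")
        records.append({"prompt": prompt, "answer": answer})
    return records


if __name__ == "__main__":
    sample = [{"Question": "2 + 2 = ?", "A": "3", "B": "4", "C": "5", "D": "22", "Answer": "B"}]
    for record in build_records(sample):
        print(record["prompt"])
        print("gold:", record["answer"])
```

Keeping the conversion separate from the download makes it easy to test without network access, while `load` itself only has to decide where the rows come from.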
@@ -181,6 +181,12 @@
    "hf_id": "opencompass/mmlu",
    "local": "./data/mmlu/",
},
# MMLU_CF
Same as the comment above.
Motivation
This PR adds support for the MMLU-CF benchmark. The goal is to improve the evaluation of large language models with MMLU-CF, a contamination-free and more challenging multiple-choice benchmark. It contains 10K questions in each of the validation and test sets, covering 14 disciplines. Integrating it should make evaluations more accurate and fair, with more reliable and transparent results, in line with the improved guidelines set out by Microsoft Research.
Modification
The modification in this PR includes integrating the MMLU-CF dataset into the evaluation pipeline. This involves:
BC-breaking (Optional)
No, this modification does not break backward compatibility.
Use cases (Optional)
This PR introduces a new feature by supporting the MMLU-CF benchmark.
Checklist
Before PR:
After PR: