
[Feature] Support MMLU-CF Benchmark #1775

Open: wants to merge 17 commits into base: main


fistyee commented Dec 24, 2024

Motivation

This PR adds support for the MMLU-CF benchmark. MMLU-CF is a contamination-free and more challenging multiple-choice question benchmark, and integrating it improves how large language models are evaluated. The dataset has 10K questions in each of its validation and test sets, covering 14 disciplines. The goal is a more accurate and fair evaluation with more reliable and transparent results, following the improved guidelines set by Microsoft Research.

Modification

This PR integrates the MMLU-CF dataset into the evaluation pipeline. This involves:

  • Ensuring compatibility with existing evaluation frameworks for easy comparison with other benchmarks like MMLU.
  • Implementing the necessary rules and configurations to handle the dataset’s unique validation splits.
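To make the compatibility goal more concrete, here is a minimal sketch of normalizing one MMLU-CF record into an MMLU-style field layout, so that the same prompt template and accuracy evaluator could be reused for both benchmarks. Several details are assumptions made for illustration and are not confirmed by this PR: the source column names (`Question`, `A` to `D`, `Answer`), the target field names (`input`, `A` to `D`, `target`), and the hypothetical helper `to_mmlu_fields`.

```python
from typing import Dict

# Option labels shared by the (assumed) MMLU-CF columns and the MMLU-style fields.
OPTION_KEYS = ("A", "B", "C", "D")


def to_mmlu_fields(row: Dict[str, str]) -> Dict[str, str]:
    """Map one MMLU-CF-style row onto MMLU-style fields (hypothetical layout).

    Column and field names are assumptions; check the dataset card and the
    existing MMLU reader config before relying on them.
    """
    answer = row["Answer"].strip().upper()
    if answer not in OPTION_KEYS:
        raise ValueError(f"unexpected answer label: {row['Answer']!r}")
    fields = {"input": row["Question"].strip()}
    for key in OPTION_KEYS:
        fields[key] = row[key].strip()
    fields["target"] = answer
    return fields


if __name__ == "__main__":
    sample = {
        "Question": "Which gas do plants mainly absorb for photosynthesis?",
        "A": "Oxygen",
        "B": "Carbon dioxide",
        "C": "Nitrogen",
        "D": "Helium",
        "Answer": "b",
    }
    print(to_mmlu_fields(sample)["target"])  # prints B
```

Normalizing the answer label to upper case means a downstream exact-match accuracy check compares like with like, whichever case the source uses.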

BC-breaking (Optional)

No, this modification does not break backward compatibility.

Use cases (Optional)

This PR adds a new feature: support for the MMLU-CF benchmark.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

fistyee (Author) commented Dec 28, 2024

@liushz Hi, can MMLU-CF be merged into the main branch? Are there any further modifications required?

tonysy (Collaborator) commented Dec 30, 2024

Please fix the lint issue.

README.md Outdated
@@ -57,6 +57,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.27\]** We now support the Microsoft's Contamination-Free Multi-task language Understanding Benchmark [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF). Feel free to give it a try! 🔥🔥🔥
Collaborator:

We don't recommend adding this to the README for now; we will consider it for the next version of OpenCompass.

README_zh-CN.md Outdated
@@ -57,6 +57,7 @@

## 🚀 Latest Updates <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.27\]** Microsoft's decontaminated multi-task language understanding dataset [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF) is now supported. Feel free to try it! 🔥🔥🔥
liushz (Collaborator) commented Dec 30, 2024

Same as the comment above.

Author:

ok

class MMLUCFDataset(BaseDataset):

    @staticmethod
    def load(path: str, name: str):
Collaborator:

You can make Hugging Face the default loading method and remove the MMLU-CF entry from datasets_info.py.
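One possible shape for this suggestion is a loader that passes the hub id straight to Hugging Face, so no local-path registry entry is needed. This is a minimal sketch, not OpenCompass's actual API. It assumes `datasets.load_dataset(path, name, split=...)` from the Hugging Face `datasets` library. The function name `load_mmlu_cf_split` and the injectable `loader` parameter are hypothetical, and the `loader` parameter exists only so the sketch can be tested without network access.

```python
from typing import Any, Callable, Optional

# Hub id taken from the README link in this PR; used as the default path.
DEFAULT_HF_ID = "microsoft/MMLU-CF"


def load_mmlu_cf_split(
    name: str,
    split: str,
    path: str = DEFAULT_HF_ID,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load one MMLU-CF subset/split, defaulting to Hugging Face (hypothetical helper).

    No datasets_info.py-style lookup happens here: `path` goes straight to the
    loader, which is what "Hugging Face as the default" means in this sketch.
    """
    if loader is None:
        # Imported lazily so a custom loader can be used without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    return loader(path, name, split=split)
```

Subset and split names are left to the caller. The real names should be taken from the dataset card rather than assumed.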

@@ -181,6 +181,12 @@
"hf_id": "opencompass/mmlu",
"local": "./data/mmlu/",
},
# MMLU_CF
Collaborator:

Same as the comment above.

Labels: None yet
Projects: None yet

3 participants