[Feature] Support MMLU-CF Benchmark #1775

fistyee · 2024-12-24T03:56:19Z

Motivation

This PR adds support for the MMLU-CF benchmark, a contamination-free and more challenging multiple-choice question benchmark from Microsoft Research. The dataset contains 10K questions in each of its validation and test sets and covers 14 disciplines. Integrating it is intended to make the evaluation of large language models fairer and the reported results more reliable and transparent.

Modification

This PR integrates the MMLU-CF dataset into the evaluation pipeline. This involves:

- Ensuring compatibility with the existing evaluation framework so that results are easy to compare with other benchmarks such as MMLU.
- Adding the rules and configurations needed to handle the dataset's validation splits.

BC-breaking (Optional)

No, this modification does not break backward compatibility.

Use cases (Optional)

This PR introduces a new feature by supporting the MMLU-CF benchmark.

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

opencompass/configs/dataset_collections/chat_OC15.py

configs/summarizers/example.py

configs/eval_corebench_2409_base_objective.py

fistyee · 2024-12-28T09:01:17Z

@liushz Hi, can MMLU-CF be merged into the main branch? Are there any further modifications required?

tonysy · 2024-12-30T04:06:25Z

Please fix the lint issue.

liushz · 2024-12-30T05:57:54Z

README.md

@@ -57,6 +57,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2024.12.27\]** We now support the Microsoft's Contamination-Free Multi-task language Understanding Benchmark [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF). Feel free to give it a try! 🔥🔥🔥


We don't recommend adding this to the README for now; we will consider adding it in the next version of OpenCompass.

liushz · 2024-12-30T05:58:37Z

README_zh-CN.md

@@ -57,6 +57,7 @@

 ## 🚀 最新进展 <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

+- **\[2024.12.27\]** 现已支持Microsoft去污染多任务语言理解数据集[MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF)，欢迎尝试! 🔥🔥🔥


Same as the comment above.

liushz · 2024-12-30T06:28:51Z

opencompass/datasets/mmlu_cf.py

+class MMLUCFDataset(BaseDataset):
+
+    @staticmethod
+    def load(path: str, name: str):


You can make HuggingFace the default loading method and remove the entry from datasets_info.py.
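As a rough illustration of that suggestion, the sketch below loads one split directly from the HuggingFace Hub with `datasets.load_dataset`, so no local path entry is required. The column names, the `val` split name, and the helpers `to_eval_row` and `load_mmlu_cf` are assumptions made for illustration and are not the code merged in this PR; only the dataset id `microsoft/MMLU-CF` comes from the README entry above.

```python
from typing import Dict


def to_eval_row(example: Dict[str, str]) -> Dict[str, str]:
    """Map one raw record to the fields a multiple-choice evaluator needs.

    The source column names ("Question", "A".."D", "Answer") are assumed;
    check them against the dataset card before relying on this.
    """
    return {
        "input": example["Question"],
        "A": example["A"],
        "B": example["B"],
        "C": example["C"],
        "D": example["D"],
        "target": example["Answer"],
    }


def load_mmlu_cf(path: str = "microsoft/MMLU-CF", split: str = "val"):
    """Load one split from the HuggingFace Hub (needs network access).

    The split name is an assumption; the real dataset may name its
    splits differently (for example, per subject).
    """
    # Imported lazily so to_eval_row stays usable without `datasets` installed.
    from datasets import load_dataset

    raw = load_dataset(path, split=split)
    # Drop the original columns; columns returned by to_eval_row are kept.
    return raw.map(to_eval_row, remove_columns=raw.column_names)
```

With this approach the dataset id in the config is the only location the loader needs, which is why the `datasets_info.py` entry becomes unnecessary.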

liushz · 2024-12-30T06:30:00Z

opencompass/utils/datasets_info.py

@@ -181,6 +181,12 @@
        "hf_id": "opencompass/mmlu",
        "local": "./data/mmlu/",
    },
+    # MMLU_CF


Same as the comment above.

fistyee added 4 commits December 24, 2024 10:19

[Feature] Support MMLU-CF Benchmark

d1a9db5

[Feature] Support MMLU-CF Benchmark

0c48407

[Feature] Support MMLU-CF Benchmark

17af07e

[Feature] Support MMLU-CF Benchmark

772b9a7

mm-assistant bot assigned liushz Dec 24, 2024

fistyee temporarily deployed to prod December 24, 2024 04:14 — with GitHub Actions Inactive

fistyee added 2 commits December 24, 2024 13:44

[Feature] Support MMLU-CF Benchmark

a32a1ee

[Feature] Support MMLU-CF Benchmark

531945d

tonysy requested review from MaiziXiao and liushz December 24, 2024 07:54

fistyee temporarily deployed to prod December 24, 2024 07:56 — with GitHub Actions Inactive

liushz reviewed Dec 24, 2024

opencompass/configs/dataset_collections/chat_OC15.py (outdated)

liushz reviewed Dec 24, 2024

configs/summarizers/example.py (outdated)

liushz reviewed Dec 24, 2024

configs/eval_corebench_2409_base_objective.py (outdated)

[Feature] Support MMLU-CF Benchmark

3d769a9

fistyee had a problem deploying to prod December 24, 2024 16:33 — with GitHub Actions Failure

fistyee requested a review from liushz December 25, 2024 04:20

fistyee added 2 commits December 25, 2024 15:43

[Feature] Support MMLU-CF Benchmark

21c8a98

[Feature] Support MMLU-CF Benchmark

113b564

fistyee temporarily deployed to prod December 26, 2024 03:58 — with GitHub Actions Inactive

[Feature] Support MMLU-CF Benchmark

5044516

fistyee had a problem deploying to prod December 27, 2024 10:03 — with GitHub Actions Failure

[Feature] Support MMLU-CF Benchmark

6a57af5

fistyee temporarily deployed to prod December 27, 2024 12:01 — with GitHub Actions Inactive

fistyee and others added 2 commits December 27, 2024 20:24

[Feature] Support MMLU-CF Benchmark

706108d

Merge branch 'open-compass:main' into main

2e15038

fistyee temporarily deployed to prod December 30, 2024 03:52 — with GitHub Actions Inactive

[Feature] Support MMLU-CF Benchmark

5ab6362

fistyee added 2 commits December 30, 2024 13:37

Merge branch 'main' of https://github.com/fistyee/opencompass

4de7b20

[Feature] Support MMLU-CF Benchmark

ddd5583

liushz reviewed Dec 30, 2024

[Feature] Support MMLU-CF Benchmark

956fe45

liushz reviewed Dec 30, 2024

fistyee temporarily deployed to prod December 31, 2024 08:45 — with GitHub Actions Inactive
