[Feature] Support MMLU-CF Benchmark #1775

Open
wants to merge 29 commits into base: main
Changes from 6 commits
Commits
d1a9db5
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
0c48407
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
17af07e
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
772b9a7
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
a32a1ee
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
531945d
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
3d769a9
[Feature] Support MMLU-CF Benchmark
fistyee Dec 24, 2024
21c8a98
[Feature] Support MMLU-CF Benchmark
fistyee Dec 25, 2024
113b564
[Feature] Support MMLU-CF Benchmark
fistyee Dec 25, 2024
5044516
[Feature] Support MMLU-CF Benchmark
fistyee Dec 26, 2024
6a57af5
[Feature] Support MMLU-CF Benchmark
fistyee Dec 27, 2024
706108d
[Feature] Support MMLU-CF Benchmark
fistyee Dec 27, 2024
2e15038
Merge branch 'open-compass:main' into main
fistyee Dec 27, 2024
5ab6362
[Feature] Support MMLU-CF Benchmark
fistyee Dec 30, 2024
4de7b20
Merge branch 'main' of https://github.com/fistyee/opencompass
fistyee Dec 30, 2024
ddd5583
[Feature] Support MMLU-CF Benchmark
fistyee Dec 30, 2024
956fe45
[Feature] Support MMLU-CF Benchmark
fistyee Dec 30, 2024
a222713
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
d5f756e
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
2329a5f
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
93c4411
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
77df499
Update mmlu-cf
liushz Jan 8, 2025
e428a7e
Update mmlu-cf
liushz Jan 8, 2025
ce3ee2d
Update mmlu-cf
liushz Jan 8, 2025
d061100
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
c5722b9
[Feature] Support MMLU-CF Benchmark
fistyee Jan 8, 2025
8439245
[Feature] Support MMLU-CF Benchmark
fistyee Jan 9, 2025
89929df
Remove outside configs
liushz Jan 9, 2025
e7149dd
Remove outside configs
liushz Jan 9, 2025
1 change: 1 addition & 0 deletions README.md
@@ -57,6 +57,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through

## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.24\]** We now support Microsoft's Contamination-Free Multi-task Language Understanding benchmark, [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF). Feel free to give it a try! 🔥🔥🔥
- **\[2024.12.17\]** We have provided the evaluation script for the December [CompassAcademic](configs/eval_academic_leaderboard_202412.py), which allows users to easily reproduce the official evaluation results by configuring it.
- **\[2024.11.14\]** OpenCompass now offers support for a sophisticated benchmark designed to evaluate complex reasoning skills — [MuSR](https://arxiv.org/pdf/2310.16049). Check out the [demo](configs/eval_musr.py) and give it a spin! 🔥🔥🔥
- **\[2024.11.14\]** OpenCompass now supports the brand new long-context language model evaluation benchmark — [BABILong](https://arxiv.org/pdf/2406.10149). Have a look at the [demo](configs/eval_babilong.py) and give it a try! 🔥🔥🔥
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -57,6 +57,7 @@

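## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2024.12.24\]** Microsoft's contamination-free multi-task language understanding dataset [MMLU-CF](https://huggingface.co/datasets/microsoft/MMLU-CF) is now supported. Feel free to give it a try! 🔥🔥🔥
- **\[2024.12.17\]** We have provided the December CompassAcademic leaderboard evaluation script, [CompassAcademic](configs/eval_academic_leaderboard_202412.py), which lets you reproduce the official evaluation results with simple configuration.
- **\[2024.10.14\]** OpenAI's multilingual question-answering dataset [MMMLU](https://huggingface.co/datasets/openai/MMMLU) is now supported. Feel free to give it a try! 🔥🔥🔥
- **\[2024.09.19\]** [Qwen2.5](https://huggingface.co/Qwen) (0.5B to 72B) is now supported and can be run with multiple inference backends (huggingface/vllm/lmdeploy). Feel free to give it a try! 🔥🔥🔥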
1 change: 1 addition & 0 deletions configs/dataset_collections/chat_OC15.py
@@ -2,6 +2,7 @@

with read_base():
    from opencompass.configs.datasets.mmlu.mmlu_gen_4d595a import mmlu_datasets
    from opencompass.configs.datasets.mmlu_cf.mmlu_cf_gen import mmlu_cf_datasets
    from opencompass.configs.datasets.cmmlu.cmmlu_gen_c13365 import cmmlu_datasets
    from opencompass.configs.datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from opencompass.configs.datasets.GaokaoBench.GaokaoBench_no_subjective_gen_4c31db import GaokaoBench_datasets
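
Reviewer note on why a one-line import may be all this collection needs: OpenCompass collection configs commonly end by gathering every imported variable whose name ends in `_datasets` into a single `datasets` list. The rest of `chat_OC15.py` is collapsed in this diff, so that aggregation step is an assumption here, not something the hunk confirms. The sketch below illustrates the pattern with a hypothetical `collect_datasets` helper and made-up stand-in entries; it does not use the real config objects.

```python
def collect_datasets(namespace):
    """Concatenate every list bound to a name ending in '_datasets'.

    Hypothetical helper: it mirrors the aggregation pattern that the
    collapsed part of the config is assumed to use.
    """
    return sum(
        (value for name, value in namespace.items() if name.endswith('_datasets')),
        [],
    )


# Stand-in entries (illustrative only, not real OpenCompass dataset configs).
namespace = {
    'mmlu_datasets': [{'abbr': 'mmlu_demo'}],
    'mmlu_cf_datasets': [{'abbr': 'mmlu_cf_demo'}],  # the newly imported list
    'unrelated_setting': 'ignored',  # skipped: name lacks the suffix
}

datasets = collect_datasets(namespace)
print([entry['abbr'] for entry in datasets])
```

If the collection does aggregate this way, the new `mmlu_cf_datasets` import is picked up by its name suffix and no other line in the file needs to change.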