CompassBench is the self-built benchmark behind the CompassRank LLM Leaderboard; we provide example data for it here.
Please check CompassBench for more information about the benchmark.
```
v1_3_data
├── code
│   ├── compass_bench_coding_cn_val.json
│   └── compass_bench_coding_en_val.json
├── instruct
│   ├── compass_bench_instruct_cn_val.json
│   └── compass_bench_instruct_en_val.json
├── knowledge
│   └── single_choice_cn.jsonl
├── language
│   ├── compass_bench_language_cn_val.json
│   └── compass_bench_language_en_val.json
├── math
│   └── single_choice_cn.jsonl
└── reasoning
    ├── compass_bench_reasoning_cn_val.json
    └── compass_bench_reasoning_en_val.json
```
- For subjective evaluation, please refer to CompassBench Subjective Config
- For objective evaluation, please refer to CompassBench Objective Config
Performance results on the example data will be published soon.
- Please link `v1_3_data` to `data/compassbench_v1_3` within the opencompass directory.
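For example, assuming the example data was unpacked to `/path/to/v1_3_data` (a placeholder — substitute your own location), a symlink from the repository root does the job:

```shell
# Run from the root of the opencompass repository.
mkdir -p data
ln -s /path/to/v1_3_data data/compassbench_v1_3
```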
Set the following environment variables to run Hugging Face components in offline mode:

```bash
export HUGGINGFACE_HUB_CACHE=/path-to-hf_hub/
export HF_HUB_CACHE=/path-to-hf_hub/
export HF_EVALUATE_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```
```bash
# Objective Evaluation
# We use `perf_4` as the final metric
python run.py --models hf_internlm2_chat_1_8b --datasets compassbench_v1_3_objective_gen

# Subjective Evaluation
python run.py configs/eval_compassbench_v1_3_subjective.py
```