This repo is used for @smartliuhw's thesis model evaluation, with EleutherAI/lm-evaluation-harness as the base framework.
The code was run on RTX 4090 GPUs (24 GB of memory each), using the accelerate package to enable data parallelism.
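With data parallelism, accelerate spawns one evaluation process per GPU, each holding a full copy of the model and working through its own shard of the requests. A minimal sketch of such a launch, assuming the harness's standard CLI (the model path and task name are placeholders):

```bash
# One process per GPU, each with a full model copy; requests are sharded.
# <model-path> and <task-name> are placeholders.
accelerate launch --num_processes 2 -m lm_eval \
    --model hf \
    --model_args pretrained=<model-path> \
    --tasks <task-name> \
    --batch_size 8
```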
To install, enter the directory that contains this README and run:
```bash
pip install -e .
```
All task configurations live under this path. Enter it and modify the configuration of the task you need.
An example is the nq_open_cot.yaml file, in which I customized the dataset path, task group, description, input template, and metrics; utils.py is also modified to adapt to this particular dataset. It is recommended to save the dataset locally to save precious download time. A sketch of such a config is shown below.
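The field names here follow the harness's task YAML schema; the concrete values (paths, templates, group name) are illustrative guesses, not the exact contents of nq_open_cot.yaml:

```yaml
task: nq_open_cot
group: my_cot_tasks                      # illustrative group name
description: "Answer the question, reasoning step by step."
dataset_path: /path/to/local/nq_open     # local copy avoids re-downloading
output_type: generate_until
doc_to_text: "Question: {{question}}\nLet's think step by step.\nAnswer:"
doc_to_target: "{{answer}}"
process_docs: !function utils.process_docs   # custom hook defined in utils.py
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
```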
After customizing the task you need, launch the evaluation with a shell script. An example is the eval_test.sh file; only a few parameters need to be changed, as sketched below.
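A minimal sketch of such a script, assuming the harness's standard CLI flags (the model path, task names, and output directory are the parameters you would change):

```bash
#!/bin/bash
# Parameters to adjust for your run (all values are placeholders):
MODEL_PATH=/path/to/your/model
TASKS=nq_open_cot            # comma-separated list of task names
OUTPUT_PATH=./results        # where result files are written

accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=${MODEL_PATH} \
    --tasks ${TASKS} \
    --batch_size 8 \
    --output_path ${OUTPUT_PATH}
```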
If you have any questions, feel free to ask me. It is also recommended to read the original README of the upstream framework to gain a better understanding of how it works.