
mmlu_flan_cot_zeroshot breaks after running the generation to 100%. bbh_cot_zeroshot breaks after loading the model. #2511

Open · HideLord opened this issue Nov 23, 2024 · 1 comment · May be fixed by #2517
Labels: bug (Something isn't working.)

@HideLord commented:
I tried running some CoT zeroshot evaluations, but they both failed. Am I doing something wrong?

Command for mmlu_flan_cot_zeroshot

```bash
accelerate launch \
    --multi_gpu \
    --num_processes 4 \
    -m lm_eval \
    --model hf \
    --apply_chat_template \
    --tasks mmlu_flan_cot_zeroshot \
    --batch_size 16 \
    --model_args pretrained=/home/hidelord/models/meta-llama_Llama-3.1-8B-Instruct
```

Error

```text
Running generate_until requests: 100%|████████████████████| 404/404 [04:54<00:00,  1.37it/s]

[rank1]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank1]:   File "<frozen runpy>", line 88, in _run_code
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 461, in <module>
[rank1]:     cli_evaluate()
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
[rank1]:     results = evaluator.simple_evaluate(
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 303, in simple_evaluate
[rank1]:     results = evaluate(
[rank1]:               ^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 522, in evaluate
[rank1]:     task.apply_filters()
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/task.py", line 1135, in apply_filters
[rank1]:     f.apply(self._instances)
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/filter.py", line 51, in apply
[rank1]:     resps = f().apply(resps, docs)
[rank1]:             ^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 48, in apply
[rank1]:     filtered_resps = list(map(lambda x: filter_set(x), resps))
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 48, in <lambda>
[rank1]:     filtered_resps = list(map(lambda x: filter_set(x), resps))
[rank1]:                                         ^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 40, in filter_set
[rank1]:     match = [m for m in match if m][0]
[rank1]:             ~~~~~~~~~~~~~~~~~~~~~~~^^^
[rank1]: IndexError: list index out of range
```
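The IndexError at the bottom of this trace comes from the answer-extraction filter: it indexes the first non-empty regex match in each model response, and with chat-template CoT outputs the pattern can fail to match anything, leaving an empty list to index. Here is a minimal, self-contained sketch of the failure mode and a defensive fallback; `ANSWER_RE` and `extract_answer` are illustrative names and a hypothetical pattern, not the harness's actual code:

```python
import re

# Hypothetical CoT answer pattern, for illustration only.
ANSWER_RE = re.compile(r"answer is \(?([ABCD])\)?")

def extract_answer(response: str, fallback: str = "[invalid]") -> str:
    # Keep only non-empty matches, as the failing filter does.
    matches = [m for m in ANSWER_RE.findall(response) if m]
    if not matches:
        # The failing line did matches[0] unconditionally; returning a
        # fallback instead keeps the filter pipeline alive on a non-match.
        return fallback
    return matches[0]

print(extract_answer("Thinking step by step, the answer is (B)."))  # -> "B"
print(extract_answer("I cannot determine this."))  # -> "[invalid]" instead of IndexError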

Command for bbh_cot_zeroshot

```bash
accelerate launch \
    --multi_gpu \
    --num_processes 4 \
    -m lm_eval \
    --model hf \
    --apply_chat_template \
    --tasks bbh_cot_zeroshot \
    --batch_size 16 \
    --model_args pretrained=/home/hidelord/models/meta-llama_Llama-3.1-8B-Instruct
```

Error

```text
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:06<00:00,  1.50s/it]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:06<00:00,  1.54s/it]
README.md: 100%|████████████████████| 6.77k/6.77k [00:00<00:00, 15.2MB/s]
bbh.py: 100%|████████████████████| 4.54k/4.54k [00:00<00:00, 24.5MB/s]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:07<00:00,  1.76s/it]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:07<00:00,  1.77s/it]
2024-11-23:15:39:50,871 WARNING  [registry.py:192] filter `<class 'utils.WordSortFilter'>` is not registered!
[rank3]: Traceback (most recent call last):
[rank3]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank3]:   File "<frozen runpy>", line 88, in _run_code
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 461, in <module>
[rank3]:     cli_evaluate()
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
[rank3]:     results = evaluator.simple_evaluate(
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank3]:     return fn(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 235, in simple_evaluate
[rank3]:     task_dict = get_task_dict(tasks, task_manager)
[rank3]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 618, in get_task_dict
[rank3]:     task_name_from_string_dict = task_manager.load_task_or_group(
[rank3]:                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 414, in load_task_or_group
[rank3]:     collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 398, in _load_individual_task_or_group
[rank3]:     group_name: dict(collections.ChainMap(*map(fn, reversed(subtask_list))))
[rank3]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _load_individual_task_or_group
[rank3]:     return _load_task(task_config, task=name_or_config)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 280, in _load_task
[rank3]:     task_object = ConfigurableTask(config=config)
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/task.py", line 834, in __init__
[rank3]:     filter_pipeline = build_filter_ensemble(filter_name, components)
[rank3]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/__init__.py", line 21, in build_filter_ensemble
[rank3]:     f = partial(get_filter(function), **kwargs)
[rank3]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: TypeError: the first argument must be callable
```
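This TypeError appears to follow directly from the warning above it: `utils.WordSortFilter` was never registered, so the registry lookup in `get_filter` presumably hands back something non-callable, which `functools.partial` rejects. A sketch of that failure pattern under that assumption; `FILTER_REGISTRY` and the miss-fallback behavior are illustrative guesses, not the harness's actual internals:

```python
from functools import partial

# Illustrative registry contents (not the harness's actual registry).
FILTER_REGISTRY = {"regex": str.strip}

def get_filter(name):
    # Assumed fallback on a miss: return the unresolved name itself,
    # i.e. a string rather than a callable filter class.
    return FILTER_REGISTRY.get(name, name)

fn = get_filter("utils.WordSortFilter")  # miss: returns the string
try:
    f = partial(fn)  # partial requires its first argument to be callable
except TypeError as e:
    print(e)  # -> the first argument must be callable
```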
baberabb linked pull request #2517 on Nov 26, 2024 that will close this issue.

@baberabb (Contributor) commented:
Hi! The PR should fix the bugs!

baberabb added the bug label Nov 26, 2024.