
mmlu_flan_cot_zeroshot breaks after running the generation to 100%. bbh_cot_zeroshot breaks after loading the model. #2511

Open · HideLord opened this issue Nov 23, 2024 · 1 comment · May be fixed by #2517
Labels: bug (Something isn't working.)

@HideLord commented:
I tried running some CoT zeroshot evaluations, but they both failed. Am I doing something wrong?

Command for mmlu_flan_cot_zeroshot

```bash
accelerate launch \
    --multi_gpu \
    --num_processes 4 \
    -m lm_eval \
    --model hf \
    --apply_chat_template \
    --tasks mmlu_flan_cot_zeroshot \
    --batch_size 16 \
    --model_args pretrained=/home/hidelord/models/meta-llama_Llama-3.1-8B-Instruct
```

Error

```text
Running generate_until requests: 100%|████████████████████| 404/404 [04:54<00:00,  1.37it/s]

[rank1]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank1]:   File "<frozen runpy>", line 88, in _run_code
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 461, in <module>
[rank1]:     cli_evaluate()
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
[rank1]:     results = evaluator.simple_evaluate(
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 303, in simple_evaluate
[rank1]:     results = evaluate(
[rank1]:               ^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 522, in evaluate
[rank1]:     task.apply_filters()
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/task.py", line 1135, in apply_filters
[rank1]:     f.apply(self._instances)
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/filter.py", line 51, in apply
[rank1]:     resps = f().apply(resps, docs)
[rank1]:             ^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 48, in apply
[rank1]:     filtered_resps = list(map(lambda x: filter_set(x), resps))
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 48, in <lambda>
[rank1]:     filtered_resps = list(map(lambda x: filter_set(x), resps))
[rank1]:                                         ^^^^^^^^^^^^^
[rank1]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/extraction.py", line 40, in filter_set
[rank1]:     match = [m for m in match if m][0]
[rank1]:             ~~~~~~~~~~~~~~~~~~~~~~~^^^
[rank1]: IndexError: list index out of range
```
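The IndexError at the bottom of this trace comes from the answer-extraction filter: it indexes the first non-empty regex match in each model response, and with chat-template CoT outputs the pattern can fail to match anything, leaving an empty list to index. Here is a minimal, self-contained sketch of the failure mode and a defensive fallback; `ANSWER_RE` and `extract_answer` are illustrative names and a hypothetical pattern, not the harness's actual code:

```python
import re

# Hypothetical CoT answer pattern, for illustration only.
ANSWER_RE = re.compile(r"answer is \(?([ABCD])\)?")

def extract_answer(response: str, fallback: str = "[invalid]") -> str:
    # Keep only non-empty matches, as the failing filter does.
    matches = [m for m in ANSWER_RE.findall(response) if m]
    if not matches:
        # The failing line did matches[0] unconditionally; returning a
        # fallback instead keeps the filter pipeline alive on a non-match.
        return fallback
    return matches[0]

print(extract_answer("Thinking step by step, the answer is (B)."))  # -> "B"
print(extract_answer("I cannot determine this."))  # -> "[invalid]" instead of IndexError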

Command for bbh_cot_zeroshot

```bash
accelerate launch \
    --multi_gpu \
    --num_processes 4 \
    -m lm_eval \
    --model hf \
    --apply_chat_template \
    --tasks bbh_cot_zeroshot \
    --batch_size 16 \
    --model_args pretrained=/home/hidelord/models/meta-llama_Llama-3.1-8B-Instruct
```

Error

```text
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:06<00:00,  1.50s/it]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:06<00:00,  1.54s/it]
README.md: 100%|████████████████████| 6.77k/6.77k [00:00<00:00, 15.2MB/s]
bbh.py: 100%|████████████████████| 4.54k/4.54k [00:00<00:00, 24.5MB/s]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:07<00:00,  1.76s/it]
Loading checkpoint shards: 100%|████████████████████| 4/4 [00:07<00:00,  1.77s/it]
2024-11-23:15:39:50,871 WARNING  [registry.py:192] filter `<class 'utils.WordSortFilter'>` is not registered!
[rank3]: Traceback (most recent call last):
[rank3]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank3]:   File "<frozen runpy>", line 88, in _run_code
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 461, in <module>
[rank3]:     cli_evaluate()
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
[rank3]:     results = evaluator.simple_evaluate(
[rank3]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank3]:     return fn(*args, **kwargs)
[rank3]:            ^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/evaluator.py", line 235, in simple_evaluate
[rank3]:     task_dict = get_task_dict(tasks, task_manager)
[rank3]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 618, in get_task_dict
[rank3]:     task_name_from_string_dict = task_manager.load_task_or_group(
[rank3]:                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 414, in load_task_or_group
[rank3]:     collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 398, in _load_individual_task_or_group
[rank3]:     group_name: dict(collections.ChainMap(*map(fn, reversed(subtask_list))))
[rank3]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _load_individual_task_or_group
[rank3]:     return _load_task(task_config, task=name_or_config)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 280, in _load_task
[rank3]:     task_object = ConfigurableTask(config=config)
[rank3]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/api/task.py", line 834, in __init__
[rank3]:     filter_pipeline = build_filter_ensemble(filter_name, components)
[rank3]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/home/hidelord/lm-evaluation-harness/lm_eval/filters/__init__.py", line 21, in build_filter_ensemble
[rank3]:     f = partial(get_filter(function), **kwargs)
[rank3]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: TypeError: the first argument must be callable
```
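This TypeError appears to follow directly from the warning above it: `utils.WordSortFilter` was never registered, so the registry lookup in `get_filter` presumably hands back something non-callable, which `functools.partial` rejects. A sketch of that failure pattern under that assumption; `FILTER_REGISTRY` and the miss-fallback behavior are illustrative guesses, not the harness's actual internals:

```python
from functools import partial

# Illustrative registry contents (not the harness's actual registry).
FILTER_REGISTRY = {"regex": str.strip}

def get_filter(name):
    # Assumed fallback on a miss: return the unresolved name itself,
    # i.e. a string rather than a callable filter class.
    return FILTER_REGISTRY.get(name, name)

fn = get_filter("utils.WordSortFilter")  # miss: returns the string
try:
    f = partial(fn)  # partial requires its first argument to be callable
except TypeError as e:
    print(e)  # -> the first argument must be callable
```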
baberabb linked pull request #2517 on Nov 26, 2024 that will close this issue.

@baberabb (Contributor) commented:
Hi! The PR should fix the bugs!

baberabb added the bug label Nov 26, 2024.