输出结果没有分数 #181

Leo20100307 · 2024-11-08T00:59:33Z

问题描述 / Issue Description

请简要描述您遇到的问题。 / Please briefly describe the issue you encountered.

本地/root/ChatGLM目录下载的ChatGLM2-6B模型，

使用vllm部署server：

vllm serve /root/ChatGLM --chat-template ./examples/template_chatglm2.jinja --trust_remote_code --use-v2-block-manager

evalscope相关配置：

(evalscope) root@ubuntu:~/evalscope# cat eval_openai_api.yaml
eval_backend: OpenCompass
eval_config:
datasets:
- mmlu
- ceval
- ARC_c
- gsm8k
models:
- openai_api_base: http://127.0.0.1:8000/v1/chat/completions
path: /root/ChatGLM
temperature: 0.0

(evalscope) root@ubuntu:~/evalscope# cat example_eval_openai_api.py
from evalscope.run import run_task
from evalscope.summarizer import Summarizer

def run_eval():
# Option 1: Python dictionary
#task_cfg = task_cfg_dict

# Option 2: YAML configuration file
task_cfg = 'eval_openai_api.yaml'

# Option 3: JSON configuration file
# task_cfg = 'eval_openai_api.json'

run_task(task_cfg=task_cfg)
print('>> Start to get the report with summarizer ...')
report_list = Summarizer.get_report_from_cfg(task_cfg)
print(f'\n>> The report list: {report_list}')

run_eval()

使用的工具 / Tools Used

执行的代码或指令 / Code or Commands Executed

请提供您执行的主要代码或指令。 / Please provide the main code or commands you executed. 例如 / For example:

执行测试： python example_eval_openai_api.py

错误日志 / Error Log

请粘贴完整的错误日志或控制台输出。 / Please paste the full error log or console output. 例如 / For example:

dataset version metric mode /root/ChatGLM

--------- 考试 Exam --------- - - - -
ceval - - - -
cmb - - - -
agieval - - - -
mmlu - - - -
GaokaoBench - - - -
ARC-c - - - -
ARC-e - - - -
--------- 语言 Language --------- - - - -
WiC - - - -
summedits - - - -
chid-dev - - - -
afqmc-dev - - - -
bustm-dev - - - -
cluewsc-dev - - - -
WSC - - - -
winogrande - - - -
flores_100 - - - -
--------- 知识 Knowledge --------- - - - -
BoolQ - - - -
commonsense_qa - - - -
nq - - - -
triviaqa - - - -
--------- 推理 Reasoning --------- - - - -
cmnli - - - -
ocnli - - - -
ocnli_fc-dev - - - -
AX_b - - - -
AX_g - - - -
CB - - - -
RTE - - - -
story_cloze - - - -
COPA - - - -
ReCoRD - - - -
hellaswag - - - -
piqa - - - -
siqa - - - -
strategyqa - - - -
math - - - -
gsm8k - - - -
TheoremQA - - - -
openai_humaneval - - - -
mbpp - - - -
bbh - - - -
--------- 理解 Understanding --------- - - - -
C3 - - - -
CMRC_dev - - - -
DRCD_dev - - - -
MultiRC - - - -
race-middle - - - -
race-high - - - -
openbookqa_fact - - - -
csl_dev - - - -
lcsts - - - -
Xsum - - - -
eprstmt-dev - - - -
lambada - - - -
tnews-dev - - - -
11/07 07:06:42 - OpenCompass - INFO - write summary to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.txt
11/07 07:06:42 - OpenCompass - INFO - write csv to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.csv

Start to get the report with summarizer ...
2024-11-07 07:06:42,022 - evalscope - INFO - **Loading task cfg for summarizer: {'eval_backend': 'OpenCompass', 'eval_config': {'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'], 'models': [{'openai_api_base': 'http://127.0.0.1:8000/v1/chat/completions', 'path': '/root/ChatGLM', 'temperature': 0.0}]}}

The report list: [{'dataset': '--------- 考试 Exam ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ceval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmb', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'agieval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mmlu', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'GaokaoBench', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-c', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-e', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 语言 Language ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WiC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'summedits', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'chid-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'afqmc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bustm-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cluewsc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WSC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'winogrande', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'flores_100', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 知识 Knowledge ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'BoolQ', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'commonsense_qa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'nq', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'triviaqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 推理 Reasoning ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli_fc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_b', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_g', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CB', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'RTE', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'story_cloze', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'COPA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ReCoRD', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'hellaswag', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'piqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'siqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'strategyqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'math', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'gsm8k', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'TheoremQA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openai_humaneval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mbpp', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bbh', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 理解 Understanding ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'C3', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CMRC_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'DRCD_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'MultiRC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-middle', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-high', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openbookqa_fact', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'csl_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lcsts', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'Xsum', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'eprstmt-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lambada', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'tnews-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}]

运行环境 / Runtime Environment

操作系统 / Operating System:
- Windows
- macOS
- [ ***] Ubuntu
Python版本 / Python Version:
- [*** ] 3.11
- 3.10
- 3.9

其他信息 / Additional Information

如果有其他相关信息，请在此处提供。 / If there is any other relevant information, please provide it here.

The text was updated successfully, but these errors were encountered:

wangxingjun778 · 2024-11-08T02:57:52Z

请问日志中有error相关字样的log么？如有则可以进到outputs相对应的logs文件夹中查看对应的error明细 / Please check the error log file in the outputs directory and get details of err msg.

wangxingjun778 · 2024-11-08T03:01:12Z

另外请check一下，评测相关的data是否有预先准备：参考 https://evalscope.readthedocs.io/zh-cn/latest/user_guides/backend/opencompass_backend.html

Leo20100307 · 2024-11-08T07:39:22Z

请问日志中有error相关字样的log么？如有则可以进到outputs相对应的logs文件夹中查看对应的error明细 / Please check the error log file in the outputs directory and get details of err msg.

outputs目录下，有个txt文档，里面没有看到报错。日志文件80M，无法上传。

vllm端有打印，模型应该是有接收到请求并做了处理：

Leo20100307 · 2024-11-08T07:40:58Z

另外请check一下，评测相关的data是否有预先准备：参考 https://evalscope.readthedocs.io/zh-cn/latest/user_guides/backend/opencompass_backend.html

数据文件已经下载，，并解压到当前目录下，目录名称"data"

Leo20100307 · 2024-11-08T07:41:16Z

Leo20100307 · 2024-11-08T07:46:20Z

data目录下的数据集文件

Yunnglin added bug Something isn't working opencompass labels Nov 26, 2024

Yunnglin assigned wangxingjun778 Nov 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

输出结果没有分数 #181

输出结果没有分数 #181

Leo20100307 commented Nov 8, 2024

wangxingjun778 commented Nov 8, 2024

wangxingjun778 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

输出结果没有分数 #181

输出结果没有分数 #181

Comments

Leo20100307 commented Nov 8, 2024

问题描述 / Issue Description

使用的工具 / Tools Used

执行的代码或指令 / Code or Commands Executed

错误日志 / Error Log

运行环境 / Runtime Environment

其他信息 / Additional Information

wangxingjun778 commented Nov 8, 2024

wangxingjun778 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024

Leo20100307 commented Nov 8, 2024