Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

输出结果没有分数 #181

Open
9 tasks
Leo20100307 opened this issue Nov 8, 2024 · 6 comments
Open
9 tasks

输出结果没有分数 #181

Leo20100307 opened this issue Nov 8, 2024 · 6 comments
Assignees
Labels
bug Something isn't working opencompass

Comments

@Leo20100307
Copy link

问题描述 / Issue Description

请简要描述您遇到的问题。 / Please briefly describe the issue you encountered.

本地/root/ChatGLM目录下载的ChatGLM2-6B模型,

使用vllm部署server:

vllm serve /root/ChatGLM --chat-template ./examples/template_chatglm2.jinja --trust_remote_code --use-v2-block-manager

evalscope相关配置:

(evalscope) root@ubuntu:~/evalscope# cat eval_openai_api.yaml
eval_backend: OpenCompass
eval_config:
datasets:
- mmlu
- ceval
- ARC_c
- gsm8k
models:
- openai_api_base: http://127.0.0.1:8000/v1/chat/completions
path: /root/ChatGLM
temperature: 0.0

(evalscope) root@ubuntu:~/evalscope# cat example_eval_openai_api.py
from evalscope.run import run_task
from evalscope.summarizer import Summarizer

def run_eval():
# Option 1: Python dictionary
#task_cfg = task_cfg_dict

# Option 2: YAML configuration file
task_cfg = 'eval_openai_api.yaml'

# Option 3: JSON configuration file
# task_cfg = 'eval_openai_api.json'

run_task(task_cfg=task_cfg)
print('>> Start to get the report with summarizer ...')
report_list = Summarizer.get_report_from_cfg(task_cfg)
print(f'\n>> The report list: {report_list}')

run_eval()

使用的工具 / Tools Used

  • Native / 原生框架
  • [ ***] Opencompass backend
  • VLMEvalKit backend
  • RAGEval backend
  • Perf / 模型推理压测工具
  • Arena /竞技场模式

执行的代码或指令 / Code or Commands Executed

请提供您执行的主要代码或指令。 / Please provide the main code or commands you executed. 例如 / For example:

执行测试: python example_eval_openai_api.py

错误日志 / Error Log

请粘贴完整的错误日志或控制台输出。 / Please paste the full error log or console output. 例如 / For example:

dataset version metric mode /root/ChatGLM


--------- 考试 Exam --------- - - - -
ceval - - - -
cmb - - - -
agieval - - - -
mmlu - - - -
GaokaoBench - - - -
ARC-c - - - -
ARC-e - - - -
--------- 语言 Language --------- - - - -
WiC - - - -
summedits - - - -
chid-dev - - - -
afqmc-dev - - - -
bustm-dev - - - -
cluewsc-dev - - - -
WSC - - - -
winogrande - - - -
flores_100 - - - -
--------- 知识 Knowledge --------- - - - -
BoolQ - - - -
commonsense_qa - - - -
nq - - - -
triviaqa - - - -
--------- 推理 Reasoning --------- - - - -
cmnli - - - -
ocnli - - - -
ocnli_fc-dev - - - -
AX_b - - - -
AX_g - - - -
CB - - - -
RTE - - - -
story_cloze - - - -
COPA - - - -
ReCoRD - - - -
hellaswag - - - -
piqa - - - -
siqa - - - -
strategyqa - - - -
math - - - -
gsm8k - - - -
TheoremQA - - - -
openai_humaneval - - - -
mbpp - - - -
bbh - - - -
--------- 理解 Understanding --------- - - - -
C3 - - - -
CMRC_dev - - - -
DRCD_dev - - - -
MultiRC - - - -
race-middle - - - -
race-high - - - -
openbookqa_fact - - - -
csl_dev - - - -
lcsts - - - -
Xsum - - - -
eprstmt-dev - - - -
lambada - - - -
tnews-dev - - - -
11/07 07:06:42 - OpenCompass - INFO - write summary to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.txt
11/07 07:06:42 - OpenCompass - INFO - write csv to /root/evalscope/outputs/default/20241107_070629/summary/summary_20241107_070629.csv

Start to get the report with summarizer ...
2024-11-07 07:06:42,022 - evalscope - INFO - **Loading task cfg for summarizer: {'eval_backend': 'OpenCompass', 'eval_config': {'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'], 'models': [{'openai_api_base': 'http://127.0.0.1:8000/v1/chat/completions', 'path': '/root/ChatGLM', 'temperature': 0.0}]}}

The report list: [{'dataset': '--------- 考试 Exam ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ceval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmb', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'agieval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mmlu', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'GaokaoBench', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-c', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ARC-e', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 语言 Language ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WiC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'summedits', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'chid-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'afqmc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bustm-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cluewsc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'WSC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'winogrande', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'flores_100', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 知识 Knowledge ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'BoolQ', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'commonsense_qa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'nq', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'triviaqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 推理 Reasoning ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'cmnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ocnli_fc-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_b', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'AX_g', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CB', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'RTE', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'story_cloze', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'COPA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'ReCoRD', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'hellaswag', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'piqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'siqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'strategyqa', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'math', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'gsm8k', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'TheoremQA', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openai_humaneval', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'mbpp', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'bbh', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': '--------- 理解 Understanding ---------', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'C3', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'CMRC_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'DRCD_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'MultiRC', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-middle', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'race-high', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'openbookqa_fact', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'csl_dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lcsts', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'Xsum', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'eprstmt-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'lambada', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}, {'dataset': 'tnews-dev', 'version': '-', 'metric': '-', 'mode': '-', '/root/ChatGLM': '-'}]

运行环境 / Runtime Environment

  • 操作系统 / Operating System:

    • Windows
    • macOS
    • [ ***] Ubuntu
  • Python版本 / Python Version:

    • [*** ] 3.11
    • 3.10
    • 3.9

其他信息 / Additional Information

如果有其他相关信息,请在此处提供。 / If there is any other relevant information, please provide it here.

@wangxingjun778
Copy link
Collaborator

请问日志中有error相关字样的log么? 如有则可以进到outputs相对应的logs文件夹中查看对应的error明细 / Please check the error log file in the outputs directory and get details of err msg.

@wangxingjun778
Copy link
Collaborator

另外请check一下,评测相关的data是否有预先准备: 参考 https://evalscope.readthedocs.io/zh-cn/latest/user_guides/backend/opencompass_backend.html

image

@Leo20100307
Copy link
Author

请问日志中有error相关字样的log么? 如有则可以进到outputs相对应的logs文件夹中查看对应的error明细 / Please check the error log file in the outputs directory and get details of err msg.

outputs目录下,有个txt文档,里面没有看到报错。日志文件80M,无法上传。

vllm端有打印,模型应该是有接收到请求并做了处理:

image

@Leo20100307
Copy link
Author

另外请check一下,评测相关的data是否有预先准备: 参考 https://evalscope.readthedocs.io/zh-cn/latest/user_guides/backend/opencompass_backend.html

image

数据文件已经下载,,并解压到当前目录下,目录名称"data"

@Leo20100307
Copy link
Author

image

@Leo20100307
Copy link
Author

image

data目录下的数据集文件

@Yunnglin Yunnglin added bug Something isn't working opencompass labels Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working opencompass
Projects
None yet
Development

No branches or pull requests

3 participants