 | Dataset | Hugging Face link |
 | ------------ | ------------------------------------------------------------ |
-| AIME2025 | [opencompass/AIME2025 · Datasets at Hugging Face](https://huggingface.co/datasets/opencompass/AIME2025) |
 | LongBench | [zai-org/LongBench · Datasets at Hugging Face](https://huggingface.co/datasets/zai-org/LongBench) |
 | LongBench v2 | [zai-org/LongBench-v2 · Datasets at Hugging Face](https://huggingface.co/datasets/zai-org/LongBench-v2) |

 | ShareGPT | [anon8231489123/ShareGPT_Vicuna_unfiltered · Datasets at Hugging Face](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered) |
 | ShareGPT-Chinese-English-90K | [shareAI/ShareGPT-Chinese-English-90k · Datasets at Hugging Face](https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k) |

-- The format of multi-turn dialogue datasets is as follows:
+Multi-turn dialogue datasets can use either of the following two formats:
+
+- Format 1:
+  - The top-level key name (e.g. `"sharegpt"`) can be customized, but the inner structure must stay the same
+  - The `"conversations"` field name must not be changed
+  - Each dialogue turn must use the `"from"` and `"value"` fields

 ```json
 {
 }]}
 ```

-**Note**:
+- Format 2:

-- The top-level key name (e.g. `"sharegpt"`) can be customized, but the inner structure must stay the same
-- The `"conversations"` field name must not be changed
-- Each dialogue turn must use the `"from"` and `"value"` fields
+```json
+[
+  {
+    "id": "dsOTKpn_0",
+    "conversations": [
+      {
+        "from": "human",
+        "value": "Why does `dir` command in DOS see the \"<.<\" argument as \"\\*.\\*\"?"
+      },
+      {
+        "from": "human",
+        "value": "I said `dir \"<.<\"`, it only has one dot but it is the same as `dir \"\\*.\\*\"`"
+      }
+    ]
+  },
+  {
+    "id": "60493",
+    "conversations": [
+      {
+        "from": "human",
+        "value": "我想用TypeScript编写一个程序,提供辅助函数以生成G代码绘图(Marlin)。我已经在我的3D打印机上添加了笔座,并希望将其用作笔绘图仪。该库应提供类似使用p5.js的体验,但它不是在画布上绘制形状,而是在G代码中产生文本输出。"
+      }
+    ],
+    "lang": "en"
+  }
+]
+```
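Both formats are read the same way once the outer wrapper is removed. Below is a minimal Python sketch of a loader that accepts either layout and enforces the rules listed above: the top-level key can be customized, the `"conversations"` field name is fixed, and every turn uses `"from"` and `"value"`. It is not the uc_eval loader. It assumes that Format 1 maps its custom top-level key to a list of records, because the Format 1 example in this diff is truncated. The names `load_dialogues` and `_records` are hypothetical.

```python
import json
from typing import Any, Dict, List


def _records(data: Any) -> List[Any]:
    """Flatten either layout into a list of records."""
    if isinstance(data, list):
        return data  # Format 2: the file is a bare JSON array of records
    if isinstance(data, dict):
        # Format 1 (assumed): {"<any key>": [record, ...]}; a single record is also accepted
        records: List[Any] = []
        for value in data.values():
            records.extend(value if isinstance(value, list) else [value])
        return records
    raise ValueError("expected a JSON object (Format 1) or a JSON array (Format 2)")


def load_dialogues(text: str) -> List[List[Dict[str, str]]]:
    """Parse a dataset file and return the turns of every record, checking the rules."""
    dialogues = []
    for i, record in enumerate(_records(json.loads(text))):
        turns = record.get("conversations") if isinstance(record, dict) else None
        if not isinstance(turns, list):  # the field must be named exactly "conversations"
            raise ValueError(f"record {i}: missing 'conversations' list")
        for turn in turns:
            if not (isinstance(turn, dict) and "from" in turn and "value" in turn):
                raise ValueError(f"record {i}: every turn needs 'from' and 'value'")
        dialogues.append(turns)
    return dialogues
```

Extra keys such as `"id"` or `"lang"` are ignored, so both example records in Format 2 load without changes.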

 ### stopwords file

@@ -232,15 +260,15 @@ def test_multiturn_dialogue_perf(
     "demo": [
         "demo.json"
     ],
-    "sharrgpt": [
-
-    ]
+    "sharegpt": [
+        "demo.json"
+    ]
 }
 ```

 - Notes:
   - The key (e.g. `"demo"`) is the name of the dataset folder
-  - The value list contains the names of the data files in that folder
+  - The value list gives the names of the data files in that folder
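Below is a minimal sketch of how such a configuration can be expanded into file paths. It assumes that every dataset folder sits under a common root directory. `resolve_dataset_files` and `data_root` are hypothetical, because the actual loader is not part of this diff.

```python
import json
from pathlib import Path
from typing import List


def resolve_dataset_files(config_text: str, data_root: str) -> List[Path]:
    """Expand {"<folder>": ["<file>", ...]} into <data_root>/<folder>/<file> paths."""
    paths: List[Path] = []
    for folder, files in json.loads(config_text).items():  # key: dataset folder name
        for name in files:                                  # value: data file names in it
            paths.append(Path(data_root) / folder / name)
    return paths
```

An empty list, like the one in the removed `"sharrgpt"` entry, yields no files for that folder.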

 ### Document QA performance test

@@ -309,7 +337,7 @@ models:
 python -m pytest --feature=qa_eval_test
 ```

-- **Result location**: all performance test data is saved to `uc_eval/results/reports/evaluate/doc_qa_latency.xlsx`
+- **Result location**: all performance test data is saved to `uc_eval/results/reports/evaluate/doc_qa_latency.xlsx`. In addition, a folder named after the date is created under the `evaluate` directory; it contains the dataset, the model responses, and related information
 - **Parameter configuration**:

 | Parameter | Meaning | Example value |
@@ -339,7 +367,7 @@ doc_qa_eval_cases = [
             metrics=["accuracy", "bootstrap-accuracy", "f1-score"],
             eval_class="common.uc_eval.utils.metric:MatchPatterns",
             select_data_class={"domain": ["Single-Document QA"]},
-            test_name="longbench and no prefix cache"
+            test_name="longbench v2 and no prefix cache"
         ),
     ),
     # LongBench reference configuration
@@ -350,9 +378,9 @@ doc_qa_eval_cases = [
             enable_prefix_cache=False,
             parallel_num=1,
             benchmark_mode="evaluate",
-            metrics=["accuracy", "bootstrap-accuracy", "f1-score"],
+            metrics=["f1-score"],
             eval_class="common.uc_eval.utils.metric:FuzzyMatch",
-            test_name="longbench v2 and no prefix cache"
+            test_name="longbench and no prefix cache"
         ),
     ),
 ]
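The LongBench case pairs `eval_class="common.uc_eval.utils.metric:FuzzyMatch"` with `metrics=["f1-score"]`. This diff does not show how uc_eval computes that score. The sketch below therefore assumes a common SQuAD-style token-level F1, and `tokenize` and `f1_score` are hypothetical names. Its `language` argument stands in for the dataset language, which `prompt_config.py` says decides the tokenization. Chinese text is split into single characters as a simplification; LongBench's own Chinese scorer segments words with jieba.

```python
import re
import string
from collections import Counter
from typing import List

# Full-width punctuation common in Chinese text, plus ASCII punctuation
_ZH_PUNCT = set("，。！？、：；（）“”‘’《》") | set(string.punctuation)


def tokenize(text: str, language: str) -> List[str]:
    if language == "zh":
        # Simplification: one token per character (not jieba word segmentation)
        return [ch for ch in text if not ch.isspace() and ch not in _ZH_PUNCT]
    # English: lowercase, drop punctuation and articles, split on whitespace
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return re.sub(r"\b(a|an|the)\b", " ", text).split()


def f1_score(prediction: str, reference: str, language: str = "en") -> float:
    pred = tokenize(prediction, language)
    ref = tokenize(reference, language)
    overlap = sum((Counter(pred) & Counter(ref)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score("The cat sat", "cat sat down")` has precision 1 and recall 2/3, which gives an F1 of 0.8.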
@@ -385,31 +413,44 @@ def test_doc_qa_perf(
 - **Template file**: `test/common/uc_eval/utils/prompt_config.py`

 ```python
-# Prompt template for non-multiple-choice questions
-doc_qa_prompt = ["""
-Please read the following text and answer the questions below.\n
-Text: {context}\n
-Question: {input}
-Instructions: Answer based ONLY on the information in the text above
-"""]
+# Language of the document QA dataset. It determines the tokenization used later and whether the Chinese or English prompt is used.
+# The dataset's own "language" key is checked first; this setting applies only when that key is absent.
+# Three values are allowed: en, zh, None
+DEFAULT_LANGUAGE = "None"
+
+# Document QA prompt templates, in English and Chinese. When a template is used, each {} placeholder is replaced with the value of the matching key in the dataset.
+doc_qa_prompt_zh = [
+    """
+阅读以下文字并用中文简短回答:\n\n{context}\n\n现在请基于上面的文章回答下面的问题,只告诉我答案,不要输出任何其他字词。\n\n问题:{input}\n回答:
+"""
+]
+
+doc_qa_prompt_en = [
+    """
+Read the following text and answer briefly.\n\n{context}\n\nNow, answer the following question based on the above text, only give me the answer and do not output any other words.\n\nQuestion: {input}\nAnswer:
+"""
+]

 # Prompt template for multiple-choice questions
-multi_answer_prompt = ["""
+multi_answer_prompt = [
+    """
 Please read the following text and answer the questions below.\n
 Text: {context}\n
 What is the correct answer to this question: {question}\n
 Choices: \n(A) {choice_A} \n(B) {choice_B} \n(C) {choice_C} \n(D) {choice_D} \n
 Let's think step by step. Based on the above, what is the single, most likely answer choice?\n
 Format your response as follows: "The correct answer is (insert answer here)"
-"""]
+"""
+]

 # Regular expressions used to extract the answer
 match_patterns = [
-    r'The correct answer is \(([A-D])\)',
-    r'The correct answer is ([A-D])',
-    r'The \(([A-D])\) is the correct answer',
-    r'The ([A-D]) is the correct answer'
+    r"The correct answer is \(([A-D])\)",
+    r"The correct answer is ([A-D])",
+    r"The \(([A-D])\) is the correct answer",
+    r"The ([A-D]) is the correct answer",
 ]
+
 ```
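The language rule in the `DEFAULT_LANGUAGE` comment uses the record's own `language` key when it is present and otherwise falls back to the configured default. It could look roughly like the sketch below. The templates are shortened stand-ins for `doc_qa_prompt_zh` and `doc_qa_prompt_en`, and `resolve_language` and `build_doc_qa_prompt` are hypothetical helpers. The source does not say what happens when the value is `None`, so the CJK-character check in step 3 is purely an assumption.

```python
import re

# Mirrors the setting in prompt_config.py ("None" as a string, as written there)
DEFAULT_LANGUAGE = "None"

# Shortened stand-ins for doc_qa_prompt_zh / doc_qa_prompt_en (the real templates are longer)
doc_qa_prompt_zh = ["阅读以下文字并用中文简短回答:\n\n{context}\n\n问题:{input}\n回答:"]
doc_qa_prompt_en = ["Read the following text and answer briefly.\n\n{context}\n\nQuestion: {input}\nAnswer:"]

_CJK = re.compile(r"[\u4e00-\u9fff]")
_UNSET = (None, "None")


def resolve_language(record: dict) -> str:
    """Pick "zh" or "en" for one dataset record."""
    lang = record.get("language")  # 1. the record's own "language" key wins
    if lang in _UNSET:
        lang = DEFAULT_LANGUAGE  # 2. otherwise use the configured default
    if lang in _UNSET:
        # 3. Hypothetical fallback: guess from CJK characters in the context
        lang = "zh" if _CJK.search(record.get("context", "")) else "en"
    return lang


def build_doc_qa_prompt(record: dict) -> str:
    """Fill the {context}/{input} placeholders with the record's matching keys."""
    templates = doc_qa_prompt_zh if resolve_language(record) == "zh" else doc_qa_prompt_en
    return templates[0].format(**record)  # str.format ignores unused keys such as "language"
```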
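
 - **How the prompt_config templates are used**:
@@ -421,4 +462,4 @@ match_patterns = [
 - Build the prompt from the template in `multi_answer_prompt`
 - Send the request and obtain the model's response
 - Extract the answer (A/B/C/D) with the regular expressions in `match_patterns`
-- Compare it with the dataset's reference answer to obtain the accuracy
+- Compare it with the dataset's reference answer to obtain the accuracy or F1-score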
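The steps above can be sketched as follows. The request step is left out, so responses are passed in as plain strings. The patterns are copied from the `+` version of `match_patterns`, and `extract_choice`, `score`, `accuracy` and `bootstrap_accuracy` are hypothetical helpers. The config also lists a `bootstrap-accuracy` metric, but this diff does not define it. `bootstrap_accuracy` assumes it means resampling the per-question results with replacement and reporting the mean and standard deviation of the resampled accuracies.

```python
import random
import re
import statistics
from typing import List, Optional, Tuple

# The "+" version of match_patterns from prompt_config.py, tried in order
match_patterns = [
    r"The correct answer is \(([A-D])\)",
    r"The correct answer is ([A-D])",
    r"The \(([A-D])\) is the correct answer",
    r"The ([A-D]) is the correct answer",
]


def extract_choice(response: str) -> Optional[str]:
    """Return the A-D letter captured by the first matching pattern, or None."""
    for pattern in match_patterns:
        match = re.search(pattern, response)
        if match:
            return match.group(1)
    return None


def score(responses: List[str], references: List[str]) -> List[bool]:
    """Per-question correctness: the extracted letter equals the reference answer."""
    return [extract_choice(r) == ref for r, ref in zip(responses, references)]


def accuracy(correct: List[bool]) -> float:
    return sum(correct) / len(correct)


def bootstrap_accuracy(correct: List[bool], n_resamples: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Assumed definition: mean and std of accuracy over resamples drawn with replacement."""
    rng = random.Random(seed)
    samples = [accuracy(rng.choices(correct, k=len(correct))) for _ in range(n_resamples)]
    return statistics.mean(samples), statistics.pstdev(samples)
```

A response that matches no pattern yields `None` and is counted as wrong.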