📜 Paper | Github | 🤗 Demo (Huggingface) | 🤗 Model Weights (Huggingface)
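TRivia is a novel self-supervised fine-tuning framework for table recognition VLMs. This repository releases TRivia-3B, an advanced table recognition VLM fine-tuned from Qwen2.5-VL-3B with the TRivia framework. It performs strongly on multiple real-world table recognition benchmarks.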
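- ⭐ Strong table recognition: TRivia-3B handles digital, scanned, and photographed tables. It also automatically separates the table body from the surrounding background in an image and recognizes only the table itself.
- 📃 Reproducible training pipeline: TRivia improves table recognition using only unlabeled data, with no distillation.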
We mainly evaluate on the following three real-world benchmarks: OmniDocBench v1.5, CC-OCR, and OCRBench v2.
| Model | PubTabNet TEDS | PubTabNet S-TEDS | OmniDocBench TEDS | OmniDocBench S-TEDS | CC-OCR TEDS | CC-OCR S-TEDS | OCRBench TEDS | OCRBench S-TEDS | Overall TEDS | Overall S-TEDS |
|---|---|---|---|---|---|---|---|---|---|---|
| *Expert TR models* | | | | | | | | | | |
| SLANNet-plus | 86.57 | 96.43 | 81.90 | 89.08 | 50.93 | 65.84 | 65.55 | 77.73 | 68.19 | 79.21 |
| UniTable | 86.44 | 95.66 | 82.76 | 89.82 | 57.84 | 70.47 | 67.73 | 78.65 | 70.86 | 80.81 |
| *General-purpose VLMs* | | | | | | | | | | |
| InternVL3.5-241B-A30B | 83.75 | 88.76 | 86.03 | 90.53 | 62.87 | 69.52 | 79.50 | 85.81 | 78.41 | 84.18 |
| Qwen2.5-VL-72B | 84.39 | 87.91 | 87.85 | 91.80 | 81.22 | 86.48 | 81.33 | 86.58 | 83.52 | 88.33 |
| Qwen3-VL-235B-A22B | - | - | 91.02 | 94.97 | 80.98 | 86.19 | 84.12 | 88.15 | 85.83 | 90.07 |
| Gemini 2.5 Pro | - | - | 90.90 | 94.32 | 85.56 | 90.07 | 88.94 | 89.47 | 88.93 | 91.23 |
| GPT-4o | 76.53 | 86.16 | 78.27 | 84.56 | 66.98 | 79.04 | 70.51 | 79.55 | 72.44 | 81.15 |
| GPT-5 | - | - | 84.91 | 89.91 | 63.25 | 74.09 | 79.91 | 88.69 | 78.30 | 86.21 |
| *Document-parsing VLMs* | | | | | | | | | | |
| dots.ocr | 90.65 | 93.76 | 88.62 | 92.86 | 75.42 | 81.65 | 82.04 | 86.27 | 82.95 | 87.58 |
| DeepSeek-OCR | - | - | 83.79 | 87.86 | 68.95 | 75.22 | 82.64 | 87.33 | 80.31 | 85.11 |
| PaddleOCR-VL | - | - | 91.12 | 94.62 | 79.62 | 85.04 | 79.29 | 83.93 | 83.36 | 87.77 |
| MinerU2.5 | 89.07 | 93.11 | 90.85 | 94.68 | 79.76 | 85.16 | 87.13 | 90.62 | 86.82 | 90.81 |
| TRivia-3B (Ours) | 91.79 | 93.81 | 91.60 | 95.01 | 84.90 | 90.17 | 90.76 | 94.03 | 89.88 | 93.60 |
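The table reports TEDS and S-TEDS, which this README does not define. Under the definition commonly used with PubTabNet (stated here as background, not taken from this README), TEDS (Tree-Edit-Distance-based Similarity) compares the predicted table $T_a$ and the ground-truth table $T_b$ as HTML trees:

```math
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $|T|$ is the number of nodes in tree $T$, and $\mathrm{EditDist}$ is the tree edit distance between the two trees. S-TEDS applies the same formula but ignores cell content, so it scores table structure only. The scores above appear to be scaled to the range 0 to 100.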
Because TRivia-3B is trained from Qwen2.5-VL-3B, you can follow the Qwen2.5-VL-3B installation guide to set up the environment.
We strongly recommend installing vLLM >= 0.7.2 for faster inference.
TRivia-3B takes a table image as input and outputs OTSL tags.
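OTSL (Optimized Table Structure Language) describes a table as a row-by-row grid of cell tokens. The README converts it to HTML with `convert_otsl_to_html` from this repository's `otsl_utils`. The sketch below is a hypothetical, simplified stand-in that illustrates the idea. It is not that helper. It assumes the six OTSL tokens are written as `<fcel>` (cell with content), `<ecel>` (empty cell), `<lcel>` (merged with the cell to its left), `<ucel>` (merged with the cell above), `<xcel>` (merged both ways), and `<nl>` (end of row), with each cell's text following its tag. TRivia-3B's exact output format may differ.

```python
from __future__ import annotations

import html
import re

# ASSUMPTION: OTSL tokens are written as <fcel>, <ecel>, <lcel>, <ucel>, <xcel>, <nl>,
# and a cell's text follows its tag. parse_otsl/otsl_to_html are hypothetical names,
# not the repository's otsl_utils API.
OTSL_TAG = re.compile(r"<(fcel|ecel|lcel|ucel|xcel|nl)>")
ANCHORS = {"fcel", "ecel"}  # tokens that start a (possibly spanning) cell


def parse_otsl(otsl: str) -> list[list[tuple[str, str]]]:
    """Split an OTSL string into a grid of (tag, text) pairs, one list per row."""
    parts = OTSL_TAG.split(otsl)
    # parts == [leading_text, tag1, text1, tag2, text2, ...]
    rows, row = [], []
    for tag, text in zip(parts[1::2], parts[2::2]):
        if tag == "nl":
            rows.append(row)
            row = []
        else:
            row.append((tag, text.strip()))
    if row:  # tolerate a missing final <nl>
        rows.append(row)
    return rows


def otsl_to_html(otsl: str) -> str:
    """Convert OTSL tags into a plain HTML <table> using rowspan/colspan."""
    grid = parse_otsl(otsl)
    out = ["<table>"]
    for r, row in enumerate(grid):
        out.append("<tr>")
        for c, (tag, text) in enumerate(row):
            if tag not in ANCHORS:
                continue  # lcel/ucel/xcel are covered by an anchor cell's span
            # Count consecutive lcel tokens to the right for the column span.
            colspan = 1
            while c + colspan < len(row) and row[c + colspan][0] == "lcel":
                colspan += 1
            # Count consecutive ucel tokens below in the same column for the row span.
            rowspan = 1
            while (
                r + rowspan < len(grid)
                and c < len(grid[r + rowspan])
                and grid[r + rowspan][c][0] == "ucel"
            ):
                rowspan += 1
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            out.append(f"<td{attrs}>{html.escape(text)}</td>")
        out.append("</tr>")
    out.append("</table>")
    return "".join(out)
```

For example, `otsl_to_html("<fcel>Header<lcel><nl><fcel>a<fcel>b<nl>")` yields one header cell with `colspan="2"` above two body cells. Each span is recovered from its anchor cell by counting `lcel` tokens to the right and `ucel` tokens below. This sketch does not emit `<thead>` and does not repair malformed grids.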
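Note: TRivia-3B is an experimental model that has not undergone rigorous engineering optimization. It cannot output LaTeX formulas and does not support tables that contain images.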
Make sure vLLM >= 0.7.2 is installed. Put the images you want to recognize in a directory and run the following command:
```bash
python run_vllm_offline_inf.py --ckpt_root opendatalab/TRivia-3B --image_root /path/to/images --output_path ./vllm_offline_output.json

# Examples
python run_vllm_offline_inf.py --ckpt_root opendatalab/TRivia-3B --image_root ./examples --output_path ./examples_output.json
```

The output is a JSON file (example) with the following format:
```jsonc
[
  {
    "path": "...", // Image path
    "otsl": "...", // Unprocessed OTSL tags output by the model
    "html": "..."  // Converted HTML tags
  }
]
```

You can also deploy TRivia-3B with vLLM or SGLang and send requests through an OpenAI-compatible API.
- Start the server:

  ```bash
  vllm serve opendatalab/TRivia-3B --port 10000 --gpu-memory-utilization 0.8
  ```

- Send a table image request:
```python
import base64

from openai import OpenAI
from otsl_utils import convert_otsl_to_html  # helper provided in this repository

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:10000/v1",
    timeout=3600,
)

image_path = "./examples/docstructbench_llm-raw-scihub-o.O-ijc.22994.pdf_3_5.png"
# Encode the table image as base64 so it can be sent inline as a data URL.
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                # Make sure to use this prompt for optimal performance.
                "text": "You are an AI specialized in recognizing and extracting table from images. Your mission is to analyze the table image and generate the result in OTSL format using specified tags. Output only the results without any other words and explanation.",
            },
            {
                "type": "image_url",
                # The example image is a PNG, so declare it as image/png.
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            },
        ],
    }
]

# Greedy decoding (temperature 0) for deterministic output.
response = client.chat.completions.create(
    model="opendatalab/TRivia-3B",
    messages=messages,
    temperature=0.0,
    max_tokens=8192,
)

otsl_content = response.choices[0].message.content
html_content = convert_otsl_to_html(otsl_content)
print(f"Generated OTSL tags: {otsl_content}")
print(f"HTML table: {html_content}")
```

If you find TRivia useful, please cite:

```bibtex
@misc{zhang2025triviaselfsupervisedfinetuningvisionlanguage,
  title={TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition},
  author={Junyuan Zhang and Bin Wang and Qintong Zhang and Fan Wu and Zichen Wen and Jialin Lu and Junjie Shan and Ziqi Zhao and Shuya Yang and Ziling Wang and Ziyang Miao and Huaping Zhong and Yuhang Zang and Xiaoyi Dong and Ka-Ho Chow and Conghui He},
  year={2025},
  eprint={2512.01248},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.01248},
}
```

