MiniMonkey is a multimodal large language model built on InternVL2 and specialized for OCR and document understanding.
Once the environment is set up, the model can currently be used in single-turn chat mode:
python paddlemix/examples/minimonkey/chat_demo_minimonkey.py \
--model_name_or_path "HUST-VLRLab/Mini-Monkey" \
--image_path 'path/to/image.jpg' \
--text "Read the all text in the image."
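Configurable parameters:
model_name_or_path
: Model name or weights path for minimonkey, also used to load the tokenizer component. Defaults to HUST-VLRLab/Mini-Monkey.
image_path
: Path to the input image.
text
: User instruction, for example "Read the all text in the image."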
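The three parameters above map one-to-one onto the flags of the demo command. As a minimal sketch, assuming the command is launched from the PaddleMIX repository root and that `python` is on the PATH, the invocation can be assembled programmatically, for example to run the same instruction over several images. The helper name `build_chat_command` is hypothetical and not part of PaddleMIX; only the script path and the three flags come from this README.

```python
import shlex

# Script path and flag names are taken from the command shown in this README.
DEMO_SCRIPT = "paddlemix/examples/minimonkey/chat_demo_minimonkey.py"
DEFAULT_MODEL = "HUST-VLRLab/Mini-Monkey"


def build_chat_command(image_path, text, model_name_or_path=DEFAULT_MODEL):
    """Hypothetical helper: return the argv list for one single-turn chat run."""
    return [
        "python", DEMO_SCRIPT,
        "--model_name_or_path", model_name_or_path,
        "--image_path", image_path,
        "--text", text,
    ]


if __name__ == "__main__":
    # Placeholder image paths; replace with real files.
    for image in ["path/to/image.jpg", "path/to/another.jpg"]:
        cmd = build_chat_command(image, "Read the all text in the image.")
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell. To execute directly instead, the list could be passed to `subprocess.run(cmd, check=True)`; note that each invocation would reload the model, so this is only practical for a handful of images.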
To run full-parameter fine-tuning, use the provided script:
sh paddlemix/examples/minimonkey/shell/internvl2.0/2nd_finetune/minimonkey_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
@article{huang2024mini,
title={Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models},
author={Huang, Mingxin and Liu, Yuliang and Liang, Dingkang and Jin, Lianwen and Bai, Xiang},
journal={arXiv preprint arXiv:2408.02034},
year={2024}
}