This is the official repository for the paper "Evaluating Multimodal Large Language Models on Vertically Written Japanese Text".
We evaluate the reading capability of existing MLLMs on vertically written Japanese text.
JSSODa (Japanese Simple Synthetic OCR Dataset) is constructed by rendering Japanese text generated by an LLM into images. The images contain text written both vertically and horizontally, arranged in one to four columns.
- train, val: llm-jp/JSSODa
- test: llm-jp/JSSODa-test
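As a minimal sketch of how these splits might be loaded, assuming the datasets are hosted on the Hugging Face Hub under the repository names above and that the `datasets` package is installed (the helper names here are hypothetical, not part of this repository):

```python
def jssoda_repo(split: str) -> str:
    """Map a split name to its Hugging Face repository.

    train/val live in llm-jp/JSSODa; the test split is
    released separately as llm-jp/JSSODa-test.
    """
    return "llm-jp/JSSODa-test" if split == "test" else "llm-jp/JSSODa"

def load_jssoda(split: str = "train"):
    # Lazy import: requires `pip install datasets` (assumption)
    from datasets import load_dataset
    return load_dataset(jssoda_repo(split), split=split)
```

For example, `load_jssoda("val")` would fetch the validation split from `llm-jp/JSSODa`, while `load_jssoda("test")` would fetch from the separate test repository.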
VJRODa (Vertical Japanese Real-world OCR Dataset) consists of images containing vertically written Japanese text sourced from real-world PDF pages.
Install uv, then run the following commands:

```shell
uv venv --python 3.10.18 --seed
uv sync
```

Please refer to this README.
The code is released under the Apache License, Version 2.0.
```bibtex
@misc{sasagawa2025evaluatingmultimodallargelanguage,
      title={Evaluating Multimodal Large Language Models on Vertically Written Japanese Text},
      author={Keito Sasagawa and Shuhei Kurita and Daisuke Kawahara},
      year={2025},
      eprint={2511.15059},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15059},
}
```