
llm-jp/eval_vertical_ja

Evaluating Multimodal Large Language Models on Vertically Written Japanese Text

This is the official repository for the paper "Evaluating Multimodal Large Language Models on Vertically Written Japanese Text".

Introduction

We evaluate how well existing multimodal large language models (MLLMs) can read vertically written Japanese text.

Releases

JSSODa

JSSODa (Japanese Simple Synthetic OCR Dataset) is constructed by rendering LLM-generated Japanese text into images. The images contain both vertically and horizontally written text, laid out in one to four columns.
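The two orientations differ in reading order, which is what makes them a useful contrast for OCR. The following is a minimal Python sketch of character placement on a fixed grid of character cells. It is an illustration under stated assumptions, not the repository's rendering pipeline: `layout` and `render` are hypothetical helpers, a text grid stands in for pixel rendering, and the one-to-four-column arrangement used in JSSODa is not modeled. In horizontal text, each line runs left to right and lines stack top to bottom; in vertical Japanese text, each line is a column read top to bottom, and successive columns proceed from right to left.

```python
from typing import List, Tuple

# (character, x, y) position on a grid of character cells; a hypothetical
# representation used only for this illustration.
Cell = Tuple[str, int, int]


def layout(text: str, line_length: int, vertical: bool) -> List[Cell]:
    """Place each character of `text` on a character-cell grid.

    Horizontal: each line runs left to right; lines stack top to bottom.
    Vertical: each line is a column running top to bottom; columns stack
    right to left, so the first column ends up rightmost.
    """
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    # Break the text into fixed-length lines (no line-breaking rules applied).
    lines = [text[i:i + line_length] for i in range(0, len(text), line_length)]
    cells: List[Cell] = []
    for li, line in enumerate(lines):
        for ci, ch in enumerate(line):
            if vertical:
                # Column index counted so that line 0 sits at the right edge.
                cells.append((ch, len(lines) - 1 - li, ci))
            else:
                cells.append((ch, ci, li))
    return cells


def render(cells: List[Cell], fill: str = "\u3000") -> str:
    """Draw cells as rows of text, padding empty cells with a full-width space."""
    if not cells:
        return ""
    width = max(x for _, x, _ in cells) + 1
    height = max(y for _, _, y in cells) + 1
    grid = [[fill] * width for _ in range(height)]
    for ch, x, y in cells:
        grid[y][x] = ch
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    sample = "縦書きの日本語"
    print(render(layout(sample, 3, vertical=False)))
    print()
    print(render(layout(sample, 3, vertical=True)))
```

Running the script prints the same seven characters twice. In the vertical grid the first row reads 語の縦, because the first column (縦書き) is placed at the right edge and the short final column (語) at the left. Real vertical typesetting also rotates or repositions some glyphs, such as punctuation and the long-vowel mark, which this grid does not attempt.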

VJRODa

VJRODa (Vertical Japanese Real-world OCR Dataset) consists of images of vertically written Japanese text taken from real-world PDF pages.

Installation

Install uv, then run the following commands:

uv venv --python 3.10.18 --seed
uv sync

Dataset Construction, Training, and Evaluation

Please refer to this README.

License

The code is released under the Apache License, Version 2.0.

Citation

@misc{sasagawa2025evaluatingmultimodallargelanguage,
      title={Evaluating Multimodal Large Language Models on Vertically Written Japanese Text}, 
      author={Keito Sasagawa and Shuhei Kurita and Daisuke Kawahara},
      year={2025},
      eprint={2511.15059},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15059}, 
}