PDFToolkit is a CLI for extracting, analyzing, and benchmarking PDF content, with a focus on charts and visualizations. It provides a unified interface to multiple conversion and analysis backends, plus a harness for comparing parsers on a single document.
git clone https://github.com/amadad/pdftoolkit.git
cd pdftoolkit
# Install uv (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create environment and install
uv venv && source .venv/bin/activate
uv sync

Set up API keys:
export OPENAI_API_KEY="..." # For marker --describe, markitdown
export MISTRAL_API_KEY="..." # For mistral provider
export TOGETHER_API_KEY="..." # For together provider

Optional installs:
uv pip install megaparse "unstructured[all-docs]==0.15.0" # megaparse provider (quoted so shells such as zsh do not glob the brackets)
uv pip install together # together provider

# Default provider (docling)
pdftoolkit convert document.pdf
# Choose provider
pdftoolkit convert document.pdf -p marker
pdftoolkit convert document.pdf -p mistral
pdftoolkit convert document.pdf -p markitdown
pdftoolkit convert document.pdf -p megaparse
# With options
pdftoolkit convert document.pdf -p marker --describe # Add AI image descriptions (marker only)
pdftoolkit convert document.pdf -o custom_output/ # Custom output directory

# Benchmark the default runnable commercial-safe tools
pdftoolkit benchmark document.pdf
# Benchmark an explicit commercial-friendly subset
pdftoolkit benchmark document.pdf -t docling -t markitdown -t mistral
# Benchmark optional research tools (if installed)
pdftoolkit benchmark document.pdf -t mineru -t olmocr -t paddleocr

By default, benchmark runs the low-friction, commercial-safe tools that are currently runnable in your environment. Outputs are written under output/benchmark/<document-stem>/, with a results.json summary and one output directory per tool.

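
Since the benchmark output location is predictable, a short Python sketch can locate and load it after a run. The schema of results.json is not documented here, so the sketch only parses the JSON and lists the per-tool subdirectories. The helper names `benchmark_dir` and `load_benchmark` are hypothetical and not part of pdftoolkit.

```python
import json
from pathlib import Path


def benchmark_dir(pdf_path, root="output/benchmark"):
    """Return the directory the benchmark is described as writing to for this PDF."""
    return Path(root) / Path(pdf_path).stem


def load_benchmark(pdf_path, root="output/benchmark"):
    """Load results.json and list the per-tool output directories.

    The parsed JSON is returned untouched because its schema is an unknown here.
    """
    out_dir = benchmark_dir(pdf_path, root)
    summary = json.loads((out_dir / "results.json").read_text(encoding="utf-8"))
    tool_dirs = sorted(p.name for p in out_dir.iterdir() if p.is_dir())
    return summary, tool_dirs


if __name__ == "__main__":
    target = benchmark_dir("document.pdf")
    if (target / "results.json").exists():
        summary, tools = load_benchmark("document.pdf")
        print("per-tool directories:", tools)
    else:
        print(f"no benchmark output found under {target}")
```

Inspect the returned summary interactively before relying on any particular key in it.
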
# Default provider (ollama - local)
pdftoolkit analyze chart.jpg
# Choose provider
pdftoolkit analyze chart.jpg -p ollama
pdftoolkit analyze chart.jpg -p together
pdftoolkit analyze chart.jpg -p colqwen
# With options
pdftoolkit analyze chart.jpg -q "What trends does this show?"
pdftoolkit analyze images/ --threshold 0.6 # Batch with confidence filter
# ColQwen returns relevance scores for queries
pdftoolkit analyze chart.jpg -p colqwen -q "chart showing growth"

pdftoolkit --help
pdftoolkit convert --help
pdftoolkit benchmark --help
pdftoolkit analyze --help

| Provider | Description | Requirements |
|---|---|---|
| docling | IBM's document toolkit, basic extraction | Default |
| marker | PDF extraction with image support | --describe needs OPENAI_API_KEY |
| mistral | Mistral OCR API | MISTRAL_API_KEY |
| markitdown | Microsoft's converter | OPENAI_API_KEY |
| megaparse | Advanced structure parsing | Separate install |

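
The requirements listed for the convert providers can be checked before a run. The following is a minimal sketch, assuming the environment variables are exactly those named above. `REQUIRED_ENV`, `missing_key`, and `convert_command` are hypothetical helpers, not part of pdftoolkit, and the sketch does not verify that optional packages such as megaparse are installed. It only builds the command line from flags shown in this README.

```python
import os

# Environment variables taken from the provider requirements in this README.
# marker only needs OPENAI_API_KEY when --describe is used.
REQUIRED_ENV = {
    "docling": None,
    "marker": None,
    "mistral": "MISTRAL_API_KEY",
    "markitdown": "OPENAI_API_KEY",
    "megaparse": None,  # needs a separate install rather than a key
}


def missing_key(provider, describe=False, env=None):
    """Return the name of the unset variable for this provider, or None."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown convert provider: {provider}")
    needed = REQUIRED_ENV[provider]
    if provider == "marker" and describe:
        needed = "OPENAI_API_KEY"
    if needed and not env.get(needed):
        return needed
    return None


def convert_command(pdf, provider="docling", describe=False, output_dir=None):
    """Build the argv for `pdftoolkit convert` from flags shown in this README."""
    if describe and provider != "marker":
        raise ValueError("--describe is documented for marker only")
    cmd = ["pdftoolkit", "convert", pdf, "-p", provider]
    if describe:
        cmd.append("--describe")
    if output_dir:
        cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    for name in REQUIRED_ENV:
        gap = missing_key(name)
        print(name, "ready" if gap is None else f"needs {gap}")
```

The list returned by `convert_command` can be passed to `subprocess.run` once `missing_key` returns `None` for the chosen provider.
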
pdftoolkit benchmark can run the integrated convert providers plus optional eval tools when installed.

| Tool | What it is | Commercial use |
|---|---|---|
| docling | IBM document parser | Yes |
| markitdown | Microsoft converter | Yes |
| mistral | Mistral OCR API | Yes |
| megaparse | Structural parser | Yes |
| marker | Layout-focused parser | Review license/weights |
| paddleocr | PP-Structure parser | Yes |
| olmocr | Technical-doc OCR | Yes |
| mineru | Strong open parser | No (AGPL) |
| got-ocr, qwen-vl, internvl, nanonets | VLM eval tools | Review model licenses |

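
The commercial-use status of each benchmark tool can drive an explicit `-t` selection. Below is a rough sketch that copies the status values from this README into a dictionary and builds a benchmark command from them. `COMMERCIAL_USE`, `commercial_tools`, and `benchmark_command` are hypothetical names, the sketch assumes each listed tool name is accepted by `-t`, and it does not check whether a tool is actually installed.

```python
# Commercial-use status copied from this README:
# "yes", "no", or "review" (license needs review).
COMMERCIAL_USE = {
    "docling": "yes",
    "markitdown": "yes",
    "mistral": "yes",
    "megaparse": "yes",
    "marker": "review",
    "paddleocr": "yes",
    "olmocr": "yes",
    "mineru": "no",
    "got-ocr": "review",
    "qwen-vl": "review",
    "internvl": "review",
    "nanonets": "review",
}


def commercial_tools(allowed=("yes",)):
    """Return tool names whose status is in `allowed`, in listing order."""
    return [name for name, status in COMMERCIAL_USE.items() if status in allowed]


def benchmark_command(pdf, tools=None):
    """Build the argv for `pdftoolkit benchmark`, adding one -t per tool."""
    cmd = ["pdftoolkit", "benchmark", pdf]
    for tool in tools or []:
        cmd += ["-t", tool]
    return cmd


if __name__ == "__main__":
    print(" ".join(benchmark_command("document.pdf", commercial_tools())))
```

Passing no tools leaves the selection to the benchmark's own default, as described above.
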
| Provider | Description | Requirements |
|---|---|---|
| ollama | Local Llama Vision | Ollama running locally |
| together | Together API with confidence scoring | TOGETHER_API_KEY |
| colqwen | Visual similarity/relevance scores | Local GPU recommended |

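
The `--threshold` option shown earlier for batch analysis acts as a confidence filter. The idea can be sketched as follows, assuming each result carries a confidence in the range 0 to 1 and that results at or above the threshold are kept. The `Analysis` record and `filter_by_confidence` are hypothetical, the real output format and comparison rule of pdftoolkit may differ, and 0.6 is the example value from this README rather than a known default.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Hypothetical per-image result; not pdftoolkit's actual output type."""
    image: str
    confidence: float  # assumed to lie in [0, 1]
    text: str


def filter_by_confidence(results, threshold=0.6):
    """Keep results whose confidence is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [r for r in results if r.confidence >= threshold]


if __name__ == "__main__":
    sample = [
        Analysis("chart1.jpg", 0.91, "bar chart"),
        Analysis("photo.jpg", 0.35, "not a chart"),
    ]
    for kept in filter_by_confidence(sample):
        print(kept.image, kept.confidence)
```
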
pdftoolkit/
├── cli.py # Typer CLI
├── providers/
│ ├── convert.py # PDF conversion providers
│ └── analyze.py # Image analysis providers
├── benchmark.py # Benchmark harness and tool registry
├── clients.py # API client singletons
└── utils.py # Shared utilities
src/ # Standalone scripts (reference implementations)
tests/ # Test suite
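
The layout above describes clients.py as holding API client singletons. One common way to get that behavior in Python is to cache a factory function, sketched below. This is an illustration of the pattern only, not the contents of clients.py: `DummyClient` stands in for a real SDK client, and `get_client` is a hypothetical name.

```python
import os
from functools import lru_cache


class DummyClient:
    """Stand-in for a real SDK client (hypothetical)."""

    def __init__(self, api_key):
        self.api_key = api_key


@lru_cache(maxsize=None)
def get_client(env_var):
    """Create the client for `env_var` once and reuse it on later calls."""
    key = os.environ.get(env_var)
    if not key:
        # Exceptions are not cached, so a later call can succeed once the key is set.
        raise RuntimeError(f"{env_var} is not set")
    return DummyClient(key)
```

Creating the client lazily means commands that never touch a given provider do not require its API key.
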
- ColQwen2 - Visual retrieval model
- Docling - IBM's document toolkit
- Marker - PDF extraction
- MegaParse - Advanced parsing
- MarkItDown - Microsoft's converter
- Mistral OCR - Mistral Document AI
- Ollama - Local LLM inference
- Together - Cloud LLM inference