reMarkable PDF OCR Flow - Mistral Handwriting Extraction

Extract handwritten text from PDF files using Mistral's Pixtral vision model in n8n workflows.

Overview

This project provides ready-to-use code for extracting handwritten text from PDFs using Mistral AI's Pixtral model within n8n workflows. Perfect for processing handwritten notes from reMarkable tablets or any other PDF with handwriting.

Features

✍️ Handwriting OCR: Extracts handwritten text from PDF pages using Mistral Pixtral 12B
📄 Multi-page Support: Processes all pages in a PDF document
📝 Dual Output: Returns both JSON (structured data) and Markdown (formatted text)
🔄 n8n Integration: Designed specifically for n8n workflows
📧 Email Triggered: Works with incoming email attachments
🚀 Multiple Implementations: Choose between Python or JavaScript

Use Case

Workflow:

Email arrives with PDF attachment containing handwritten notes
n8n workflow is triggered
PDF is processed through Mistral Pixtral for OCR
Extracted text is formatted as JSON and Markdown
Result is forwarded to another service (e.g., note-taking app, CRM, etc.)

Prerequisites

1. Mistral API Key

Get your API key from Mistral AI Console

2. n8n Installation

Self-hosted n8n (recommended) or n8n Cloud
Version: 1.0.0 or higher

3. Dependencies

For Python Version:

pip install pdf2image requests Pillow

You may also need poppler-utils:

# Ubuntu/Debian
sudo apt-get install poppler-utils

# macOS
brew install poppler

# Docker
# Add to your n8n Dockerfile:
RUN apt-get update && apt-get install -y poppler-utils

For JavaScript Version:

npm install pdf-lib
# or in n8n, these are typically pre-installed

Installation

Option 1: Python Code Node (Recommended)

Copy the Python code: Use n8n_mistral_ocr.py
Set environment variable in n8n:
- Go to Settings > Environments
- Add: MISTRAL_API_KEY=your_api_key_here
Install dependencies in your n8n environment
Create workflow (see below)

Option 2: JavaScript Code Node

Copy the JavaScript code: Use n8n_simple_mistral_ocr.js
Set API key in the code or environment variable
Create workflow (see below)

n8n Workflow Setup

Basic Workflow Structure

[Email Trigger] → [Extract Attachment] → [Python/JS Code] → [Process Output] → [Send to Service]

Detailed Setup

1. Email Trigger Node

Node Type: Email Trigger (IMAP)
Configuration:
- Set up your email account
- Filter for emails with attachments
- Enable "Download Attachments"

2. Code Node (Python or JavaScript)

Node Type: Code
Language: Python or JavaScript
Code: Paste the content from n8n_mistral_ocr.py or n8n_simple_mistral_ocr.js

Environment Variable Configuration:

MISTRAL_API_KEY=your_mistral_api_key_here

3. Output Processing

The code returns:

JSON Output:

{
  "success": true,
  "totalPages": 2,
  "extractedText": "Full extracted text from all pages...",
  "markdown": "# Extracted Handwritten Text\n\n...",
  "pages": [
    {
      "page": 1,
      "text": "Text from page 1...",
      "dimensions": {
        "width": 1654,
        "height": 2339
      }
    }
  ],
  "timestamp": "2025-11-23T12:00:00.000Z",
  "model": "pixtral-12b-2409"
}

Markdown Output:

# Extracted Handwritten Text

## Page 1

[Handwritten text from page 1]

---

## Page 2

[Handwritten text from page 2]

---

*Extracted on: 2025-11-23T12:00:00.000Z*
*Model: Mistral Pixtral 12B*

4. Forward to Service

Use HTTP Request, Webhook, or service-specific nodes to send the extracted text to your destination.

Example destinations:

Notion (create a new page)
Airtable (add record)
Google Docs (append to document)
Slack (send message)
Email (send formatted email)

Usage Examples

Example 1: Email → PDF OCR → Notion

[Email Trigger]
  ↓
[Code: Mistral OCR]
  ↓
[Notion: Create Page]
  - Title: Email subject
  - Content: {{ $json.markdown }}

Example 2: Email → PDF OCR → Slack

[Email Trigger]
  ↓
[Code: Mistral OCR]
  ↓
[Slack: Send Message]
  - Channel: #notes
  - Message:
    New handwritten note:
    {{ $json.extractedText }}

Example 3: Email → PDF OCR → Webhook

[Email Trigger]
  ↓
[Code: Mistral OCR]
  ↓
[HTTP Request]
  - Method: POST
  - URL: https://your-service.com/api/notes
  - Body: {{ $json }}

Configuration Options

Adjusting OCR Quality

In the code, you can modify:

Python:

# Line 34: Adjust DPI for quality vs speed
images = convert_from_bytes(pdf_bytes, dpi=300)  # Higher = better quality, slower

JavaScript:

// Adjust the prompt for specific extraction needs
text: 'Extract all handwritten text, including margin notes and annotations...'

Handling Multi-page PDFs

Both implementations automatically process all pages. The results include:

Per-page extracted text
Combined text from all pages
Page numbers and dimensions

API Costs

Mistral Pixtral 12B Pricing (as of 2025):

Input: ~$0.15 per 1M tokens
Output: ~$0.45 per 1M tokens

Estimated cost per page:

~$0.001 - $0.005 per page (depending on image size and text length)

Troubleshooting

Error: "MISTRAL_API_KEY environment variable not set"

Solution: Set the environment variable in n8n settings or directly in the code.

Error: "No PDF binary data found"

Solution: Check that your email trigger is configured to download attachments. Verify the binary data path in the code matches your workflow.

Error: "Mistral API error: 401"

Solution: Invalid API key. Verify your Mistral API key is correct and active.

Error: "pdf2image dependencies not found"

Solution: Install poppler-utils on your system (see Prerequisites).

Poor OCR Quality

Solutions:

Increase DPI in convert_from_bytes() (Python) from 300 to 400-600
Ensure source PDF has good resolution
Try adjusting the Mistral prompt to be more specific about what to extract

Timeout Errors

Solutions:

Process large PDFs in batches
Increase n8n workflow timeout settings
Consider splitting multi-page PDFs

Advanced Usage

Custom Prompts

Modify the extraction prompt for specific use cases:

# For forms
"Extract all handwritten text from this form, organized by field labels."

# For mathematical notation
"Extract all handwritten mathematical expressions and equations, preserving notation."

# For multi-language
"Extract all handwritten text. This may contain multiple languages including English and German."

Batch Processing

To process multiple PDFs in one workflow, use n8n's loop or batch processing nodes before the code node.

Output Formatting

Customize the markdown format in the format_as_markdown() function:

def format_as_markdown(results):
    markdown = "# Meeting Notes\n\n"
    markdown += f"**Date**: {datetime.now().strftime('%Y-%m-%d')}\n\n"
    # ... add your custom formatting
    return markdown

Files in This Repository

n8n_mistral_ocr.py - Python implementation (recommended)
n8n_simple_mistral_ocr.js - Simplified JavaScript implementation
n8n_mistral_pdf_ocr.js - Full-featured JavaScript implementation
README.md - This file
example_workflow.json - Example n8n workflow (coming soon)
package.json - Node.js dependencies

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Submit a pull request

License

MIT License - feel free to use in your projects!

Support

For issues and questions:

Open an issue on GitHub
Check n8n community forums
Review Mistral AI documentation

Credits

Mistral AI: For the Pixtral vision model
n8n: For the workflow automation platform
Inspired by various OCR and document processing workflows

Changelog

v1.0.0 (2025-11-23)

Initial release
Python and JavaScript implementations
Support for multi-page PDFs
JSON and Markdown output formats
n8n workflow integration

Happy OCR'ing! ✍️→📄→✨

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
EXAMPLE_OUTPUT.md		EXAMPLE_OUTPUT.md
README.md		README.md
example_workflow.json		example_workflow.json
n8n_mistral_ocr.py		n8n_mistral_ocr.py
n8n_mistral_pdf_ocr.js		n8n_mistral_pdf_ocr.js
n8n_simple_mistral_ocr.js		n8n_simple_mistral_ocr.js
package.json		package.json
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

reMarkable PDF OCR Flow - Mistral Handwriting Extraction

Overview

Features

Use Case

Prerequisites

1. Mistral API Key

2. n8n Installation

3. Dependencies

For Python Version:

For JavaScript Version:

Installation

Option 1: Python Code Node (Recommended)

Option 2: JavaScript Code Node

n8n Workflow Setup

Basic Workflow Structure

Detailed Setup

1. Email Trigger Node

2. Code Node (Python or JavaScript)

3. Output Processing

4. Forward to Service

Usage Examples

Example 1: Email → PDF OCR → Notion

Example 2: Email → PDF OCR → Slack

Example 3: Email → PDF OCR → Webhook

Configuration Options

Adjusting OCR Quality

Handling Multi-page PDFs

API Costs

Troubleshooting

Error: "MISTRAL_API_KEY environment variable not set"

Error: "No PDF binary data found"

Error: "Mistral API error: 401"

Error: "pdf2image dependencies not found"

Poor OCR Quality

Timeout Errors

Advanced Usage

Custom Prompts

Batch Processing

Output Formatting

Files in This Repository

Contributing

License

Support

Credits

Changelog

v1.0.0 (2025-11-23)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages