Automatically Export Book

Asterios Raptis edited this page Mar 24, 2026 · 25 revisions

Full Export Book Script Documentation

📚 Overview

The export pipeline automates the export of a book into multiple formats (Markdown, PDF, EPUB, DOCX) using **Pandoc**.

✨ Features

  • 👉 Converts relative image paths to absolute paths before export (optional)
  • 👉 Handles both Markdown images (![alt](path)) and HTML <img> / <figure> tags
  • 👉 Exports book content into multiple formats using Pandoc
  • 👉 Converts absolute paths back to relative paths after export (optional)
  • 👉 Supports custom arguments for flexible execution
  • 👉 Optional EPUB cover image support via --cover parameter
  • 👉 Auto-detects language from metadata.yaml (or override with --lang)
  • 👉 Smart TOC handling: Auto-generates TOC for EPUB ebook, uses manual TOC for print editions
  • 👉 Poetry integration: Run via poetry run full-export

❓ Why Convert Paths to Absolute?

🔍 The Problem

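Pandoc resolves relative image paths against its working directory (or --resource-path), not against the Markdown file that contains them. Relative references can therefore break, especially when exporting to: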

  • PDF (via LaTeX)
  • EPUB (due to internal resource handling)
  • DOCX (for embedded images)

Example problematic image reference:

![Figure 1](../../assets/figures/diagram.png)

This may result in broken references or missing images.

The Solution

Before export, the script automatically converts all image paths to absolute paths:

![Figure 1](/absolute/path/to/assets/figures/diagram.png)

This ensures:

  • No missing images in PDF, EPUB, and DOCX
  • Platform-independent behavior (Windows, macOS, Linux)
  • Correct image embedding across formats

After export, the script restores relative paths to keep the Markdown clean.
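
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the function names `to_absolute` and `to_relative` are hypothetical, only Markdown image syntax is handled (no <img> or <figure> tags), and POSIX-style paths are assumed.

```python
import posixpath
import re

# Matches the opening of a Markdown image, ![alt]( , followed by its path.
IMAGE_RE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)")
# Anything that starts with a URL scheme (https:, data:, ...) is left alone.
URL_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def to_absolute(markdown, md_dir):
    """Rewrite relative image paths as absolute paths resolved against md_dir."""
    def repl(match):
        prefix, target = match.group(1), match.group(2)
        if URL_RE.match(target) or posixpath.isabs(target):
            return match.group(0)
        return prefix + posixpath.normpath(posixpath.join(md_dir, target))
    return IMAGE_RE.sub(repl, markdown)


def to_relative(markdown, md_dir):
    """Rewrite absolute image paths back to paths relative to md_dir."""
    def repl(match):
        prefix, target = match.group(1), match.group(2)
        if not posixpath.isabs(target):
            return match.group(0)
        return prefix + posixpath.relpath(target, md_dir)
    return IMAGE_RE.sub(repl, markdown)
```

Calling `to_absolute` before the Pandoc run and `to_relative` afterwards with the same `md_dir` leaves the Markdown source unchanged.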


🚀 Installation & Requirements

1️⃣ Install Pandoc

Ensure Pandoc is installed:
🔗 https://pandoc.org/installing.html

pandoc --version
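
If you want to run the same check from Python (for example in a pre-export hook), a small sketch could look like this. The helper name `require_tool` is hypothetical and not part of the export script.

```python
import shutil


def require_tool(name):
    """Return the full path of an executable, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"Command '{name}' not found on PATH")
    return path


# Example usage (raises RuntimeError if Pandoc is missing):
# pandoc_path = require_tool("pandoc")
```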

2️⃣ Install Python & Poetry

Ensure Python 3.13+ and Poetry are installed:

python3 --version
poetry --version

If Poetry is missing:

pip install poetry

3️⃣ Install Dependencies

Run:

poetry install

🛠 How to Use

1️⃣ Default Export (All Formats)

poetry run full-export

This will:

  • Convert images to absolute paths
  • Compile the book into Markdown, PDF, EPUB, and DOCX
  • Restore relative paths after export

2️⃣ Export Specific Formats

Specify formats using --format (comma-separated):

Available formats:

  • markdown (GitHub Flavored Markdown)
  • pdf
  • epub
  • docx

Example: Export only PDF and EPUB

poetry run full-export --format pdf,epub
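
As an illustration of how a comma-separated --format value can be parsed and validated, here is a minimal sketch. The function name `parse_formats` and the error message are assumptions. The format list mirrors the four formats listed above.

```python
SUPPORTED_FORMATS = ("markdown", "pdf", "epub", "docx")


def parse_formats(value=None):
    """Turn 'pdf,epub' into ['pdf', 'epub']; no value means all formats."""
    if not value:
        return list(SUPPORTED_FORMATS)
    formats = []
    for item in value.split(","):
        name = item.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "pdf,,epub"
        if name not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported format: {name}")
        if name not in formats:
            formats.append(name)  # keep order, drop duplicates
    return formats
```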

3️⃣ Add a Cover to EPUB

Use the --cover option to specify a cover image for the EPUB:

poetry run full-export --format=epub --cover=assets/covers/cover.jpg

📌 Notes:

  • Only applies to EPUB export

  • If used without --format=epub, it will be ignored

  • Supported formats: .jpg, .jpeg, .png
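
The notes above amount to two checks, sketched here in Python. The helper name `effective_cover` is hypothetical. How the real script reports an unsupported extension is not documented on this page, so raising `ValueError` is an assumption.

```python
from pathlib import PurePosixPath

COVER_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def effective_cover(cover, formats):
    """Return the cover path to use, or None when the option is ignored."""
    if cover is None or "epub" not in formats:
        return None  # --cover only applies to EPUB export
    if PurePosixPath(cover).suffix.lower() not in COVER_EXTENSIONS:
        raise ValueError(f"Unsupported cover format: {cover}")
    return cover
```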


4️⃣ Skip Image Processing

If images are already correctly linked, you can skip all image conversion steps:

poetry run full-export --skip-images

🚀 Skips both path rewriting and <img> tag transformations.


5️⃣ Keep Relative Paths

If you are using <figure> tags (or otherwise want to preserve relative paths), use:

poetry run full-export --keep-relative-paths

✅ This will:

  • Skip Step 1 (Convert to absolute paths)
  • Skip Step 4 (Restore relative paths)
  • Leave all image/URL references as-is

📌 Useful when your publishing environment already handles relative paths correctly.


6️⃣ Force EPUB 2 Format (Epubli Compatibility)

Some platforms like Epubli and the Tolino network still require EPUB 2 instead of the newer EPUB 3 standard.

To ensure compatibility, use the --epub2 flag:

poetry run full-export --format=epub --cover=assets/covers/cover.jpg --epub2

✅ This will:

  • Instruct Pandoc to export the EPUB in EPUB 2.0 format

  • Avoid common EPUB validation errors like:

    • RSC-005 Invalid metadata

    • OPF-092 Language tag issues (e.g. "Deutsch (de-DE)" instead of "de-DE")

📌 Notes:

  • Only applies to EPUB output

  • Has no effect on PDF, DOCX, or Markdown

  • You can combine it with --cover and --order

Use this option only if your distribution platform explicitly requires EPUB 2.
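
For orientation, the following sketch shows how --epub2 and --cover could map onto a Pandoc command line. Pandoc's `epub2` and `epub3` writers and its `--epub-cover-image` option are real. The function `build_epub_command` and the way the export script actually assembles its command are assumptions.

```python
def build_epub_command(sources, output, epub2=False, cover=None):
    """Build a Pandoc argument list for an EPUB export (illustrative only)."""
    writer = "epub2" if epub2 else "epub3"
    cmd = ["pandoc", *sources, "-o", output, "-t", writer]
    if cover:
        cmd.append(f"--epub-cover-image={cover}")
    return cmd


# Example: the list can be passed to subprocess.run(cmd, check=True).
cmd = build_epub_command(
    ["manuscript/chapters/01-introduction.md"],
    "output/book.epub",
    epub2=True,
    cover="assets/covers/cover.jpg",
)
```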

📖 Need more details about EPUB 2?

Check the full guide here:
👉 Export to EPUB 2 – Compatibility Guide

This page explains:

  • Why some platforms still require EPUB 2

  • How to validate your EPUB file

  • Common pitfalls and how to avoid them

  • Tips for using --epub2 effectively with Pandoc


7️⃣ Specify Language Metadata (Optional)

The script auto-detects the language from config/metadata.yaml.
However, you can override it:

poetry run full-export --lang de

🧠 Behavior:

  • If --lang is not provided, the script uses lang: from metadata.yaml

  • If both exist and mismatch, a warning is shown

  • If neither is set, defaults to 'en'

Example in config/metadata.yaml:

title: "My Book"
author: "Author Name"
lang: "en"
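
The language resolution rules above can be sketched as follows. The function name `resolve_lang` and the exact warning text are assumptions. Only the precedence (CLI, then metadata.yaml, then 'en') comes from this page.

```python
def resolve_lang(cli_lang=None, metadata_lang=None, default="en"):
    """Return (language, warning); warning is None when there is no mismatch."""
    warning = None
    if cli_lang and metadata_lang and cli_lang != metadata_lang:
        warning = (
            f"Metadata file says: '{metadata_lang}' "
            f"but CLI argument is: '{cli_lang}'"
        )
    # The CLI value wins, then metadata.yaml, then the built-in default.
    return (cli_lang or metadata_lang or default), warning
```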

8️⃣ Table of Contents (TOC) Options

The script provides smart TOC handling based on the book type and format.

Default Behavior

| Format | Book Type | TOC Behavior |
| --- | --- | --- |
| EPUB | ebook | ✅ Auto-generated by Pandoc (valid epubcheck links) |
| EPUB | paperback/hardcover | ✅ Uses your toc_print_edition.md |
| PDF | all | ✅ Uses your existing TOC files |
| DOCX/HTML/Markdown | all | ✅ Uses your existing TOC files |

Why Auto-Generate TOC for EPUB?

Manual TOC files with links like [Chapter 1](#chapter-1) cause epubcheck validation errors because EPUB requires cross-file references like ch001.xhtml#chapter-1. Pandoc's auto-generated TOC creates these correct links automatically.

Use Your Own TOC for EPUB (--use-manual-toc)

If you have a custom toc.md that you want to use instead of the auto-generated TOC:

poetry run full-export --format=epub --use-manual-toc

⚠️ Warning: This may cause epubcheck validation warnings if your TOC links don't include file references.

Configure TOC Depth (--toc-depth)

Control how many heading levels appear in the auto-generated TOC:

# Default: 2 levels (# and ##)
poetry run full-export --format=epub

# 3 levels for technical books (# ## ###)
poetry run full-export --format=epub --toc-depth=3

# Only top-level headings
poetry run full-export --format=epub --toc-depth=1

📌 Recommendations:

  • Depth 2: Best for most books, keeps TOC clean and navigable
  • Depth 3: Good for technical/academic books with many subsections
  • Depth 1: Too shallow for most use cases
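
To preview which headings a given depth would include, a rough sketch like the one below can help. It only scans ATX headings (lines starting with #) and is an approximation of what Pandoc does, not a reimplementation. The name `toc_entries` is hypothetical.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # code-fence marker; "#" lines inside fences are not headings


def toc_entries(markdown, depth=2):
    """Return (level, title) pairs for headings up to the given depth."""
    entries = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match and len(match.group(1)) <= depth:
            entries.append((len(match.group(1)), match.group(2)))
    return entries
```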

Using TOC Options with Poetry Shortcuts

poetry run export-epub-safe --use-manual-toc
poetry run export-epub-safe --toc-depth=3
poetry run export-epub-safe --use-manual-toc --lang=de

Using TOC Options with Make

Make doesn't pass arguments directly. Use the ARGS variable:

make ebook ARGS="--use-manual-toc"
make ebook ARGS="--toc-depth=3"
make ebook ARGS="--use-manual-toc --lang=de"

9️⃣ Book Type Selection (--book-type)

Specify the book type for different output configurations:

# E-book (default) - optimized for digital reading
poetry run full-export --format=epub --book-type=ebook

# Paperback - uses print-specific TOC and formatting
poetry run full-export --format=pdf --book-type=paperback

# Hardcover - same as paperback
poetry run full-export --format=pdf --book-type=hardcover

Section Order by Book Type:

| Book Type | TOC File Used |
| --- | --- |
| ebook | front-matter/toc.md (or auto-generated for EPUB) |
| paperback | front-matter/toc_print_edition.md |
| hardcover | front-matter/toc_print_edition.md |
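
Combined with the TOC defaults and --use-manual-toc, the selection logic can be summarized in a short sketch. The function name `select_toc` is hypothetical, and returning `None` to mean "let Pandoc auto-generate the TOC" is an assumption made for illustration.

```python
def select_toc(book_type="ebook", fmt="epub", use_manual_toc=False):
    """Return the TOC file to include, or None for an auto-generated TOC."""
    if book_type in ("paperback", "hardcover"):
        return "front-matter/toc_print_edition.md"
    if book_type == "ebook":
        if fmt == "epub" and not use_manual_toc:
            return None  # Pandoc generates the TOC for EPUB ebooks
        return "front-matter/toc.md"
    raise ValueError(f"Unknown book type: {book_type}")
```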


📃 Logs

All logs are saved in export.log.

To monitor live:

tail -f export.log

If errors occur, check export.log for debugging.


📂 Project Structure

book-project/
│── manuscript/
│   ├── chapters/
│   │   ├── 01-introduction.md
│   │   ├── 02-chapter.md
│   │   ├── ...
│   ├── front-matter/
│   │   ├── toc.md                    # TOC for ebook
│   │   ├── toc_print_edition.md      # TOC for print editions (optional)
│   │   ├── preface.md
│   │   ├── foreword.md
│   │   ├── acknowledgments.md
│   ├── back-matter/
│   │   ├── about-the-author.md
│   │   ├── appendix.md
│   │   ├── bibliography.md
│   │   ├── faq.md
│   │   ├── glossary.md
│   │   ├── index.md
│   ├── figures/
│   │   ├── fig1.png
│   │   ├── fig2.svg
│   │   ├── ...
│   ├── tables/
│   │   ├── table1.csv
│   │   ├── table2.csv
│   │   ├── ...
│   ├── references.bib  # If using citations (e.g., BibTeX, APA, MLA formats supported)
│── assets/ # Images, media, illustrations (for book content, cover design, and figures)
│   ├── covers/
│   │   ├── cover-design.png
│   ├── figures/
│   │   ├── diagrams/
│   │   ├── infographics/
│── config/ # Project configuration (metadata, styling, and settings)
│   ├── metadata.yaml          # Title, author, ISBN, etc.
│   ├── export-settings.yaml   # Section order per book type, format settings
│   ├── voice-settings.yaml    # TTS voice and language for audiobook
│   ├── init-settings.yaml     # Customizable project structure for init-bp
│   ├── styles.css             # Custom styles for PDF/eBook
│── output/             # Compiled book formats
│   ├── book.pdf
│   ├── book.epub
│   ├── book.docx
│── pyproject.toml    # Dependencies: manuscripta, manuscript-tools
│── Makefile
│── LICENSE
│── README.md

⚙️ Image Handling Options

The script now supports three different ways of handling images. Use the one that fits your workflow:

| Mode | Steps Executed | Paths After Export | When to Use |
| --- | --- | --- | --- |
| Default (no flags) | ✅ Step 1 (convert to absolute)<br>✅ Step 4 (restore to relative)<br>✅ Tag conversion | Restored to relative | Best for Pandoc (ensures images work in PDF/EPUB/DOCX while keeping Markdown clean) |
| --skip-images | ❌ Step 1<br>❌ Step 4<br>❌ Tag conversion | Whatever is in your Markdown | Fastest option, skips all image handling (use if your Markdown is already clean) |
| --keep-relative-paths | ❌ Step 1<br>❌ Step 4<br>✅ Tag handling (if relevant) | Preserves relative paths | Best when using <figure> or when your toolchain already supports relative paths |

Note: --skip-images and --keep-relative-paths both skip Steps 1 and 4, so the image paths in your Markdown end up unchanged either way. They differ only in tag handling, which --skip-images also skips. The two flags are mutually exclusive to avoid confusion.

flowchart TD
    A([Start]) --> B{Flag?}
    B -->|--skip-images| C[Skip Step 1\nSkip Step 4\nSkip tag conversion]
    B -->|--keep-relative-paths| D[Skip Step 1\nRun tag handling\nSkip Step 4]
    B -->|No flags| E[Run Step 1\nRun tag handling to-absolute]

    C --> F[Step 2: Prepare output folder\nEnsure metadata]
    D --> F
    E --> F

    F --> G[Step 3: Compile\nmarkdown, pdf, epub, docx]
    G --> H{After compile}

    H -->|--skip-images| I[Step 4: Skipped\npaths unchanged]
    H -->|--keep-relative-paths| J[Step 4: Skipped\npaths remain relative]
    H -->|Default| K[Step 4: Restore paths to relative\nTag handling to-relative]

    I --> L[Step 5: Validation\nepub, pdf, docx, md]
    J --> L
    K --> L

Legend

  • Run (green): step is executed

  • Skipped (dark gray): step is not executed

  • Branching is decided in this order:

    1. --skip-images → skip all image-related work

    2. --keep-relative-paths → only skip path rewrites (keep relative paths)

    3. No flags → default behavior (absolute → compile → restore to relative)
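
The branching order above can be expressed as a small decision function. This is a sketch: the name `plan_image_steps` and the dictionary keys are hypothetical, and raising an error for the mutually exclusive combination is an assumption about how the conflict is reported.

```python
def plan_image_steps(skip_images=False, keep_relative_paths=False):
    """Decide which image-related steps run for a given flag combination."""
    if skip_images and keep_relative_paths:
        raise ValueError("--skip-images and --keep-relative-paths are mutually exclusive")
    if skip_images:
        # Skip all image-related work.
        return {"to_absolute": False, "tag_handling": False, "restore_relative": False}
    if keep_relative_paths:
        # Skip only the path rewrites; tags are still handled.
        return {"to_absolute": False, "tag_handling": True, "restore_relative": False}
    # Default: absolute -> compile -> restore to relative.
    return {"to_absolute": True, "tag_handling": True, "restore_relative": True}
```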


🔑 Quick Reference

  • Use Default if you want maximum compatibility with Pandoc and platforms like PDF/EPUB/DOCX.
  • Use --skip-images if you want speed and have no broken links.
  • Use --keep-relative-paths if you rely on <figure> tags or know that relative paths will be resolved correctly downstream.

CLI Options Reference

All options can also be set in config/export-settings.yaml under export_defaults. CLI flags always override config values.

| Option | Description | Default |
| --- | --- | --- |
| --format | Comma-separated formats (pdf,epub,docx,markdown,html) | All formats |
| --order | Custom section order (comma-separated) | Auto by book-type |
| --cover | Path to cover image for EPUB | None |
| --epub2 | Force EPUB 2 format | EPUB 3 |
| --lang | Language code (en, de, fr, etc.) | From metadata.yaml |
| --extension | Custom file extension for markdown | md |
| --book-type | Book type: ebook, paperback, hardcover | ebook |
| --output-file | Custom output filename | From pyproject.toml |
| --no-type-suffix | Don't append book type to filename | Append suffix |
| --toc-depth | Depth of auto-generated TOC (1-3) | 2 |
| --use-manual-toc | Use existing toc.md for EPUB ebook | Auto-generate |
| --skip-images | Skip all image processing | Process images |
| --keep-relative-paths | Keep relative paths unchanged | Convert to absolute |
| --copy-epub-to | Copy EPUB to directory after export | None |

Export Defaults (YAML Configuration)

Instead of passing CLI flags every time, set defaults in config/export-settings.yaml:

export_defaults:
  output_file: my-book
  lang: de
  book_type: ebook
  # format: pdf,epub
  # cover: assets/covers/cover.jpg
  # epub2: false
  # extension: md
  # toc_depth: 2
  # no_type_suffix: false
  # use_manual_toc: false
  # skip_images: false
  # keep_relative_paths: false
  # copy_epub_to: ~/Downloads

| YAML Key | CLI Flag | Description |
| --- | --- | --- |
| output_file | --output-file | Base name for output files |
| lang | --lang | Language code |
| book_type | --book-type | ebook, paperback, hardcover |
| format | --format | Comma-separated formats |
| cover | --cover | Cover image path |
| epub2 | --epub2 | Force EPUB 2 |
| extension | --extension | Custom markdown extension |
| toc_depth | --toc-depth | TOC depth (1-3) |
| no_type_suffix | --no-type-suffix | Skip type suffix in filename |
| use_manual_toc | --use-manual-toc | Use manual TOC for EPUB |
| skip_images | --skip-images | Skip image processing |
| keep_relative_paths | --keep-relative-paths | Keep relative paths |
| copy_epub_to | --copy-epub-to | Copy EPUB after export |

Priority: CLI > export_defaults > metadata.yaml/pyproject.toml > built-in defaults.
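
A layered lookup like this can be sketched as "first layer that defines the value wins". The function `resolve_option` and the plain dictionaries standing in for each layer are assumptions. The real script may load and merge its configuration differently.

```python
def resolve_option(name, cli, export_defaults, project, builtin):
    """Return the first non-None value for name, in priority order."""
    for layer in (cli, export_defaults, project, builtin):
        value = layer.get(name)
        if value is not None:
            return value
    return None


# Example layers: CLI flags, export_defaults, metadata.yaml/pyproject.toml, built-ins.
cli = {"lang": "en"}
export_defaults = {"lang": "de", "book_type": "paperback"}
project = {}
builtin = {"lang": "en", "book_type": "ebook", "toc_depth": 2}
```

With these example layers, `lang` resolves from the CLI, `book_type` from export_defaults, and `toc_depth` from the built-in defaults.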


⚠️ Troubleshooting

1️⃣ Pandoc Not Found

If you see:

Command 'pandoc' not found

Install Pandoc:

sudo apt install pandoc  # Ubuntu/Debian
brew install pandoc  # macOS
choco install pandoc  # Windows

2️⃣ Cover Not Showing in EPUB

  • Ensure you pass --cover=...

  • Use .jpg or .png

  • Use an EPUB reader like Calibre or Thorium to verify

3️⃣ Images Missing

  • Use the default absolute-path conversion (do not pass --skip-images)

  • Ensure referenced files exist in assets/

🔧 If you're using --keep-relative-paths, make sure your target platform supports relative image references. Pandoc + LaTeX for PDF, for example, may still require absolute paths.

4️⃣ Pandoc Metadata Warning

If you see:

[WARNING] This document format requires a nonempty <title> element.

Ensure config/metadata.yaml exists and defines a title.
If the file is missing, the script will automatically generate a default one.


5️⃣ Language Mismatch Warning

You'll see:

⚠️⚠️⚠️ LANGUAGE MISMATCH DETECTED ⚠️⚠️⚠️
Metadata file says: 'de' but CLI argument is: 'en'
Using CLI argument value.

This is just a warning. It still works, but you may want to keep it consistent.


6️⃣ EPUB Validation Errors with Manual TOC

If you use --use-manual-toc and see epubcheck errors like:

ERROR(RSC-012): Fragment identifier not found

This means your TOC links don't match the EPUB's internal file structure. Solutions:

  1. Recommended: Remove --use-manual-toc and let Pandoc auto-generate the TOC
  2. Update your toc.md links to use the correct format: [Chapter](ch001.xhtml#heading)
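
If you choose the second option, the link rewrite can be automated. The sketch below assumes you already know which XHTML file each anchor ends up in. The mapping (for example chapter-1 to ch001.xhtml) is hypothetical and depends on how Pandoc splits your book. The function name `add_file_references` is also an assumption.

```python
import re

# Matches the target of a fragment-only Markdown link: ](#anchor)
FRAGMENT_LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")


def add_file_references(toc_markdown, anchor_to_file):
    """Prefix fragment-only links with the file that contains the anchor."""
    def repl(match):
        anchor = match.group(1)
        file_name = anchor_to_file.get(anchor)
        if file_name is None:
            return match.group(0)  # unknown anchor: leave the link untouched
        return f"]({file_name}#{anchor})"
    return FRAGMENT_LINK_RE.sub(repl, toc_markdown)
```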

7️⃣ Options Not Working with Poetry Shortcuts

If options like --use-manual-toc aren't being passed through:

# Check if you see "🔧 Forwarding valid options"
poetry run export-epub-safe --use-manual-toc

If you don't see "🔧 Forwarding valid options", ensure:

  1. The option is in FULL_EXPORT_ALLOWED_OPTS in shortcuts_export.py
  2. The shortcut function reads sys.argv[1:] when extra is empty
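
The two conditions above describe an allow-list filter over the command-line arguments. The following sketch shows the general idea only: the set `ALLOWED_OPTS` is a hypothetical stand-in for FULL_EXPORT_ALLOWED_OPTS (whose real contents and type are not shown on this page), and only the --flag and --flag=value forms are handled.

```python
import sys

# Hypothetical stand-in for FULL_EXPORT_ALLOWED_OPTS in shortcuts_export.py.
ALLOWED_OPTS = {"--use-manual-toc", "--toc-depth", "--lang"}


def forwardable_args(extra=None, allowed=ALLOWED_OPTS):
    """Keep only allowed options; fall back to sys.argv[1:] when extra is empty."""
    args = list(extra) if extra else sys.argv[1:]
    kept = []
    for arg in args:
        flag = arg.split("=", 1)[0]  # "--toc-depth=3" -> "--toc-depth"
        if flag in allowed:
            kept.append(arg)
    return kept
```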

8️⃣ Options Not Working with Make

Make doesn't pass arguments directly. Use the ARGS variable:

# ❌ Won't work
make ebook --use-manual-toc

# ✅ Correct way
make ebook ARGS="--use-manual-toc"
make ebook ARGS="--toc-depth=3 --lang=de"

Ensure your Makefile target uses $(ARGS):

ebook: ## Export E-Book (EPUB)
	@$(POETRY) run export-epub-safe $(ARGS)

🎨 Add a Cover Image to EPUB Output

You can now pass a custom cover image for EPUB output using the --cover argument:

poetry run full-export --format=epub --cover=assets/covers/cover-image.jpg

✅ Requirements:

  • Accepted formats: .jpg, .jpeg, .png

  • Path should be relative to the project root (or absolute)

If you omit the --cover flag, the EPUB will be generated without an embedded cover image.


⚡ Quick Export Shortcuts

We've moved the shortcut documentation to its own dedicated page:

👉 View the Shortcut Reference →


💠 Emoji Replacement for KDP Compliance

To avoid issues with unsupported characters in EPUB/PDF uploads (especially for Kindle), use the emoji cleanup tool:

poetry run replace-emojis

This script:

  • Replaces emojis in Markdown files with safe printable symbols

  • Processes files inside front-matter, chapters, and back-matter

  • Uses the emoji map from manuscripta as its replacement reference

🧩 How to Extend emoji_map.py

You can easily add new emoji mappings in the emoji map from manuscripta.

Example addition:

EMOJI_MAP = {
    # ... existing mappings ...
    "📱": "⌁",  # Smartphone icon → symbol
    "💡": "⚡",  # Lightbulb icon → lightning bolt
}

Just rerun poetry run replace-emojis and the new replacements will apply.
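
Conceptually, the replacement is a plain string substitution driven by the map. The sketch below illustrates this with the two example entries. The function `replace_emojis` shown here is a simplified stand-in, not the code shipped in manuscripta.

```python
# Two example entries only; the real map lives in manuscripta.
EMOJI_MAP = {
    "📱": "⌁",  # Smartphone icon → symbol
    "💡": "⚡",  # Lightbulb icon → lightning bolt
}


def replace_emojis(text, emoji_map=EMOJI_MAP):
    """Replace every mapped emoji in text with its printable substitute."""
    for emoji, replacement in emoji_map.items():
        text = text.replace(emoji, replacement)
    return text
```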

🎉 Final Notes

This script helps you create a clean, professional, multi-format export of your book with:

  • 📦 automatic asset handling

  • 🌍 multi-language metadata support

  • 💻 full CLI integration with Poetry

  • ✅ EPUB 2 compatibility for commercial distribution

  • 📑 Smart TOC handling for ebook and print editions

For emoji compatibility, cover images, and other enhancements, check the Wiki.


🚀 Now ready for use in any book project! 🚀

