
Generate Audiobook

Asterios Raptis edited this page Mar 23, 2026 · 13 revisions

Generate Audiobook from Markdown or EPUB

Turn your book chapters into a high-quality audiobook with a single command. This tool converts .md chapter files or .epub books into .mp3 audio using modular Text-to-Speech (TTS) engines.

Overview

This script processes your existing manuscript (Markdown chapters or EPUB files) and turns them into spoken-word audio. It supports multiple engines and languages, cleans up Markdown/HTML formatting for natural narration, respects your book's section order, and outputs clean MP3 files per chapter.

Features

  • Converts Markdown chapters and EPUB files to .mp3
  • Cleans Markdown and HTML (removes images, code, and <figure> blocks; reduces links and formatting to plain text)
  • Supports multiple TTS engines: edge (recommended), google, pyttsx3, elevenlabs
  • Configurable section order (defined in config/export-settings.yaml)
  • CLI-ready via Poetry (generate-audiobook)
  • Language, voice, and speech rate configurable
  • Optional voice-settings.yaml for reusable configuration
  • Output per chapter, numbered in book order, ready for publishing
  • Preview EPUB chapters before export (--list-chapters)
  • Skip unwanted chapters by keyword (--skip toc,cover,imprint)
  • Merge all chapters into a single audiobook file (--merge, requires ffmpeg)
  • Skip existing files by default, regenerate with --overwrite
  • Auto-chunking for long chapters to avoid TTS timeouts
  • Retry logic with backoff on transient TTS failures
  • Verbose progress output with timing and file sizes
  • File listing preview at startup with [exists] markers
  • Dependency checks at startup with clear install instructions

Project Structure

Supported Input Formats

| Format | Description |
|---|---|
| Markdown directory | A folder containing front-matter/, chapters/, and back-matter/ with .md files |
| EPUB file | A single .epub file (chapters are extracted automatically in reading order) |

Installation

# Core dependency (lightweight, no heavy ML libs)
poetry add edge-tts

# For EPUB support
poetry add ebooklib beautifulsoup4

# For merging chapters into a single audiobook (system package; Debian/Ubuntu shown)
sudo apt install ffmpeg

# Optional: other TTS engines
poetry add gTTS           # Google TTS
poetry add pyttsx3        # Offline TTS
poetry add elevenlabs     # ElevenLabs API

The script checks for required dependencies at startup and shows clear error messages if something is missing:

Error: Required package 'edge-tts' is not installed.
The 'edge' TTS engine needs it to work.

Install it with:
  poetry add edge-tts

For EPUB input, it also checks for ebooklib and beautifulsoup4 before processing:

Error: EPUB support requires 'ebooklib' and 'beautifulsoup4'.

Install them with:
  poetry add ebooklib beautifulsoup4
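A startup check like this can be done with importlib.util.find_spec, which reports whether a module is importable without importing it. The sketch below is an assumption, not the script's actual code: ENGINE_DEPENDENCIES, missing_packages, and dependency_error are hypothetical names, and the message text is simplified. Only the package-to-module mapping (edge-tts → edge_tts, gTTS → gtts, beautifulsoup4 → bs4) reflects the real packages.

```python
import importlib.util

# Hypothetical mapping: engine -> list of (import name, Poetry package name).
ENGINE_DEPENDENCIES = {
    "edge": [("edge_tts", "edge-tts")],
    "google": [("gtts", "gTTS")],
    "pyttsx3": [("pyttsx3", "pyttsx3")],
    "elevenlabs": [("elevenlabs", "elevenlabs")],
}
EPUB_DEPENDENCIES = [("ebooklib", "ebooklib"), ("bs4", "beautifulsoup4")]


def missing_packages(requirements):
    """Return the Poetry package names whose modules cannot be found."""
    return [pkg for module, pkg in requirements
            if importlib.util.find_spec(module) is None]


def dependency_error(engine, epub_input=False):
    """Build an install hint, or return None if everything is available."""
    requirements = list(ENGINE_DEPENDENCIES.get(engine, []))
    if epub_input:
        requirements += EPUB_DEPENDENCIES
    missing = missing_packages(requirements)
    if not missing:
        return None
    return ("Error: Required package(s) not installed: " + ", ".join(missing)
            + "\n\nInstall with:\n  poetry add " + " ".join(missing))


if __name__ == "__main__":
    message = dependency_error("edge", epub_input=True)
    print(message or "All dependencies are installed.")
```

Checking before any chapter is processed means a missing package fails once, up front, instead of producing one "FAILED" line per chapter.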

Quick Start

Minimal (all settings from voice-settings.yaml)

If config/voice-settings.yaml has input, output, engine, language, and voice set:

poetry run generate-audiobook

Or via Makefile:

make audiobook

No CLI flags needed.

Edge TTS with German voice

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --engine edge \
  --lang de \
  --voice de-DE-ConradNeural

Generate and merge into a single audiobook file

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge

This generates all chapter MP3s and then merges them into a single file named {VoiceName}_{ProjectName}.mp3 (e.g. Conrad_eternity-audiobook.mp3). Requires ffmpeg (sudo apt install ffmpeg).

Merge with custom filename

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge --merge-filename mein-hoerbuch.mp3

Or in voice-settings.yaml:

engine: edge
language: de
voice: de-DE-ConradNeural
input: manuscript
output: output/audiobook/de
merge: true
merge_filename: mein-hoerbuch.mp3
skip:
  - toc
  - toc-print
  - bibliography
  - imprint

From EPUB

poetry run generate-audiobook \
  --input my-book.epub \
  --output output/audiobook/de \
  --engine edge \
  --lang de

Preview EPUB chapters

poetry run generate-audiobook --input my-book.epub --list-chapters

Output:

Chapters in my-book.epub:

  01_Cover
  02_Inhaltsverzeichnis
  03_Vorwort
  04_Kapitel 1 - Einleitung
  05_Kapitel 2 - Grundlagen
  ...

Total: 12 chapter(s)
Use --skip to exclude chapters by keyword, e.g. --skip toc,cover,imprint

Skip unwanted EPUB chapters

poetry run generate-audiobook \
  --input my-book.epub \
  --output output/audiobook/de \
  --skip toc,cover,imprint

The matching is case-insensitive and checks both the chapter title and the filename inside the EPUB.
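A minimal sketch of that matching rule, assuming "keyword" means a case-insensitive substring match (the page does not say whether whole-word or exact matching is used). should_skip and parse_skip are hypothetical names.

```python
def parse_skip(value):
    """Split a --skip value like 'toc,cover,imprint' into keywords."""
    return [part.strip() for part in value.split(",") if part.strip()]


def should_skip(title, filename, keywords):
    """Return True if any keyword occurs in the chapter title or EPUB filename.

    Assumes case-insensitive substring matching.
    """
    haystacks = (title.lower(), filename.lower())
    return any(kw.strip().lower() in h
               for kw in keywords if kw.strip()
               for h in haystacks)
```

Substring matching is convenient but broad: under this assumption, cover would also match a chapter titled "Discovery", so preview with --list-chapters before relying on short keywords.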

English with specific voice

poetry run generate-audiobook \
  --input manuscript/en \
  --output output/audiobook/en \
  --engine edge \
  --voice en-US-GuyNeural

With Google TTS (free, online)

poetry run generate-audiobook \
  --input manuscript/de \
  --output output/audiobook/de \
  --engine google \
  --lang de

With pyttsx3 (offline)

poetry run generate-audiobook \
  --input manuscript/en \
  --output output/audiobook/en \
  --engine pyttsx3 \
  --voice "english" \
  --rate 180

Command-Line Options

| Option | Description |
|---|---|
| --input | Input folder with Markdown files, or a single .epub file. Can also be set in the config |
| --output | Output folder for .mp3 files. Can also be set in the config |
| --engine | TTS engine: edge (default), google, pyttsx3, elevenlabs |
| --lang | Language code (e.g. en, de, es, fr, el) |
| --voice | Voice name or ID (depends on the engine) |
| --rate | Speech rate (pyttsx3 only) |
| --settings | Path to voice-settings.yaml (auto-detected at config/voice-settings.yaml) |
| --section-order | Path to a YAML file defining the section order for Markdown input |
| --list-chapters | List all chapters in the EPUB without generating audio, then exit |
| --skip | Comma-separated keywords to exclude chapters (case-insensitive) |
| --merge | Merge all chapter MP3s into a single audiobook file (requires ffmpeg) |
| --merge-filename | Exact filename for the merged file (overrides the auto-generated name) |
| --title | Custom book title for the auto-generated merge filename |
| --overwrite | Regenerate existing MP3 files (default: skip existing files) |

Voice Settings Configuration

The generation options listed in the field table below can also be set in config/voice-settings.yaml. The file is auto-detected when present, so no --settings flag is needed. CLI flags always override config values.

# TTS configuration for audiobook generation
# All settings can be overridden via CLI flags (CLI takes priority).
# Section order is defined centrally in export-settings.yaml (section_order.audiobook)

# TTS engine: edge (recommended), google, pyttsx3, elevenlabs
engine: edge

# Language code for TTS engine
language: de

# Voice identifier
voice: de-DE-ConradNeural

# Sections to skip during TTS processing
skip:
  - toc
  - toc-print
  - bibliography
  - imprint

# Input: directory with *.md files OR path to a single .epub file
# input: manuscript

# Output: directory for generated .mp3 files
# output: output/audiobook/de

# Overwrite existing MP3 files (default: false, skips existing)
# overwrite: false

# Merge all chapter MP3s into a single audiobook file (requires ffmpeg)
# merge: false

# Exact filename for the merged audiobook (overrides auto-generated name)
# merge_filename: my-book.mp3

# Custom book title for the auto-generated merge filename
# Only used when merge_filename is not set
# title: My Book Title

# Speech rate (pyttsx3 only)
# rate: 200

| Field | CLI flag | Default | Description |
|---|---|---|---|
| engine | --engine | edge | TTS engine to use |
| language | --lang | en | ISO language code |
| voice | --voice | - | Voice name or ID |
| rate | --rate | 200 | Speech rate (pyttsx3 only) |
| skip | --skip | - | Keywords to exclude chapters |
| input | --input | - | Input directory or EPUB file |
| output | --output | - | Output directory for MP3 files |
| overwrite | --overwrite | false | Regenerate existing files |
| merge | --merge | false | Merge chapters into a single file |
| merge_filename | --merge-filename | - | Exact name for the merged file |
| title | --title | - | Book title for the auto-generated merge filename |

Priority: CLI > voice-settings.yaml > defaults.

The audiobook section order is defined in config/export-settings.yaml under section_order.audiobook, not in voice-settings.yaml.
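The priority rule CLI > voice-settings.yaml > defaults can be expressed as three layered dictionary updates. This is a sketch under stated assumptions: resolve_settings and load_voice_settings are hypothetical names, CLI values are assumed to be already mapped to the YAML field names (for example --lang to language), and None is assumed to mean "flag not given".

```python
from pathlib import Path

# Defaults taken from the field table above.
DEFAULTS = {
    "engine": "edge",
    "language": "en",
    "rate": 200,
    "overwrite": False,
    "merge": False,
}


def resolve_settings(cli_args, file_settings, defaults=DEFAULTS):
    """Merge settings with priority CLI > voice-settings.yaml > defaults.

    A value of None means "not set", so it never overrides a lower layer.
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in file_settings.items() if v is not None})
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


def load_voice_settings(path="config/voice-settings.yaml"):
    """Load the YAML file if it exists (requires PyYAML)."""
    import yaml  # imported lazily so the merge logic works without PyYAML

    p = Path(path)
    if not p.is_file():
        return {}
    return yaml.safe_load(p.read_text(encoding="utf-8")) or {}
```

One detail matters for boolean flags: with argparse, action="store_true" defaults to False, which would silently override merge: true from the config. Declaring such flags with default=None keeps "not given" distinguishable from "false".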

Section Order

When using Markdown input, the script respects a configurable section order to produce audio files in correct book sequence. This is defined in config/export-settings.yaml under section_order.audiobook.

The default order is:

DEFAULT_SECTION_ORDER = [
    "front-matter/toc.md",
    "front-matter/foreword.md",
    "front-matter/preface.md",
    "chapters",
    "back-matter/epilogue.md",
    "back-matter/glossary.md",
    "back-matter/acknowledgments.md",
    "back-matter/about-the-author.md",
    "back-matter/bibliography.md",
    "back-matter/imprint.md",
]

Each entry is either a relative path to a specific .md file or a directory name (e.g. chapters) whose .md files are included in sorted order. Files or directories that don't exist in the project are silently skipped.

Output files are numbered for correct playback order:

01_foreword.mp3
02_preface.mp3
03_chapter-01.mp3
04_chapter-02.mp3
...
08_about-the-author.mp3
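The expansion and numbering described above can be sketched with pathlib. This is an illustration, not the script's implementation: resolve_section_order and numbered_outputs are hypothetical names, and entries ending in .md are assumed to be files while everything else is treated as a directory. In the example output the TOC is absent and the foreword is 01, so skipped sections are evidently removed before numbering; the skip filter itself is left out here.

```python
from pathlib import Path

DEFAULT_SECTION_ORDER = [
    "front-matter/toc.md",
    "front-matter/foreword.md",
    "front-matter/preface.md",
    "chapters",
    "back-matter/epilogue.md",
    "back-matter/glossary.md",
    "back-matter/acknowledgments.md",
    "back-matter/about-the-author.md",
    "back-matter/bibliography.md",
    "back-matter/imprint.md",
]


def resolve_section_order(root, order=DEFAULT_SECTION_ORDER):
    """Expand the section order into an ordered list of existing .md files."""
    root = Path(root)
    files = []
    for entry in order:
        path = root / entry
        if entry.endswith(".md"):
            if path.is_file():          # single file; missing ones are skipped
                files.append(path)
        elif path.is_dir():             # directory: its .md files, sorted
            files.extend(sorted(path.glob("*.md")))
    return files


def numbered_outputs(md_files, output_dir):
    """Map each source file to a numbered MP3 path such as 03_chapter-01.mp3."""
    width = max(2, len(str(len(md_files))))  # widen past 99 files
    return [Path(output_dir) / f"{i:0{width}d}_{src.stem}.mp3"
            for i, src in enumerate(md_files, start=1)]
```

Zero-padding the prefix keeps plain alphabetical sorting (as used by most players and file managers) in book order.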

For EPUB input, the reading order comes directly from the EPUB spine, so no section order configuration is needed. Use --list-chapters to preview and --skip to exclude unwanted chapters.
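Reading chapters in spine order can be sketched with ebooklib and BeautifulSoup. This is an assumption about how such a listing could be built, not the script's code: list_epub_chapters and chapter_label are hypothetical names, and taking the title from the first h1/h2 (falling back to the file name) is an assumed rule.

```python
from pathlib import PurePosixPath


def chapter_label(index, title):
    """Format a listing entry like '03_Vorwort'."""
    return f"{index:02d}_{title}"


def list_epub_chapters(epub_path):
    """Return chapter labels in spine (reading) order.

    Requires: poetry add ebooklib beautifulsoup4
    """
    from bs4 import BeautifulSoup
    from ebooklib import epub

    book = epub.read_epub(epub_path)
    labels = []
    for entry in book.spine:
        # Spine entries are usually (idref, linear) tuples.
        idref = entry[0] if isinstance(entry, tuple) else entry
        item = book.get_item_with_id(idref)
        if item is None:
            continue
        soup = BeautifulSoup(item.get_content(), "html.parser")
        heading = soup.find(["h1", "h2"])
        title = (heading.get_text(" ", strip=True) if heading
                 else PurePosixPath(item.get_name()).stem)
        labels.append(chapter_label(len(labels) + 1, title))
    return labels
```

Usage: for label in list_epub_chapters("my-book.epub"): print(label)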

Merging Chapters

Use --merge to combine all chapter MP3s into a single audiobook file:

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge

The merged filename is derived automatically: {VoiceName}_{ProjectName}.mp3. The voice name is extracted from the Edge TTS identifier (e.g. de-DE-ConradNeural becomes Conrad), and the project name comes from the current directory. If the directory name contains -ebook, it is replaced with -audiobook.

Examples: Conrad_eternity-audiobook.mp3, Katja_mein-buch.mp3

You can override the title with --title or via title in voice-settings.yaml. For full control over the filename, use --merge-filename or merge_filename in the config.
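The naming rules above fit in a few lines. This sketch is hedged: merge_filename, voice_short_name, and project_name are hypothetical names, and the assumption that --title replaces the project name verbatim is not confirmed by this page.

```python
from pathlib import Path


def voice_short_name(voice):
    """'de-DE-ConradNeural' -> 'Conrad': last dash-separated part minus 'Neural'."""
    name = voice.rsplit("-", 1)[-1]
    return name[: -len("Neural")] if name.endswith("Neural") else name


def project_name(directory=None):
    """Directory name (current directory by default), '-ebook' -> '-audiobook'."""
    name = Path(directory or Path.cwd()).name
    return name.replace("-ebook", "-audiobook")


def merge_filename(voice, directory=None, title=None, explicit=None):
    """An explicit merge_filename wins; otherwise {VoiceName}_{title or project}.mp3.

    Using the title verbatim in place of the project name is an assumption.
    """
    if explicit:
        return explicit
    return f"{voice_short_name(voice)}_{title or project_name(directory)}.mp3"
```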

Merging requires ffmpeg:

sudo apt install ffmpeg

The merge uses ffmpeg -c copy (lossless concatenation without re-encoding), so it typically finishes in seconds even for a long audiobook. Individual chapter files are preserved.
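One common way to concatenate MP3s with -c copy is ffmpeg's concat demuxer, which reads a list file of "file '...'" lines. The page only says the merge uses -c copy, so the use of the concat demuxer here is an assumption, and merge_mp3s / concat_list are hypothetical names.

```python
import subprocess
import tempfile
from pathlib import Path


def concat_list(mp3_files):
    """Build a concat-demuxer list; a single quote is written as '\\'' inside quotes."""
    lines = []
    for f in mp3_files:
        escaped = str(Path(f).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_mp3s(mp3_files, target):
    """Losslessly concatenate MP3 files with the concat demuxer and -c copy."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as fh:
        fh.write(concat_list(mp3_files))
        list_path = fh.name
    try:
        subprocess.run(
            # -safe 0 allows the absolute paths written into the list file
            ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
             "-i", list_path, "-c", "copy", str(target)],
            check=True,
        )
    finally:
        Path(list_path).unlink(missing_ok=True)
```

Stream copy only works cleanly when all chapters share the same codec settings, which is normally the case when they come from the same TTS engine and voice.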

Skipping Existing Files

By default, the script skips chapters that already have a corresponding MP3 file. This allows you to resume interrupted runs without regenerating everything:

  [1/24] Exists:     01_foreword.mp3 (48 KB, use --overwrite to regenerate)
  [2/24] Exists:     02_preface.mp3 (102 KB, use --overwrite to regenerate)
  [3/24] Generating: 03_chapter-01.mp3 (8,432 chars, ~1,205 words)

To force regeneration of all files, add --overwrite.
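The skip-or-regenerate decision is a simple existence check. plan_jobs is a hypothetical name used only for this sketch.

```python
from pathlib import Path


def plan_jobs(targets, overwrite=False):
    """Split numbered output paths into (to_generate, to_skip).

    A file is skipped when it already exists and overwrite is False.
    """
    to_generate, to_skip = [], []
    for target in map(Path, targets):
        if target.exists() and not overwrite:
            to_skip.append(target)
        else:
            to_generate.append(target)
    return to_generate, to_skip
```

An existence check alone cannot tell a finished file from one cut off by an interrupted run. Writing each chapter to a temporary name and renaming it with os.replace only after it completes would prevent a truncated MP3 from being skipped on the next run; whether the script already does this is not stated on this page.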

Verbose Output

The script shows detailed progress during generation:

============================================================
  Audiobook Generator
============================================================
  Engine:   edge
  Language: de
  Voice:    de-DE-ConradNeural
  Settings: config/voice-settings.yaml
  Merge to: Conrad_eternity-audiobook.mp3
============================================================

Files to process (24):
  01_foreword.mp3 [exists]
  02_preface.mp3 [exists]
  03_01-0-part-1-intro.mp3
  04_01-chapter.mp3
  05_02-chapter.mp3
  ...

Source:  manuscript
Output:  audiobook/output/de
Files:   24

  [1/24] Exists:     01_foreword.mp3 (48 KB, use --overwrite to regenerate)
  [2/24] Exists:     02_preface.mp3 (102 KB, use --overwrite to regenerate)
  [3/24] Generating: 03_01-0-part-1-intro.mp3 (1,421 chars, ~206 words)
           Done in 3.1s (52 KB)
  [4/24] Generating: 04_01-chapter.mp3 (10,051 chars, ~1,478 words)
    Chunk 1/3 (3940 chars)
    Chunk 2/3 (3785 chars)
    Chunk 3/3 (2326 chars)
           Done in 24.6s (412 KB)
  ...

Finished: 24 file(s) in 765.7s

Merging 24 chapter(s) into Conrad_eternity-audiobook.mp3...
Merged:  audiobook/output/de/Conrad_eternity-audiobook.mp3
Size:    42.3 MB (0.8s)

The Merge to: line only appears in the banner when --merge is set. The file listing shows [exists] for chapters that already have an MP3 file and will be skipped (unless --overwrite is used).

Long chapters are automatically split into chunks of about 4,000 characters to avoid Edge TTS timeouts. If a chunk fails, the script retries up to 3 times with increasing wait times; a chapter that still fails is counted as failed, and the script moves on to the next chapter.
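A sentence-aware chunker can be sketched as greedy packing: add sentences until the next one would exceed the limit. This is one possible approach, not necessarily the script's splitting rule; split_into_chunks is a hypothetical name, and sentence boundaries are approximated as ".", "!" or "?" followed by whitespace.

```python
import re


def split_into_chunks(text, max_chars=4000):
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is cut hard at max_chars.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split oversized sentences so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence ends avoids audible breaks mid-sentence where two chunks meet. This version joins sentences with single spaces, which drops paragraph breaks; that is usually harmless for narration.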

If any chapters fail after all retries, a summary is shown at the end:

Finished: 20 file(s) in 765.7s
Warning:  4 chapter(s) failed to generate
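The retry-then-continue behavior can be sketched with exponential backoff, one way to implement "increasing wait time". with_retries and generate_all are hypothetical names, and treating "up to 3 times" as 3 attempts in total is an assumption (the page does not say whether the first try counts).

```python
import time


def with_retries(func, *args, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(*args), retrying with exponential backoff on exceptions.

    Waits base_delay, 2 * base_delay, ... seconds between attempts and
    re-raises the last exception when every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def generate_all(chapters, synthesize, **retry_options):
    """Run synthesize(chapter) for each chapter; return the failure count."""
    failed = 0
    for chapter in chapters:
        try:
            with_retries(synthesize, chapter, **retry_options)
        except Exception as exc:
            failed += 1
            print(f"  FAILED: {chapter} ({exc})")
    return failed
```

The sleep parameter exists so the backoff can be tested without waiting; in normal use the default time.sleep applies.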

Available Edge TTS Voices

List all available voices:

edge-tts --list-voices

Filter by language:

edge-tts --list-voices | grep de-
edge-tts --list-voices | grep en-
edge-tts --list-voices | grep es-
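The same lookup is available from Python through edge_tts.list_voices(), an async function that returns voice dictionaries with a ShortName key. Filtering on the ShortName prefix is stricter than grep, which matches "de-" anywhere in a line. filter_voices and print_voices are hypothetical helper names.

```python
def filter_voices(voices, prefix):
    """Keep voices whose ShortName starts with a locale prefix such as 'de-'."""
    return sorted(v["ShortName"] for v in voices
                  if v["ShortName"].startswith(prefix))


async def print_voices(prefix):
    """Fetch the voice list from the Edge TTS service (requires internet)."""
    import edge_tts  # requires: poetry add edge-tts

    voices = await edge_tts.list_voices()
    for name in filter_voices(voices, prefix):
        print(name)
```

Usage: asyncio.run(print_voices("de-")) after import asyncio.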

German Voices

| Voice | Gender | Variant |
|---|---|---|
| de-DE-KatjaNeural | Female | Germany |
| de-DE-ConradNeural | Male | Germany |
| de-AT-IngridNeural | Female | Austria |
| de-AT-JonasNeural | Male | Austria |
| de-CH-LeniNeural | Female | Switzerland |
| de-CH-JanNeural | Male | Switzerland |

English Voices

| Voice | Gender | Variant |
|---|---|---|
| en-US-JennyNeural | Female | US |
| en-US-GuyNeural | Male | US |
| en-GB-SoniaNeural | Female | UK |
| en-GB-RyanNeural | Male | UK |

Other Languages

| Voice | Language |
|---|---|
| es-ES-ElviraNeural | Spanish |
| fr-FR-DeniseNeural | French |
| el-GR-AthinaNeural | Greek |
| it-IT-ElsaNeural | Italian |
| pt-BR-FranciscaNeural | Portuguese (BR) |

Markdown Cleanup

The script automatically cleans Markdown and HTML content (including EPUB chapters) before conversion:

  • Removes images: ![alt](url)
  • Strips links: [text](url) becomes text
  • Strips bold/italic formatting: **text**, *text*, etc.
  • Removes code blocks and inline code
  • Removes tables and HTML tags
  • Removes entire <figure> blocks (including <img> and <figcaption>, multiline)
  • Removes YAML front matter
  • Collapses multiple empty lines
  • Unescapes HTML entities
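The cleanup steps listed above can be approximated with regular expressions. This is a simplified sketch, not the script's cleaner: clean_for_tts is a hypothetical name, and the patterns cover common cases only (for example, nested emphasis or reference-style links are not handled).

```python
import html
import re


def clean_for_tts(text):
    """Approximate the listed cleanup steps with regular expressions."""
    # YAML front matter at the very start of the file
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.S)
    # Whole <figure> blocks (including <img>/<figcaption>), across lines
    text = re.sub(r"<figure\b.*?</figure>", "", text, flags=re.S | re.I)
    # Fenced code blocks, then inline code
    text = re.sub(r"```.*?```", "", text, flags=re.S)
    text = re.sub(r"`[^`]*`", "", text)
    # Images before links, because image syntax contains link syntax
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Table rows (lines starting with '|')
    text = re.sub(r"^\s*\|.*$", "", text, flags=re.M)
    # Remaining HTML tags
    text = re.sub(r"<[^>]+>", "", text)
    # Bold/italic; emphasis may not start or end with whitespace
    text = re.sub(r"\*\*(?!\s)(.+?)(?<!\s)\*\*", r"\1", text)
    text = re.sub(r"\*(?!\s)(.+?)(?<!\s)\*", r"\1", text)
    # Underscore emphasis only at word boundaries, so snake_case survives
    text = re.sub(r"(?<!\w)(__|_)(.+?)\1(?!\w)", r"\2", text)
    # Decode entities such as &amp; and &uuml;
    text = html.unescape(text)
    # Collapse runs of blank lines
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()
```

Order matters: images must go before links, and entities are unescaped last so that escaped text like &lt;b&gt; is read aloud rather than stripped as a tag.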

TTS Engine Comparison

| Engine | Quality | Needs internet | German | Best use |
|---|---|---|---|---|
| edge | High (neural) | Yes | Yes | Recommended for all languages |
| google | Good (free) | Yes | Yes | Drafts, low-cost voiceover |
| elevenlabs | Studio quality | Yes | Yes | Professional audiobooks |
| pyttsx3 | Low (robotic) | No | Limited | Testing, offline fallback |

Tips for Listening in the Car

  • Use Edge TTS default speed (no adjustment needed, natural pacing)
  • Choose a clear, articulate voice (de-DE-ConradNeural for German male)
  • After generation, normalize audio volume for consistent loudness:
      ffmpeg -i input.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 44100 -b:a 192k output.mp3
  • Use MP3 format for maximum car stereo compatibility
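To apply that loudnorm command to every chapter, a small batch loop is enough. normalize_folder and loudnorm_command are hypothetical helpers; output goes to a separate folder because ffmpeg cannot overwrite its own input in place.

```python
import subprocess
from pathlib import Path


def loudnorm_command(src, dst):
    """The single-pass loudnorm call from the tip above, as an argument list."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-af", "loudnorm=I=-16:TP=-1.5:LRA=11",
            "-ar", "44100", "-b:a", "192k", str(dst)]


def normalize_folder(src_dir, dst_dir):
    """Normalize every MP3 in src_dir into dst_dir; originals are kept."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(Path(src_dir).glob("*.mp3")):
        subprocess.run(loudnorm_command(mp3, dst / mp3.name), check=True)
```

Single-pass loudnorm is a quick approximation; ffmpeg's loudnorm filter also supports a two-pass mode (measure first, then apply the measured values) for more accurate targeting.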

Tips for Publishing

  • Export one .mp3 per chapter
  • Use consistent filenames (the script numbers them automatically)
  • Normalize volume with tools like ffmpeg or Auphonic
  • Add intro/outro manually if needed
  • Note: Audible/ACX currently does not accept AI-narrated audiobooks
  • Platforms accepting AI narration: Google Play Books, Kobo, Findaway/Spotify (with disclosure)

Troubleshooting

| Problem | Solution |
|---|---|
| Missing package error | Run the install command shown in the error message |
| All chapters FAILED | Likely a missing dependency; check the error messages |
| No audio generated | Check your internet connection (Edge TTS requires one) |
| "No audio was received" | Verify that the voice name is correct (edge-tts --list-voices \| grep de-) |
| Script hangs on long text | Long text is already handled by auto-chunking; check your internet connection |
| Wrong chapter order | Configure section_order.audiobook in export-settings.yaml |
| Unwanted chapters in EPUB | Use --list-chapters to preview, then --skip to exclude them |
| EPUB chapters missing | Some EPUBs have a non-standard structure; check the --list-chapters output |
| Merge fails | Install ffmpeg: sudo apt install ffmpeg |
| Want to regenerate files | Use --overwrite to force regeneration |
| Voice sounds unnatural | Try a different voice from edge-tts --list-voices |
| ElevenLabs not working | Set ELEVENLABS_API_KEY in your environment |
| Markdown sounds weird | Check for leftover syntax or unsupported tags |

Related Topics
