Generate Audiobook

Generate Audiobook from Markdown or EPUB

Turn your book chapters into a high-quality audiobook with a single command. This tool converts .md chapter files or .epub books into .mp3 audio using modular Text-to-Speech (TTS) engines.

Overview

This script processes your existing manuscript (Markdown chapters or EPUB files) and turns them into spoken-word audio. It supports multiple engines and languages, cleans up Markdown/HTML formatting for natural narration, respects your book's section order, and outputs clean MP3 files per chapter.

Features

Converts Markdown chapters and EPUB files to .mp3
Cleans Markdown and HTML (removes images, formatting, code, links, <figure> tags)
Supports multiple TTS engines: edge (recommended), google, pyttsx3, elevenlabs
Configurable section order (defined in config/export-settings.yaml)
CLI-ready via Poetry (generate-audiobook)
Language, voice, and speech rate configurable
Optional voice-settings.yaml for reusable configuration
Output per chapter, numbered in book order, ready for publishing
Preview EPUB chapters before export (--list-chapters)
Skip unwanted chapters by keyword (--skip toc,cover,imprint)
Merge all chapters into a single audiobook file (--merge, requires ffmpeg)
Skip existing files by default, regenerate with --overwrite
Auto-chunking for long chapters to avoid TTS timeouts
Retry logic with backoff on transient TTS failures
Verbose progress output with timing and file sizes
File listing preview at startup with [exists] markers
Dependency checks at startup with clear install instructions

Project Structure

Supported Input Formats

Format	Description
Markdown directory	A folder containing `front-matter/`, `chapters/`, `back-matter/` with `.md` files
EPUB file	A single `.epub` file (chapters are extracted automatically in reading order)

Installation

# Core dependency (lightweight, no heavy ML libs)
poetry add edge-tts

# For EPUB support
poetry add ebooklib beautifulsoup4

# For merging chapters into single audiobook (system package)
sudo apt install ffmpeg

# Optional: other TTS engines
poetry add gTTS           # Google TTS
poetry add pyttsx3        # Offline TTS
poetry add elevenlabs     # ElevenLabs API

The script checks for required dependencies at startup and shows clear error messages if something is missing:

Error: Required package 'edge-tts' is not installed.
The 'edge' TTS engine needs it to work.

Install it with:
  poetry add edge-tts

For EPUB input, it also checks for ebooklib and beautifulsoup4 before processing:

Error: EPUB support requires 'ebooklib' and 'beautifulsoup4'.

Install them with:
  poetry add ebooklib beautifulsoup4
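
A startup check like the one described above could be sketched as follows. This is a minimal illustration, not the script's actual code: the `ENGINE_PACKAGES` mapping and the helpers `is_installed` and `dependency_error` are hypothetical names, and only the message format is taken from the output shown above.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical mapping: engine name -> (import name, package name used with Poetry).
ENGINE_PACKAGES = {
    "edge": ("edge_tts", "edge-tts"),
    "google": ("gtts", "gTTS"),
    "pyttsx3": ("pyttsx3", "pyttsx3"),
    "elevenlabs": ("elevenlabs", "elevenlabs"),
}


def is_installed(import_name: str) -> bool:
    """Return True if the top-level module can be located (without importing it)."""
    return importlib.util.find_spec(import_name) is not None


def dependency_error(
    engine: str, check: Callable[[str], bool] = is_installed
) -> Optional[str]:
    """Return an error message if the engine's package is missing, else None."""
    import_name, package = ENGINE_PACKAGES[engine]
    if check(import_name):
        return None
    return (
        f"Error: Required package '{package}' is not installed.\n"
        f"The '{engine}' TTS engine needs it to work.\n\n"
        f"Install it with:\n  poetry add {package}"
    )
```

Locating the module with `find_spec` avoids importing a heavy package just to find out whether it is present.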

Quick Start

Minimal (all settings from voice-settings.yaml)

If config/voice-settings.yaml has input, output, engine, language, and voice set:

poetry run generate-audiobook

Or via Makefile:

make audiobook

No CLI flags needed.

Edge TTS with German voice

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de

Generate and merge into single audiobook file

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge

This generates all chapter MP3s and then merges them into a single file named {VoiceName}_{ProjectName}.mp3 (e.g. Conrad_eternity-audiobook.mp3). Requires ffmpeg (sudo apt install ffmpeg).

Merge with custom filename

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge --merge-filename mein-hoerbuch.mp3

Or in voice-settings.yaml:

engine: edge
language: de
voice: de-DE-ConradNeural
input: manuscript
output: output/audiobook/de
merge: true
merge_filename: mein-hoerbuch.mp3
skip:
  - toc
  - toc-print
  - bibliography
  - imprint

From EPUB

poetry run generate-audiobook \
  --input my-book.epub \
  --output output/audiobook/de \
  --engine edge \
  --lang de

Preview EPUB chapters

poetry run generate-audiobook --input my-book.epub --list-chapters

Output:

Chapters in my-book.epub:

  01_Cover
  02_Inhaltsverzeichnis
  03_Vorwort
  04_Kapitel 1 - Einleitung
  05_Kapitel 2 - Grundlagen
  ...

Total: 12 chapter(s)
Use --skip to exclude chapters by keyword, e.g. --skip toc,cover,imprint

Skip unwanted EPUB chapters

poetry run generate-audiobook \
  --input my-book.epub \
  --output output/audiobook/de \
  --skip toc,cover,imprint

The matching is case-insensitive and checks both the chapter title and the filename inside the EPUB.
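
The matching rule can be sketched like this. It is an assumption that keywords are matched as plain substrings; the function names `parse_skip` and `should_skip` are hypothetical and not taken from the script.

```python
def parse_skip(value: str) -> list[str]:
    """Split a comma-separated --skip value into trimmed keywords."""
    return [part.strip() for part in value.split(",") if part.strip()]


def should_skip(title: str, filename: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in the title or the filename.

    Assumption: case-insensitive substring matching.
    """
    haystacks = (title.lower(), filename.lower())
    return any(kw.lower() in hay for kw in keywords for hay in haystacks)
```

If the script does match substrings, a short keyword such as `toc` would also match a chapter titled "Protocol", so it is worth checking the `--list-chapters` output before choosing keywords.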

English with specific voice

poetry run generate-audiobook \
  --input manuscript/en \
  --output output/audiobook/en \
  --engine edge \
  --voice en-US-GuyNeural

With Google TTS (free, online)

poetry run generate-audiobook \
  --input manuscript/de \
  --output output/audiobook/de \
  --engine google \
  --lang de

With pyttsx3 (offline)

poetry run generate-audiobook \
  --input manuscript/en \
  --output output/audiobook/en \
  --engine pyttsx3 \
  --voice "english" \
  --rate 180

Command-Line Options

Option	Description
`--input`	Input folder with Markdown files OR a single `.epub` file. Can be set in config
`--output`	Output folder for `.mp3` files. Can be set in config
`--engine`	TTS engine: `edge` (default), `google`, `pyttsx3`, `elevenlabs`
`--lang`	Language code (e.g. `en`, `de`, `es`, `fr`, `el`)
`--voice`	Voice name or ID (depends on engine)
`--rate`	Speech rate (pyttsx3 only)
`--settings`	Path to voice-settings.yaml (auto-detected at `config/voice-settings.yaml`)
`--section-order`	Path to a YAML file defining section order for Markdown input
`--list-chapters`	List all chapters in the EPUB without generating audio, then exit
`--skip`	Comma-separated keywords to exclude chapters (case-insensitive)
`--merge`	Merge all chapter MP3s into a single audiobook file (requires ffmpeg)
`--merge-filename`	Exact filename for the merged file (overrides auto-generated name)
`--title`	Custom book title for the auto-generated merge filename
`--overwrite`	Regenerate existing MP3 files. Default: skip existing files

Voice Settings Configuration

All CLI options can be defined in config/voice-settings.yaml. The file is auto-detected when present, so no --settings flag is needed. CLI flags always override config values.

# TTS configuration for audiobook generation
# All settings can be overridden via CLI flags (CLI takes priority).
# Section order is defined centrally in export-settings.yaml (section_order.audiobook)

# TTS engine: edge (recommended), google, pyttsx3, elevenlabs
engine: edge

# Language code for TTS engine
language: de

# Voice identifier
voice: de-DE-ConradNeural

# Sections to skip during TTS processing
skip:
  - toc
  - toc-print
  - bibliography
  - imprint

# Input: directory with *.md files OR path to a single .epub file
# input: manuscript

# Output: directory for generated .mp3 files
# output: output/audiobook/de

# Overwrite existing MP3 files (default: false, skips existing)
# overwrite: false

# Merge all chapter MP3s into a single audiobook file (requires ffmpeg)
# merge: false

# Exact filename for the merged audiobook (overrides auto-generated name)
# merge_filename: my-book.mp3

# Custom book title for the auto-generated merge filename
# Only used when merge_filename is not set
# title: My Book Title

# Speech rate (pyttsx3 only)
# rate: 200

Field	CLI Flag	Default	Description
`engine`	`--engine`	`edge`	TTS engine to use
`language`	`--lang`	`en`	ISO language code
`voice`	`--voice`	-	Voice name or ID
`rate`	`--rate`	`200`	Speech rate (pyttsx3 only)
`skip`	`--skip`	-	Keywords to exclude chapters
`input`	`--input`	-	Input directory or EPUB file
`output`	`--output`	-	Output directory for MP3 files
`overwrite`	`--overwrite`	`false`	Regenerate existing files
`merge`	`--merge`	`false`	Merge chapters into single file
`merge_filename`	`--merge-filename`	-	Exact name for merged file
`title`	`--title`	-	Book title for auto-generated merge filename

Priority: CLI > voice-settings.yaml > defaults.
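
The priority chain can be illustrated with a small layering function. This is a sketch under stated assumptions: the name `resolve_settings` is hypothetical, unset CLI flags are assumed to arrive as `None`, and the defaults are copied from the table above.

```python
# Defaults as listed in the table above.
DEFAULTS = {
    "engine": "edge",
    "language": "en",
    "rate": 200,
    "overwrite": False,
    "merge": False,
}


def resolve_settings(cli: dict, config: dict) -> dict:
    """Layer defaults, then config values, then CLI values (highest priority).

    Assumption: a CLI flag that was not given is represented as None,
    so it does not mask the value from voice-settings.yaml.
    """
    resolved = dict(DEFAULTS)
    resolved.update({k: v for k, v in config.items() if v is not None})
    resolved.update({k: v for k, v in cli.items() if v is not None})
    return resolved
```

With this approach, boolean flags such as `--merge` need a default of `None` rather than `False` in the argument parser. Otherwise an omitted flag would override `merge: true` from the config file.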

Section order for audiobook is defined in config/export-settings.yaml under section_order.audiobook, not in voice-settings.

Section Order

When using Markdown input, the script respects a configurable section order to produce audio files in correct book sequence. This is defined in config/export-settings.yaml under section_order.audiobook.

The default order is:

DEFAULT_SECTION_ORDER = [
    "front-matter/toc.md",
    "front-matter/foreword.md",
    "front-matter/preface.md",
    "chapters",
    "back-matter/epilogue.md",
    "back-matter/glossary.md",
    "back-matter/acknowledgments.md",
    "back-matter/about-the-author.md",
    "back-matter/bibliography.md",
    "back-matter/imprint.md",
]

Each entry is either a relative path to a specific .md file or a directory name (e.g. chapters) whose .md files are included in sorted order. Files or directories that don't exist in the project are silently skipped.
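
Expanding such a list into concrete files could look roughly like this. The function name `resolve_section_order` is hypothetical, and sorting by filename within a directory is an assumption about what "sorted" means here.

```python
from pathlib import Path


def resolve_section_order(root: Path, order: list[str]) -> list[Path]:
    """Expand section-order entries into an ordered list of existing .md files."""
    files: list[Path] = []
    for entry in order:
        path = root / entry
        if path.is_dir():
            # Directory entry: include its Markdown files, sorted by name.
            files.extend(sorted(path.glob("*.md")))
        elif path.is_file():
            files.append(path)
        # Entries that do not exist are skipped without an error.
    return files
```

Because chapters are sorted by name, zero-padded filenames such as `01-chapter.md` and `02-chapter.md` keep the intended order.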

Output files are numbered for correct playback order:

01_foreword.mp3
02_preface.mp3
03_chapter-01.mp3
04_chapter-02.mp3
...
08_about-the-author.mp3
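
The numbering scheme can be reproduced with a short helper. This is only a sketch: `numbered_outputs` is a hypothetical name, and widening the prefix beyond two digits for books with 100 or more files is an assumption, not documented behavior.

```python
def numbered_outputs(stems: list[str]) -> list[str]:
    """Prefix each chapter stem with its 1-based, zero-padded position."""
    # Assumption: at least two digits, more if the book has 100+ files.
    width = max(2, len(str(len(stems))))
    return [f"{i:0{width}d}_{stem}.mp3" for i, stem in enumerate(stems, start=1)]
```

Zero-padding matters because most players sort files alphabetically, and `10_...` would otherwise sort before `2_...`.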

For EPUB input, the reading order comes directly from the EPUB spine, so no section order configuration is needed. Use --list-chapters to preview and --skip to exclude unwanted chapters.

Merging Chapters

Use --merge to combine all chapter MP3s into a single audiobook file:

poetry run generate-audiobook \
  --input manuscript \
  --output audiobook/output/de \
  --merge

The merged filename is derived automatically: {VoiceName}_{ProjectName}.mp3. The voice name is extracted from the Edge TTS identifier (e.g. de-DE-ConradNeural becomes Conrad), and the project name comes from the current directory. If the directory name contains -ebook, it is replaced with -audiobook.

Examples: Conrad_eternity-audiobook.mp3, Katja_mein-buch.mp3
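
The naming rule can be sketched as follows. The helper names `voice_short_name` and `merged_filename` are hypothetical, and taking the last hyphen-separated part of the voice identifier with a trailing `Neural` removed is an assumption that happens to reproduce the examples above.

```python
import re


def voice_short_name(voice: str) -> str:
    """Reduce an Edge TTS identifier such as 'de-DE-ConradNeural' to 'Conrad'."""
    last = voice.split("-")[-1]
    # Assumption: the short name is the last part without the 'Neural' suffix.
    return re.sub(r"Neural$", "", last) or last


def merged_filename(voice: str, project_dir: str) -> str:
    """Build '{VoiceName}_{ProjectName}.mp3' from the voice and directory name."""
    project = project_dir.replace("-ebook", "-audiobook")
    return f"{voice_short_name(voice)}_{project}.mp3"
```

For example, a project directory named `eternity-ebook` combined with the voice `de-DE-ConradNeural` yields `Conrad_eternity-audiobook.mp3`.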

You can override the title with --title or via title in voice-settings.yaml. For full control over the filename, use --merge-filename or merge_filename in the config.

Merging requires ffmpeg:

sudo apt install ffmpeg

The merge uses ffmpeg -c copy (stream copy, no re-encoding), so it typically finishes in seconds even for long audiobooks and does not degrade audio quality. Individual chapter files are preserved.
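
One common way to concatenate files with `-c copy` is ffmpeg's concat demuxer, which reads the inputs from a list file. Whether the script uses this exact mechanism is an assumption; the helper names `concat_list` and `merge_command` are hypothetical.

```python
from pathlib import Path


def concat_list(chapters: list[Path]) -> str:
    """Build the list-file content expected by ffmpeg's concat demuxer."""
    lines = []
    for chapter in chapters:
        # A single quote inside a quoted path is written as '\'' in the list file.
        escaped = str(chapter).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_command(list_file: Path, output: Path) -> list[str]:
    """Return the ffmpeg command that concatenates without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]

# Possible usage (requires ffmpeg on the PATH):
#   list_file.write_text(concat_list(chapters), encoding="utf-8")
#   subprocess.run(merge_command(list_file, output), check=True)
```

Relative paths in the list file are resolved relative to the list file's own location, so writing it into the output directory keeps the entries short.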

Skipping Existing Files

By default, the script skips chapters that already have a corresponding MP3 file. This allows you to resume interrupted runs without regenerating everything:

  [1/24] Exists:     01_foreword.mp3 (48 KB, use --overwrite to regenerate)
  [2/24] Exists:     02_preface.mp3 (102 KB, use --overwrite to regenerate)
  [3/24] Generating: 03_chapter-01.mp3 (8,432 chars, ~1,205 words)

To force regeneration of all files, add --overwrite.

Verbose Output

The script shows detailed progress during generation:

============================================================
  Audiobook Generator
============================================================
  Engine:   edge
  Language: de
  Voice:    de-DE-ConradNeural
  Settings: config/voice-settings.yaml
  Merge to: Conrad_eternity-audiobook.mp3
============================================================

Files to process (24):
  01_foreword.mp3 [exists]
  02_preface.mp3 [exists]
  03_01-0-part-1-intro.mp3
  04_01-chapter.mp3
  05_02-chapter.mp3
  ...

Source:  manuscript
Output:  audiobook/output/de
Files:   24

  [1/24] Exists:     01_foreword.mp3 (48 KB, use --overwrite to regenerate)
  [2/24] Exists:     02_preface.mp3 (102 KB, use --overwrite to regenerate)
  [3/24] Generating: 03_01-0-part-1-intro.mp3 (1,421 chars, ~206 words)
           Done in 3.1s (52 KB)
  [4/24] Generating: 04_01-chapter.mp3 (10,051 chars, ~1,478 words)
    Chunk 1/3 (3940 chars)
    Chunk 2/3 (3785 chars)
    Chunk 3/3 (2326 chars)
           Done in 24.6s (412 KB)
  ...

Finished: 24 file(s) in 765.7s

Merging 24 chapter(s) into Conrad_eternity-audiobook.mp3...
Merged:  audiobook/output/de/Conrad_eternity-audiobook.mp3
Size:    42.3 MB (0.8s)

The Merge to: line only appears in the banner when --merge is set. The file listing shows [exists] for chapters that already have an MP3 file and will be skipped (unless --overwrite is used).

Long chapters are automatically split into chunks of ~4000 characters to avoid Edge TTS timeouts. If a chunk fails, the script retries up to 3 times with increasing wait time before moving on to the next chapter.
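
Chunking and retrying can be sketched like this. The details are assumptions: splitting at paragraph boundaries, a linearly increasing delay, and the names `split_into_chunks` and `with_retries` are illustrative and not taken from the script.

```python
import time


def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single oversized paragraph is cut hard (crude: may split mid-word).
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def with_retries(action, retries: int = 3, base_delay: float = 2.0, sleep=time.sleep):
    """Run action(); on failure wait base_delay, 2 * base_delay, ... and retry."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise  # all retries used up; the caller marks the chapter as failed
            sleep(base_delay * (attempt + 1))
```

Passing `sleep` as a parameter keeps the retry helper testable without real waiting.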

If any chapters fail after all retries, a summary is shown at the end:

Finished: 20 file(s) in 765.7s
Warning:  4 chapter(s) failed to generate

Available Edge TTS Voices

List all available voices:

edge-tts --list-voices

Filter by language:

edge-tts --list-voices | grep de-
edge-tts --list-voices | grep en-
edge-tts --list-voices | grep es-

German Voices

Voice	Gender	Variant
`de-DE-KatjaNeural`	Female	Germany
`de-DE-ConradNeural`	Male	Germany
`de-AT-IngridNeural`	Female	Austria
`de-AT-JonasNeural`	Male	Austria
`de-CH-LeniNeural`	Female	Switzerland
`de-CH-JanNeural`	Male	Switzerland

English Voices

Voice	Gender	Variant
`en-US-JennyNeural`	Female	US
`en-US-GuyNeural`	Male	US
`en-GB-SoniaNeural`	Female	UK
`en-GB-RyanNeural`	Male	UK

Other Languages

Voice	Language
`es-ES-ElviraNeural`	Spanish
`fr-FR-DeniseNeural`	French
`el-GR-AthinaNeural`	Greek
`it-IT-ElsaNeural`	Italian
`pt-BR-FranciscaNeural`	Portuguese (BR)

Markdown Cleanup

The script automatically cleans Markdown and HTML content before conversion:

Removes images: ![alt](url)
Strips links: [text](url) becomes text
Strips bold/italic formatting: **text**, *text*, etc.
Removes code blocks and inline code
Removes tables and HTML tags
Removes entire <figure> blocks (including <img> and <figcaption>, multiline)
Removes YAML front matter
Collapses multiple empty lines
Unescapes HTML entities
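
The rules above can be approximated with a handful of regular expressions. This is a simplified sketch, not the script's actual implementation: the patterns, their order, and the name `clean_markdown` are assumptions, and underscore-style emphasis is left out to avoid mangling words that contain underscores.

```python
import html
import re


def clean_markdown(text: str) -> str:
    """Strip Markdown/HTML syntax that should not be read aloud."""
    # YAML front matter at the very start of the file.
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    # Entire <figure> blocks, including multiline content.
    text = re.sub(r"<figure\b.*?</figure>", "", text, flags=re.DOTALL | re.IGNORECASE)
    # Fenced code blocks, then inline code.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    text = re.sub(r"`[^`\n]*`", "", text)
    # Images are removed; links keep only their visible text.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Table rows (lines that start with a pipe).
    text = re.sub(r"^\|.*$", "", text, flags=re.MULTILINE)
    # Remaining HTML tags.
    text = re.sub(r"<[^>]+>", "", text)
    # Bold and italic markers written with asterisks.
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    # HTML entities such as &amp; become plain characters.
    text = html.unescape(text)
    # Collapse runs of blank lines.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Regular expressions are enough for typical manuscripts, but a real Markdown parser would handle nested or unusual syntax more reliably.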

TTS Engine Comparison

Engine	Quality	Online	German	Best Use
edge	High (neural)	Yes	Yes	Recommended for all languages
google	Good (free)	Yes	Yes	Drafts, low-cost voiceover
elevenlabs	Studio-quality	Yes	Yes	Professional audiobooks
pyttsx3	Low (robotic)	No	Limited	Testing, offline fallback

Tips for Listening in the Car

Use Edge TTS default speed (no adjustment needed, natural pacing)
Choose a clear, articulate voice (de-DE-ConradNeural for German male)
After generation, normalize audio volume for consistent loudness:

ffmpeg -i input.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 44100 -b:a 192k output.mp3

Use MP3 format for maximum car stereo compatibility

Tips for Publishing

Export one .mp3 per chapter
Use consistent filenames (the script numbers them automatically)
Normalize volume with tools like ffmpeg or Auphonic
Add intro/outro manually if needed
Note: Audible/ACX currently does not accept AI-narrated audiobooks
Platforms accepting AI narration: Google Play Books, Kobo, Findaway/Spotify (with disclosure)

Troubleshooting

Problem	Solution
Missing package error	Follow the install command shown in the error message
All chapters FAILED	Likely a missing dependency; check the error messages
No audio generated	Check your internet connection (Edge TTS requires online access)
"No audio was received"	Verify that the voice name is correct (`edge-tts --list-voices | grep de-`)
Script hangs on long text	Long text is already handled by auto-chunking; check your internet connection
Wrong chapter order	Configure `section_order.audiobook` in `export-settings.yaml`
Unwanted chapters in EPUB	Use `--list-chapters` to preview, then `--skip`
EPUB chapters missing	Some EPUBs have non-standard structure, check output
Merge fails	Install ffmpeg: `sudo apt install ffmpeg`
Want to regenerate files	Use `--overwrite` to force regeneration
Voice sounds unnatural	Try a different voice from `edge-tts --list-voices`
ElevenLabs not working	Set `ELEVENLABS_API_KEY` in your environment
Markdown sounds weird	Check for leftover syntax or unsupported tags
