Generate Audiobook
Turn your book chapters into a high-quality audiobook with a single command. This tool converts .md chapter files or
.epub books into .mp3 audio using modular Text-to-Speech (TTS) engines.
This script processes your existing manuscript (Markdown chapters or EPUB files) and turns them into spoken-word audio. It supports multiple engines and languages, cleans up Markdown/HTML formatting for natural narration, respects your book's section order, and outputs clean MP3 files per chapter.
- Converts Markdown chapters and EPUB files to `.mp3`
- Cleans Markdown and HTML (removes images, formatting, code, links, `<figure>` tags)
- Supports multiple TTS engines: `edge` (recommended), `google`, `pyttsx3`, `elevenlabs`
- Configurable section order (defined in `config/export-settings.yaml`)
- CLI-ready via Poetry (`generate-audiobook`)
- Language, voice, and speech rate configurable
- Optional `voice-settings.yaml` for reusable configuration
- Output per chapter, numbered in book order, ready for publishing
- Preview EPUB chapters before export (`--list-chapters`)
- Skip unwanted chapters by keyword (`--skip toc,cover,imprint`)
- Merge all chapters into a single audiobook file (`--merge`, requires ffmpeg)
- Skip existing files by default, regenerate with `--overwrite`
- Auto-chunking for long chapters to avoid TTS timeouts
- Retry logic with backoff on transient TTS failures
- Verbose progress output with timing and file sizes
- File listing preview at startup with `[exists]` markers
- Dependency checks at startup with clear install instructions
| Format | Description |
|---|---|
| Markdown directory | A folder containing front-matter/, chapters/, back-matter/ with .md files |
| EPUB file | A single .epub file (chapters are extracted automatically in reading order) |
# Core dependency (lightweight, no heavy ML libs)
poetry add edge-tts
# For EPUB support
poetry add ebooklib beautifulsoup4
# For merging chapters into single audiobook (system package)
sudo apt install ffmpeg
# Optional: other TTS engines
poetry add gTTS # Google TTS
poetry add pyttsx3 # Offline TTS
poetry add elevenlabs # ElevenLabs API
The script checks for required dependencies at startup and shows clear error messages if something is missing:
Error: Required package 'edge-tts' is not installed.
The 'edge' TTS engine needs it to work.
Install it with:
poetry add edge-tts
For EPUB input, it also checks for ebooklib and beautifulsoup4 before processing:
Error: EPUB support requires 'ebooklib' and 'beautifulsoup4'.
Install them with:
poetry add ebooklib beautifulsoup4
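The startup check described above could be approximated like this. It is a minimal sketch, not the script's actual code: `missing_modules` and `install_hint` are hypothetical helpers, and detecting packages via `importlib.util.find_spec` is an assumption.

```python
import importlib.util


def missing_modules(modules):
    """Return the module names from `modules` that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_hint(packages, reason):
    """Build an error message in the style shown above (hypothetical helper)."""
    quoted = " and ".join(f"'{p}'" for p in packages)
    return (
        f"Error: {reason} requires {quoted}.\n"
        "Install them with:\n"
        f"poetry add {' '.join(packages)}"
    )


# Import names can differ from package names: beautifulsoup4 is imported as bs4.
EPUB_MODULES = {"ebooklib": "ebooklib", "bs4": "beautifulsoup4"}

if __name__ == "__main__":
    missing = missing_modules(EPUB_MODULES)
    if missing:
        print(install_hint([EPUB_MODULES[m] for m in missing], "EPUB support"))
```

Checking before any audio is generated means a missing package surfaces once, with an install command, instead of as a failure on every chapter.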
If config/voice-settings.yaml has input, output, engine, language, and voice set:
poetry run generate-audiobook
Or via Makefile:
make audiobook
No CLI flags needed.
poetry run generate-audiobook \
--input manuscript \
--output audiobook/output/de

poetry run generate-audiobook \
--input manuscript \
--output audiobook/output/de \
--merge
This generates all chapter MP3s and then merges them into a single file named {VoiceName}_{ProjectName}.mp3 (e.g. Conrad_eternity-audiobook.mp3). Requires ffmpeg (sudo apt install ffmpeg).
poetry run generate-audiobook \
--input manuscript \
--output audiobook/output/de \
--merge --merge-filename mein-hoerbuch.mp3
Or in voice-settings.yaml:
engine: edge
language: de
voice: de-DE-ConradNeural
input: manuscript
output: output/audiobook/de
merge: true
merge_filename: mein-hoerbuch.mp3
skip:
- toc
- toc-print
- bibliography
- imprint

poetry run generate-audiobook \
--input my-book.epub \
--output output/audiobook/de \
--engine edge \
--lang de

poetry run generate-audiobook --input my-book.epub --list-chapters
Output:
Chapters in my-book.epub:
01_Cover
02_Inhaltsverzeichnis
03_Vorwort
04_Kapitel 1 - Einleitung
05_Kapitel 2 - Grundlagen
...
Total: 12 chapter(s)
Use --skip to exclude chapters by keyword, e.g. --skip toc,cover,imprint
poetry run generate-audiobook \
--input my-book.epub \
--output output/audiobook/de \
--skip toc,cover,imprint
The matching is case-insensitive and checks both the chapter title and the filename inside the EPUB.
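The keyword matching can be pictured as follows. This is a sketch under assumptions: `parse_skip` and `should_skip` are hypothetical names, and plain substring matching is assumed (the text above only states that matching is case-insensitive and covers title and filename).

```python
def parse_skip(value):
    """Split a comma-separated --skip value into lowercase keywords."""
    return [kw.strip().lower() for kw in value.split(",") if kw.strip()]


def should_skip(title, filename, keywords):
    """True if any keyword occurs in the chapter title or the EPUB-internal filename."""
    haystacks = (title.lower(), filename.lower())
    return any(kw in hay for kw in keywords for hay in haystacks)


if __name__ == "__main__":
    keywords = parse_skip("toc,cover,imprint")
    print(should_skip("Cover", "cover.xhtml", keywords))
    print(should_skip("Kapitel 1 - Einleitung", "ch01.xhtml", keywords))
```

If matching really is substring-based, short keywords can over-match: `toc` would also hit a chapter titled "Stockholm". Previewing with `--list-chapters` first helps catch that.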
poetry run generate-audiobook \
--input manuscript/en \
--output output/audiobook/en \
--engine edge \
--voice en-US-GuyNeural

poetry run generate-audiobook \
--input manuscript/de \
--output output/audiobook/de \
--engine google \
--lang de

poetry run generate-audiobook \
--input manuscript/en \
--output output/audiobook/en \
--engine pyttsx3 \
--voice "english" \
--rate 180

| Option | Description |
|---|---|
| `--input` | Input folder with Markdown files OR a single .epub file. Can be set in config |
| `--output` | Output folder for .mp3 files. Can be set in config |
| `--engine` | TTS engine: edge (default), google, pyttsx3, elevenlabs |
| `--lang` | Language code (e.g. en, de, es, fr, el) |
| `--voice` | Voice name or ID (depends on engine) |
| `--rate` | Speech rate (pyttsx3 only) |
| `--settings` | Path to voice-settings.yaml (auto-detected at config/voice-settings.yaml) |
| `--section-order` | Path to a YAML file defining section order for Markdown input |
| `--list-chapters` | List all chapters in the EPUB without generating audio, then exit |
| `--skip` | Comma-separated keywords to exclude chapters (case-insensitive) |
| `--merge` | Merge all chapter MP3s into a single audiobook file (requires ffmpeg) |
| `--merge-filename` | Exact filename for the merged file (overrides auto-generated name) |
| `--title` | Custom book title for the auto-generated merge filename |
| `--overwrite` | Regenerate existing MP3 files. Default: skip existing files |
All CLI options can be defined in config/voice-settings.yaml. The file is auto-detected when present, no --settings
flag needed. CLI flags always override config values.
# TTS configuration for audiobook generation
# All settings can be overridden via CLI flags (CLI takes priority).
# Section order is defined centrally in export-settings.yaml (section_order.audiobook)
# TTS engine: edge (recommended), google, pyttsx3, elevenlabs
engine: edge
# Language code for TTS engine
language: de
# Voice identifier
voice: de-DE-ConradNeural
# Sections to skip during TTS processing
skip:
- toc
- toc-print
- bibliography
- imprint
# Input: directory with *.md files OR path to a single .epub file
# input: manuscript
# Output: directory for generated .mp3 files
# output: output/audiobook/de
# Overwrite existing MP3 files (default: false, skips existing)
# overwrite: false
# Merge all chapter MP3s into a single audiobook file (requires ffmpeg)
# merge: false
# Exact filename for the merged audiobook (overrides auto-generated name)
# merge_filename: my-book.mp3
# Custom book title for the auto-generated merge filename
# Only used when merge_filename is not set
# title: My Book Title
# Speech rate (pyttsx3 only)
# rate: 200

| Field | CLI Flag | Default | Description |
|---|---|---|---|
| `engine` | `--engine` | `edge` | TTS engine to use |
| `language` | `--lang` | `en` | ISO language code |
| `voice` | `--voice` | - | Voice name or ID |
| `rate` | `--rate` | `200` | Speech rate (pyttsx3 only) |
| `skip` | `--skip` | - | Keywords to exclude chapters |
| `input` | `--input` | - | Input directory or EPUB file |
| `output` | `--output` | - | Output directory for MP3 files |
| `overwrite` | `--overwrite` | `false` | Regenerate existing files |
| `merge` | `--merge` | `false` | Merge chapters into single file |
| `merge_filename` | `--merge-filename` | - | Exact name for merged file |
| `title` | `--title` | - | Book title for auto-generated merge filename |
Priority: CLI > voice-settings.yaml > defaults.
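The priority rule can be expressed as a layered merge. This is a sketch, not the script's implementation: `resolve_settings` is a hypothetical helper, the defaults are taken from the table above, and it assumes that a CLI flag that was not given arrives as `None`.

```python
DEFAULTS = {"engine": "edge", "language": "en", "rate": 200,
            "overwrite": False, "merge": False}


def resolve_settings(cli, config, defaults=DEFAULTS):
    """Merge settings with priority CLI > config file > defaults.

    A value of None means "not set" and never overrides a lower layer.
    """
    resolved = dict(defaults)
    resolved.update({k: v for k, v in config.items() if v is not None})
    resolved.update({k: v for k, v in cli.items() if v is not None})
    return resolved


if __name__ == "__main__":
    config = {"language": "de", "voice": "de-DE-ConradNeural"}
    cli = {"language": None, "voice": "de-DE-KatjaNeural", "merge": True}
    print(resolve_settings(cli, config))
```

One design detail matters here: boolean flags such as `--merge` need a `None` default on the CLI side. With a plain `False` default, an absent flag would silently override `merge: true` from the config file.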
Section order for audiobook is defined in config/export-settings.yaml under section_order.audiobook, not in
voice-settings.
When using Markdown input, the script respects a configurable section order to produce audio files in correct book
sequence. This is defined in config/export-settings.yaml under section_order.audiobook.
The default order is:
DEFAULT_SECTION_ORDER = [
"front-matter/toc.md",
"front-matter/foreword.md",
"front-matter/preface.md",
"chapters",
"back-matter/epilogue.md",
"back-matter/glossary.md",
"back-matter/acknowledgments.md",
"back-matter/about-the-author.md",
"back-matter/bibliography.md",
"back-matter/imprint.md",
]
Each entry is either a relative path to a specific .md file or a directory name (e.g. chapters) whose files are included in sorted order. Files or directories that don't exist in the project are silently skipped.
Output files are numbered for correct playback order:
01_foreword.mp3
02_preface.mp3
03_chapter-01.mp3
04_chapter-02.mp3
...
08_about-the-author.mp3
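How a section order turns into numbered output names can be sketched as below. The helpers `resolve_sections` and `output_names` are hypothetical, and the exact sorting and numbering rules of the script are assumptions (sorting by filename, two-digit prefix).

```python
from pathlib import Path


def resolve_sections(root, section_order):
    """Expand section-order entries into an ordered list of existing .md files."""
    files = []
    for entry in section_order:
        path = Path(root) / entry
        if path.is_dir():
            # Directory entry: include all .md files, sorted by name
            files.extend(sorted(path.glob("*.md")))
        elif path.is_file():
            files.append(path)
        # Entries that do not exist are skipped silently
    return files


def output_names(files):
    """Number output files in book order: 01_foreword.mp3, 02_preface.mp3, ..."""
    return [f"{i:02d}_{f.stem}.mp3" for i, f in enumerate(files, start=1)]
```

For a project that contains only `front-matter/foreword.md` and two files in `chapters`, this yields `01_foreword.mp3` followed by the two chapter files, with the missing entries simply absent from the numbering.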
For EPUB input, the reading order comes directly from the EPUB spine, so no section order configuration is needed. Use
--list-chapters to preview and --skip to exclude unwanted chapters.
Use --merge to combine all chapter MP3s into a single audiobook file:
poetry run generate-audiobook \
--input manuscript \
--output audiobook/output/de \
--merge
The merged filename is derived automatically: {VoiceName}_{ProjectName}.mp3. The voice name is extracted from the Edge TTS identifier (e.g. de-DE-ConradNeural becomes Conrad), and the project name comes from the current directory. If the directory name contains -ebook, it is replaced with -audiobook.
Examples: Conrad_eternity-audiobook.mp3, Katja_mein-buch.mp3
You can override the title with --title or via title in voice-settings.yaml.
For full control over the filename, use --merge-filename or merge_filename in the config.
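The naming rule can be sketched as follows. Both functions are hypothetical, and the voice pattern `<locale>-<Name>Neural` is an assumption based on the examples above. How `--title` is turned into a filename is not specified here, so it is left out.

```python
import re
from pathlib import Path


def voice_short_name(voice):
    """'de-DE-ConradNeural' -> 'Conrad' (assumes the pattern <locale>-<Name>Neural)."""
    name = voice.split("-")[-1]
    return re.sub(r"Neural$", "", name)


def merged_filename(voice, project_dir, merge_filename=None):
    """Return the merged file name; an explicit merge_filename always wins."""
    if merge_filename:
        return merge_filename
    project = Path(project_dir).name.replace("-ebook", "-audiobook")
    return f"{voice_short_name(voice)}_{project}.mp3"


if __name__ == "__main__":
    print(merged_filename("de-DE-ConradNeural", "/home/me/eternity-ebook"))
```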
Merging requires ffmpeg:
sudo apt install ffmpeg
The merge uses ffmpeg -c copy (stream copy, no re-encoding), so it typically finishes in seconds even for long audiobooks. Individual chapter files are preserved.
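One common way to concatenate MP3 files with `-c copy` is ffmpeg's concat demuxer, which reads a list file. The sketch below assumes that approach; the script may invoke ffmpeg differently, and `concat_list` and `merge_command` are hypothetical helpers.

```python
from pathlib import Path


def concat_list(mp3_files):
    """Build the text of an ffmpeg concat-demuxer list file, one entry per chapter."""
    lines = []
    for f in mp3_files:
        escaped = str(f).replace("'", "'\\''")  # escape single quotes for the list file
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def merge_command(list_file, output):
    """ffmpeg command that concatenates by stream copy (no re-encoding)."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]
```

To run it, write the result of `concat_list(...)` to a text file and pass the command to `subprocess.run(cmd, check=True)`. Stream copy only works cleanly when all chapters share the same codec parameters, which holds when they come from the same engine and voice.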
By default, the script skips chapters that already have a corresponding MP3 file. This allows you to resume interrupted runs without regenerating everything:
[1/24] Exists: 01_foreword.mp3 (48 KB, use --overwrite to regenerate)
[2/24] Exists: 02_preface.mp3 (102 KB, use --overwrite to regenerate)
[3/24] Generating: 03_chapter-01.mp3 (8,432 chars, ~1,205 words)
To force regeneration of all files, add --overwrite.
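The skip-or-regenerate decision can be sketched as below. `plan_outputs` is a hypothetical helper, and it assumes that "already has an MP3" means a file with the target name exists in the output directory.

```python
from pathlib import Path


def plan_outputs(output_dir, names, overwrite=False):
    """Split output file names into (to_generate, to_skip) based on what exists."""
    to_generate, to_skip = [], []
    for name in names:
        exists = (Path(output_dir) / name).exists()
        (to_skip if exists and not overwrite else to_generate).append(name)
    return to_generate, to_skip
```

Under this assumption, a file left half-written by an interrupted run would still count as existing, so deleting the last file before resuming (or using `--overwrite`) is the safe choice.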
The script shows detailed progress during generation:
============================================================
Audiobook Generator
============================================================
Engine: edge
Language: de
Voice: de-DE-ConradNeural
Settings: config/voice-settings.yaml
Merge to: Conrad_eternity-audiobook.mp3
============================================================
Files to process (24):
01_foreword.mp3 [exists]
02_preface.mp3 [exists]
03_01-0-part-1-intro.mp3
04_01-chapter.mp3
05_02-chapter.mp3
...
Source: manuscript
Output: audiobook/output/de
Files: 24
[1/24] Exists: 01_foreword.mp3 (48 KB, use --overwrite to regenerate)
[2/24] Exists: 02_preface.mp3 (102 KB, use --overwrite to regenerate)
[3/24] Generating: 03_01-0-part-1-intro.mp3 (1,421 chars, ~206 words)
Done in 3.1s (52 KB)
[4/24] Generating: 04_01-chapter.mp3 (10,051 chars, ~1,478 words)
Chunk 1/3 (3940 chars)
Chunk 2/3 (3785 chars)
Chunk 3/3 (2326 chars)
Done in 24.6s (412 KB)
...
Finished: 24 file(s) in 765.7s
Merging 24 chapter(s) into Conrad_eternity-audiobook.mp3...
Merged: audiobook/output/de/Conrad_eternity-audiobook.mp3
Size: 42.3 MB (0.8s)
The Merge to: line only appears in the banner when --merge is set. The file listing shows [exists] for chapters
that already have an MP3 file and will be skipped (unless --overwrite is used).
Long chapters are automatically split into chunks of ~4000 characters to avoid Edge TTS timeouts. If a chunk fails, the script retries up to 3 times with increasing wait time before moving on to the next chapter.
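The chunking step could look roughly like this. It is a sketch: `split_into_chunks` is a hypothetical function, and breaking at sentence ends is an assumption (the text above only gives the ~4000 character limit).

```python
import re


def split_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized separately and the audio parts are joined into one chapter file. Breaking at sentence boundaries avoids audible cuts in the middle of a sentence.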
If any chapters fail after all retries, a summary is shown at the end:
Finished: 20 file(s) in 765.7s
Warning: 4 chapter(s) failed to generate
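The retry behaviour can be sketched as a small wrapper. Assumptions are labeled here and in the code: `with_retries` is hypothetical, "up to 3 times" is read as three attempts in total, and the delay doubles each time (the text above only says the wait increases).

```python
import time


def with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, 2*base_delay, ... and try again.

    Returns func()'s result, or re-raises the last error after `attempts` tries.
    Assumed policy: 3 attempts in total, exponential backoff.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap the per-chunk TTS request in `with_retries`, catch the final exception per chapter, and count the failures to print a warning summary like the one above.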
List all available voices:
edge-tts --list-voices
Filter by language:
edge-tts --list-voices | grep de-
edge-tts --list-voices | grep en-
edge-tts --list-voices | grep es-

| Voice | Gender | Variant |
|---|---|---|
| `de-DE-KatjaNeural` | Female | Germany |
| `de-DE-ConradNeural` | Male | Germany |
| `de-AT-IngridNeural` | Female | Austria |
| `de-AT-JonasNeural` | Male | Austria |
| `de-CH-LeniNeural` | Female | Switzerland |
| `de-CH-JanNeural` | Male | Switzerland |

| Voice | Gender | Variant |
|---|---|---|
| `en-US-JennyNeural` | Female | US |
| `en-US-GuyNeural` | Male | US |
| `en-GB-SoniaNeural` | Female | UK |
| `en-GB-RyanNeural` | Male | UK |

| Voice | Language |
|---|---|
| `es-ES-ElviraNeural` | Spanish |
| `fr-FR-DeniseNeural` | French |
| `el-GR-AthinaNeural` | Greek |
| `it-IT-ElsaNeural` | Italian |
| `pt-BR-FranciscaNeural` | Portuguese (BR) |
The system automatically cleans your .md and HTML files before conversion:
- Removes images: `![alt](path)`
- Strips links: `[text](url)` becomes `text`
- Strips bold/italic formatting: `**text**`, `*text*`, etc.
- Removes code blocks and inline code
- Removes tables and HTML tags
- Removes entire `<figure>` blocks (including `<img>` and `<figcaption>`, multiline)
- Removes YAML front matter
- Collapses multiple empty lines
- Unescapes HTML entities
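The cleaning rules listed above can be approximated with a handful of regular expressions. This is a regex-based sketch, not the script's actual cleaner: `clean_for_tts` is a hypothetical function, and the order of the steps is an assumption.

```python
import html
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def clean_for_tts(text):
    """Strip Markdown/HTML markup so only speakable text remains."""
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)   # YAML front matter
    text = re.sub(r"<figure\b.*?</figure>", "", text,
                  flags=re.DOTALL | re.IGNORECASE)                    # whole <figure> blocks
    text = re.sub(FENCE + r".*?" + FENCE, "", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"`[^`\n]*`", "", text)                             # inline code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)                  # images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)              # links -> link text
    text = re.sub(r"^\|.*\|[ \t]*$", "", text, flags=re.MULTILINE)    # table rows
    text = re.sub(r"<[^>]+>", "", text)                               # remaining HTML tags
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)                   # bold
    text = re.sub(r"(?<!\w)(\*|_)(.+?)\1(?!\w)", r"\2", text)         # italic
    text = html.unescape(text)                                        # &amp; -> &
    text = re.sub(r"\n{3,}", "\n\n", text)                            # collapse blank lines
    return text.strip()
```

Images are removed before links because an image is a link with a leading `!`, and `<figure>` blocks are removed before the generic tag pass so that captions do not survive as stray text. Regexes handle typical manuscripts well but are not a full Markdown parser, so unusual nesting may leave residue.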
| Engine | Quality | Online | German | Best Use |
|---|---|---|---|---|
| edge | High (neural) | Yes | Yes | Recommended for all languages |
| google | Good (free) | Yes | Yes | Drafts, low-cost voiceover |
| elevenlabs | Studio-quality | Yes | Yes | Professional audiobooks |
| pyttsx3 | Low (robotic) | No | Limited | Testing, offline fallback |
- Use Edge TTS default speed (no adjustment needed, natural pacing)
- Choose a clear, articulate voice (`de-DE-ConradNeural` for German male)
- After generation, normalize audio volume for consistent loudness:
  ffmpeg -i input.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11 -ar 44100 -b:a 192k output.mp3
- Use MP3 format for maximum car stereo compatibility
- Export one `.mp3` per chapter
- Use consistent filenames (the script numbers them automatically)
- Normalize volume with tools like `ffmpeg` or Auphonic
- Add intro/outro manually if needed
- Note: Audible/ACX currently does not accept AI-narrated audiobooks
- Platforms accepting AI narration: Google Play Books, Kobo, Findaway/Spotify (with disclosure)
| Problem | Solution |
|---|---|
| Missing package error | Script shows exact install command, follow it |
| All chapters FAILED | Likely a missing dependency, check error messages |
| No audio generated | Check internet connection (Edge TTS requires online) |
| "No audio was received" | Verify voice name is correct (`edge-tts --list-voices \| grep de-`) |
| Script hangs on long text | Already handled by auto-chunking, check internet |
| Wrong chapter order | Configure `section_order.audiobook` in `export-settings.yaml` |
| Unwanted chapters in EPUB | Use `--list-chapters` to preview, then `--skip` |
| EPUB chapters missing | Some EPUBs have non-standard structure, check output |
| Merge fails | Install ffmpeg: `sudo apt install ffmpeg` |
| Want to regenerate files | Use `--overwrite` to force regeneration |
| Voice sounds unnatural | Try a different voice from `edge-tts --list-voices` |
| ElevenLabs not working | Set `ELEVENLABS_API_KEY` in your environment |
| Markdown sounds weird | Check for leftover syntax or unsupported tags |