🎙️ `paper-to-audio`

Convert academic papers and PDFs to audio using GROBID and OpenAI's TTS API.

Features:

Encodes chapter marks into MP3.
Adapts formulas to be more suitable for TTS using OpenAI's GPT-4 LLM.
Generates cover image using OpenAI's Dall-E.

Important

Requires an OpenAI API key. A 10 page research paper will be ~30-60 minutes in length and cost $0.50 to $1 to generate.

Preview

The following is a 30 second snippet generated from a PDF (enable audio in video controls):

preview.mp4

Example of generated cover images:

Screenshot of CLI:

Example of chapters as seen in an audiobook player:

Setup

Requirements:

NodeJS (tested with v21) and Yarn v1 installed
ffmpeg installed, with libmp3lame enabled
OpenAI API key

Create a .env with your OpenAI key:

OPENAI_API_KEY=...

Install dependencies:

yarn install

Run with the path to your PDF file:

yarn start ./data/Example/Example.pdf

The MP3 file will be saved in the same directory as the PDF, e.g. ./data/Example/Example.mp3.

Data used for generating the file are stored in ./intermediate, and will be re-used in successive runs.

Optional configuration

Currently, parameters are configured using environment variables instead of command-line parameters. In addition to the OpenAI API key, the following environment variables can be set:

GROBID_URL=http://localhost:8070 # defaults to https://kermitt2-grobid.hf.space
TTS_VOICE=echo # Defaults to random voice, see options at https://platform.openai.com/docs/guides/text-to-speech/voice-options
TTS_MODEL=tts-1
IMAGE_MODEL=dall-e-3
LLM_MODEL=gpt-4o # Used for adapting formulas
INCLUDE_FIGURES=true # Include figure texts in audio. Defaults to false.
SKIP_CITATIONS=true # Skip in-text citations. Defaults to false.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

🎙️ `paper-to-audio`

Preview

Setup

Optional configuration

Files

README.md

Latest commit

History

README.md

File metadata and controls

🎙️ paper-to-audio

Preview

Setup

Optional configuration

🎙️ `paper-to-audio`