Text-to-Speech API

An API, based on FastAPI, for neural text-to-speech synthesis. It is designed to be used together with our text-to-speech workers.

The project is developed by the NLP research group at the University of Tartu. Speech synthesis can also be tested in our web demo.

API usage

The API documentation will be available at the /docs endpoint when deployed.

To use the API, send POST requests in the following format:

POST /v2

BODY (JSON):

{
    "text": "Tere.",
    "speaker": "mari",
    "speed": 1
}

The speaker parameter is required and should contain the speaker's name. The speed parameter is optional; it is a multiplier between 0.5 and 2 relative to the normal speed of 1. The API returns the synthesized audio in .wav format.
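
For example, a minimal Python client using the requests library could look like the following sketch; the base URL and output filename are placeholders for your own deployment:

import requests

API_URL = "http://localhost:8000"  # placeholder; replace with your deployment address

payload = {
    "text": "Tere.",
    "speaker": "mari",
    "speed": 1,
}

# POST /v2 returns the synthesized audio as a .wav file.
response = requests.post(f"{API_URL}/v2", json=payload)
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)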

POST /v2/verbose

BODY (JSON):

{
    "text": "Tere.",
    "speaker": "mari",
    "speed": 1
}

This endpoint returns additional information about the synthesized audio, such as the normalized text and duration frames.
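
A sketch of the same request against the verbose endpoint is shown below; it assumes the response is a JSON body, so inspect it to see the exact fields returned by your deployment:

import requests

API_URL = "http://localhost:8000"  # placeholder; replace with your deployment address

payload = {"text": "Tere.", "speaker": "mari", "speed": 1}

# POST /v2/verbose is assumed here to return JSON containing the synthesis
# result together with details such as the normalized text and duration frames.
response = requests.post(f"{API_URL}/v2/verbose", json=payload)
response.raise_for_status()
print(response.json())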

POST /v2/stream_with_headers

BODY (JSON):

{
    "text": "Tere.",
    "speaker": "mari",
    "speed": 1
}

This endpoint returns the synthesized audio with additional metadata in the headers. The headers include:

  • Original-Text: Base64 encoded original text.
  • Normalized-Text: Base64 encoded normalized text.
  • Duration-Frames: Base64 encoded duration frames.
  • Sampling-Rate: Sampling rate of the audio.
  • Win-Length: Window length used in the synthesis.
  • Hop-Length: Hop length used in the synthesis.
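
As an illustration, the audio and the Base64-encoded metadata headers can be read on the client side roughly as follows; the base URL is a placeholder, and Duration-Frames is assumed to decode to a textual representation:

import base64
import requests

API_URL = "http://localhost:8000"  # placeholder; replace with your deployment address

payload = {"text": "Tere.", "speaker": "mari", "speed": 1}

response = requests.post(f"{API_URL}/v2/stream_with_headers", json=payload)
response.raise_for_status()

# The response body is the synthesized audio itself.
with open("output.wav", "wb") as f:
    f.write(response.content)

# Decode the Base64-encoded metadata headers listed above.
original_text = base64.b64decode(response.headers["Original-Text"]).decode("utf-8")
normalized_text = base64.b64decode(response.headers["Normalized-Text"]).decode("utf-8")
duration_frames = base64.b64decode(response.headers["Duration-Frames"]).decode("utf-8")  # assumed textual

print(original_text, normalized_text, duration_frames)
print(response.headers.get("Sampling-Rate"), response.headers.get("Win-Length"), response.headers.get("Hop-Length"))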

RabbitMQ communication

The API forwards requests to various TTS engines using the RabbitMQ message broker. The communication uses a direct exchange named text-to-speech, and the routing key is built from the request parameters in the format text-to-speech.$speaker. TTS workers declare queues on this exchange and bind them with the routing keys that correspond to the requests the particular model can handle.
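
For illustration only, the routing convention could be expressed with the pika client roughly as below; this is a sketch of the exchange and binding scheme, not the actual worker implementation, and the connection details are placeholders:

import pika

# Placeholder credentials and host; in the deployed setup these come from
# the MQ_* environment variables described in the Setup section.
credentials = pika.PlainCredentials("guest", "guest")
connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost", credentials=credentials)
)
channel = connection.channel()

# The API publishes requests to a direct exchange named "text-to-speech" ...
channel.exchange_declare(exchange="text-to-speech", exchange_type="direct")

# ... and a worker binds its queue with a routing key of the form
# "text-to-speech.$speaker" for each speaker it can synthesize.
queue = channel.queue_declare(queue="", exclusive=True).method.queue
channel.queue_bind(queue=queue, exchange="text-to-speech", routing_key="text-to-speech.mari")

connection.close()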

Setup

The API can be deployed using the docker image published alongside this repository. Each image version corresponds to a specific release. The API is designed to work together with our text-to-speech worker containers and RabbitMQ.

The service is available on port 8000. By default, the logging configuration is loaded from config/logging.prod.ini and the service configuration from config/config.yaml. Default versions of these files are included. To modify any of the configuration files, mount them at /app/config (the absolute path inside the container).

The following environment variables should be specified when running the container:

  • MQ_USERNAME - RabbitMQ username
  • MQ_PASSWORD - RabbitMQ user password
  • MQ_HOST - RabbitMQ host
  • MQ_PORT (optional) - RabbitMQ port (5672 by default)
  • MQ_TIMEOUT (optional) - Message timeout in seconds (30 by default)
  • MQ_EXCHANGE (optional) - RabbitMQ exchange name (text-to-speech by default)
  • MQ_CONNECTION_NAME (optional) - friendly connection name (Text-to-Speech API by default)
  • API_MAX_INPUT_LENGTH (optional) - maximum input text length in characters (10000 by default)
  • API_CONFIG_PATH (optional) - path of the config file used (config/config.yaml by default)

The entrypoint of the container is ["uvicorn", "app:app", "--host", "0.0.0.0", "--proxy-headers", "--log-config", "config/logging.ini"].

The CMD option can be used to override the entrypoint flags or to pass other Uvicorn parameters. For example, ["--log-config", "config/logging.debug.ini", "--root-path", "/text-to-speech"] enables debug logging and allows the API to be mounted under the non-root path /text-to-speech when using a proxy server such as Nginx.

The setup can be tested with the following sample docker-compose.yml configuration:

version: '3'
services:
  rabbitmq:
    image: 'rabbitmq'
    environment:
      - RABBITMQ_DEFAULT_USER=${RABBITMQ_USER}
      - RABBITMQ_DEFAULT_PASS=${RABBITMQ_PASS}
  tts_api:
    image: ghcr.io/tartunlp/text-to-speech-api:latest
    environment:
      - MQ_HOST=rabbitmq
      - MQ_PORT=5672
      - MQ_USERNAME=${RABBITMQ_USER}
      - MQ_PASSWORD=${RABBITMQ_PASS}
    ports:
      - '8000:8000'
    depends_on:
      - rabbitmq
  tts_worker:
    image: ghcr.io/tartunlp/text-to-speech-worker:latest
    environment:
      - MQ_HOST=rabbitmq
      - MQ_PORT=5672
      - MQ_USERNAME=${RABBITMQ_USER}
      - MQ_PASSWORD=${RABBITMQ_PASS}
    command: [ "--model-name", "multispeaker" ]
    volumes:
      - ./models:/app/models
    depends_on:
      - rabbitmq
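
Once the stack is running, a quick Python smoke test from the host might look like the sketch below; it assumes the default port mapping above, and the speaker name depends on the model served by the worker:

import requests

BASE_URL = "http://localhost:8000"  # the compose file above maps the API to host port 8000

# The interactive API documentation should be served at /docs.
assert requests.get(f"{BASE_URL}/docs").status_code == 200

# A synthesis request should succeed once the worker has loaded its model;
# the speaker name here is only an example.
response = requests.post(
    f"{BASE_URL}/v2",
    json={"text": "Tere.", "speaker": "mari", "speed": 1},
)
print(response.status_code, response.headers.get("content-type"))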
