more pythonic work, removed Flask+WSGI for fastapi+ASGI, click ctl script (VRCWizard#3)

changing tabs to spaces

use pysoundfile instead of scipy

standard python structure

cleanup of README

housekeeping and python formatting

glados tts engine made pythonic

switch from flask to fastapi

update README to reflect current work

update lockfile

update README to reflect current work

link to Ellen McClain's wikipedia page

remove clutter

Co-authored-by: Ben Kristinsson <[email protected]>
Reviewed-on: https://git.sudo.is/b/glados-tts/pulls/3
benediktkr committed May 13, 2023
1 parent fe2554a commit ad663ba
Showing 10 changed files with 1,131 additions and 600 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -2,10 +2,20 @@ __pycache__/
dist/
glados_tts.egg-info/
*.pyc
.clutter/

audio/*
!audio/.gitkeep
*.wav
config.py
config.json
config.toml
config.yaml
config.yml
glados.json
glados.toml
glados.yaml
glados.yml

*~
.#*
27 changes: 6 additions & 21 deletions README.md
@@ -9,7 +9,8 @@ Neural network based TTS Engine.

## Description
The initial, regular Tacotron model was trained first on LJSpeech, and
then on a heavily modified version of the Ellen McLain dataset (all
then on a heavily modified version of the [Ellen
McLain](https://en.wikipedia.org/wiki/Ellen_McLain) dataset (all
non-Portal 2 voice lines removed, punctuation added).

* The Forward Tacotron model was only trained on about 600 voice lines.
@@ -28,32 +29,16 @@ This fork modernizes and improves the Python code in the project and does a bunc
* `[DONE]`: Gets rid of the `SciPy` dependency (replaced with the more modern and lightweight [`pysoundfile`](https://github.com/gooofy/py-espeak-ng)), since all it was used for was writing a `.wav` file to disk
* `[DONE]`: Support modern stable Python 3 versions, and update dependencies.
* `[DONE]`: Versioned packages with `poetry` and `pyproject.toml`
* `[DONE]`: Configuration handling with `click`.
* `[DONE]`: Better logging with `loguru`
* `[WIP]`: Python coding style and code quality improvements (proper handling of `file` object, improved logging..)
* `[TODO]`: Support Home Assistant through the [`notify` integration](https://www.home-assistant.io/integrations/notify/)
* `[TODO]`: Configuration handling with `click`.
* `[TODO]`: Better logging with `loguru`
* `[TODO]`: Using `waitress` as a WSGI-server for production-capable deployments
* `[WIP]`: Switch to using ASGI with `uvicorn` and `fastapi` instead of Flask and WSGI, and support production-capable deployments as default.
* `[TODO]`: Docker support
* `[TODO]`: Support Home Assistant through the [`notify` integration](https://www.home-assistant.io/integrations/notify/)
* `[TODO]`: see if it's possible to avoid `espeak-ng` as a system package dependency (python bindings, building the C library, etc)

No work on the speech model itself is expected.

### Home Assistant `notify` service

configuration in `configuration.yaml` (or a `package/`):

```yaml
# Enable rest api
api:

notify:
  - name: glados
    platform: rest
    resource: http://${GLADOS}/notify
```
This is roughly how it would work (not done yet).
## Install

First you need to [install the `espeak-ng` system
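
Note on the SciPy removal listed in the README hunk above: the dependency was only used to write PCM samples to a `.wav` file. As a rough, dependency-free illustration of that single step, the sketch below writes 16-bit mono audio at 22050 Hz (the rate `engine_old.py` uses in this commit) with the standard-library `wave` module. This is a stand-in, not the commit's code, which calls `soundfile.write`; the helper names `float_to_pcm16` and `write_wav` are hypothetical.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to clamped 16-bit integers."""
    pcm = []
    for s in samples:
        value = int(s * 32768.0)
        # Clamp so that +1.0 maps to 32767 instead of overflowing int16.
        pcm.append(max(-32768, min(32767, value)))
    return pcm


def write_wav(path, samples, sample_rate=22050):
    """Write mono float samples to `path` as 16-bit PCM (hypothetical helper)."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = int16
        wav.setframerate(sample_rate)
        # Little-endian signed 16-bit frames, as WAV PCM expects.
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

The explicit clamp is a design choice of this sketch: scaling by 32768.0 and casting directly, as the engine does, leaves a sample of exactly 1.0 outside the int16 range.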
122 changes: 122 additions & 0 deletions engine_old.py
@@ -0,0 +1,122 @@
import sys
import os
import time

import torch
import soundfile

from glados_tts.utils.tools import prepare_text


print("\033[1;94mINFO:\033[;97m Initializing TTS Engine...")

# Select the device
if torch.cuda.is_available():
    device = 'cuda'
elif torch.is_vulkan_available():
    device = 'vulkan'
else:
    device = 'cpu'

# Load models
if __name__ == "__main__":
    glados = torch.jit.load('models/glados.pt')
    vocoder = torch.jit.load('models/vocoder-gpu.pt', map_location=device)
else:
    glados = torch.jit.load('glados_tts/models/glados.pt')
    vocoder = torch.jit.load('glados_tts/models/vocoder-gpu.pt', map_location=device)

# Prepare models in RAM
for i in range(4):
    init = glados.generate_jit(prepare_text(str(i)))
    init_mel = init['mel_post'].to(device)
    init_vo = vocoder(init_mel)


def glados_tts(text, key=False):

    # Tokenize, clean and phonemize input text
    x = prepare_text(text).to('cpu')

    with torch.no_grad():

        # Generate generic TTS-output
        old_time = time.time()
        tts_output = glados.generate_jit(x)

        # Use HiFiGAN as vocoder to make output sound like GLaDOS
        mel = tts_output['mel_post'].to(device)
        audio = vocoder(mel)
        print("\033[1;94mINFO:\033[;97m The audio sample took " +
              str(round((time.time() - old_time) * 1000)) + " ms to generate.")

        # Normalize audio to fit in wav-file
        audio = audio.squeeze()
        audio = audio * 32768.0
        audio = audio.cpu().numpy().astype('int16')
        if (key):
            output_file = ('audio/GLaDOS-tts-temp-output-'+key+'.wav')
        else:
            output_file = ('audio/GLaDOS-tts-temp-output.wav')

        # Write audio file to disk
        # 22.05 kHz sample rate
        soundfile.write(output_file, audio, 22050)

    return True


def main():
    # Remote Engine Variables
    PORT = 8124
    CACHE = True

    from flask import Flask, request, send_file
    import urllib.parse
    import shutil

    print("\033[1;94mINFO:\033[;97m Initializing TTS Server...")

    app = Flask(__name__)

    @app.route('/synthesize/', defaults={'text': ''})
    @app.route('/synthesize/<path:text>')
    def synthesize(text):
        if (text == ''):
            return 'No input'

        line = urllib.parse.unquote(request.url[request.url.find('synthesize/')+11:])
        filename = "GLaDOS-tts-"+line.replace(" ", "-")
        filename = filename.replace("!", "")
        filename = filename.replace("°c", "degrees celcius")
        filename = filename.replace(",", "")+".wav"
        file = os.getcwd()+'/audio/'+filename

        # Check for Local Cache
        if (os.path.isfile(file)):

            # Update access time. This will allow for routine cleanups
            os.utime(file, None)
            print("\033[1;94mINFO:\033[;97m The audio sample sent from cache.")
            return send_file(file)

        # Generate New Sample
        key = str(time.time())[7:]
        if (glados_tts(line, key)):
            tempfile = os.getcwd()+'/audio/GLaDOS-tts-temp-output-'+key+'.wav'

            # If the line isn't too long, store in cache
            if (len(line) < 200 and CACHE):
                shutil.move(tempfile, file)
            else:
                return send_file(tempfile)
                os.remove(tempfile)

            return send_file(file)

        else:
            return 'TTS Engine Failed'

    cli = sys.modules['flask.cli']
    cli.show_server_banner = lambda *x: None
    app.run(host="0.0.0.0", port=PORT)
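
Note on the cache lookup in `synthesize` above: whether a request hits the cache depends entirely on how the request text maps to a file name. The sketch below isolates that mapping as a pure function so it can be checked without Flask or the models. `cache_filename` is a hypothetical name, not part of the commit; the replacement chain is copied from the route, including its `celcius` spelling, since changing it would change existing cache keys.

```python
import urllib.parse


def cache_filename(url, prefix="GLaDOS-tts-"):
    """Derive (line, file name) the way the Flask route above does."""
    # Everything after 'synthesize/' is the URL-encoded line to speak.
    start = url.find("synthesize/") + len("synthesize/")
    line = urllib.parse.unquote(url[start:])

    name = prefix + line.replace(" ", "-")
    name = name.replace("!", "")
    name = name.replace("°c", "degrees celcius")
    name = name.replace(",", "")
    return line, name + ".wav"


if __name__ == "__main__":
    print(cache_filename("http://localhost:8124/synthesize/Hello%2C%20world%21"))
```

The chain leaves every other character untouched, including `/`, so a stricter sanitizer may be worth considering if the endpoint is exposed beyond a trusted network.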
10 changes: 10 additions & 0 deletions glados_tts/config.py
@@ -0,0 +1,10 @@
# restapi

port = 8124
cache = True
base_url = "/glados/tts"

# tts

audio_dir = "audio/"
fname_prefix = "GLaDOS-tts"
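
Note on `glados_tts/config.py`: these module-level settings take over from the constants hard-coded in `engine_old.py`. How the rest of the package consumes them is not visible in this part of the diff, so the following is only a guess at the shape. The helpers `temp_output_path` and `synthesize_route` are hypothetical, and the values are redefined locally so the snippet runs on its own.

```python
import posixpath

# Values mirrored from glados_tts/config.py in this commit.
port = 8124
base_url = "/glados/tts"
audio_dir = "audio/"
fname_prefix = "GLaDOS-tts"


def temp_output_path(key=None):
    """Build the temporary .wav path (hypothetical helper)."""
    suffix = f"-{key}" if key else ""
    return posixpath.join(audio_dir, f"{fname_prefix}-temp-output{suffix}.wav")


def synthesize_route():
    """Build a route under base_url (assumed; the real route may differ)."""
    return posixpath.join(base_url, "synthesize")
```

With the values above, `temp_output_path("123")` gives the same `audio/GLaDOS-tts-temp-output-123.wav` name that the old engine assembled by string concatenation.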