more pythonic work, removed Flask+WSGI for fastapi+ASGI, click ctl script (VRCWizard#3)

* changing tabs to spaces
* use pysoundfile instead of scipy
* standard python structure
* cleanup of README
* housekeeping and python formatting
* glados tts engine made pythonic
* switch from flask to fastapi
* update README to reflect current work
* update lockfile
* update README to reflect current work
* link to Ellen McClain's wikipedia page
* remove clutter

Co-authored-by: Ben Kristinsson <[email protected]>
Reviewed-on: https://git.sudo.is/b/glados-tts/pulls/3
benediktkr · May 13, 2023 · ad663ba
1 parent fe2554a
commit ad663ba

Showing 10 changed files with 1,131 additions and 600 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,10 +2,20 @@ __pycache__/
 dist/
 glados_tts.egg-info/
 *.pyc
+.clutter/
 
 audio/*
 !audio/.gitkeep
 *.wav
+config.py
+config.json
+config.toml
+config.yaml
+config.yml
+glados.json
+glados.toml
+glados.yaml
+glados.yml
 
 *~
 .#*

diff --git a/README.md b/README.md
@@ -9,7 +9,8 @@ Neural network based TTS Engine.
 
 ## Description
 The initial, regular Tacotron model was trained first on LJSpeech, and
-then on a heavily modified version of the Ellen McClain dataset (all
+then on a heavily modified version of the [Ellen
+McClain](https://en.wikipedia.org/wiki/Ellen_McLain) dataset (all
 non-Portal 2 voice lines removed, punctuation added).
 
 * The Forward Tacotron model was only trained on about 600 voice lines.
@@ -28,32 +29,16 @@ This fork modernizes and improves the Python code in the project and does a bunc
 * `[DONE]`: Gets rid of the `SciPy` dependency (replaced with the more modern and lightwight [`pysoundfile`](https://github.com/gooofy/py-espeak-ng) (since all it was used for was writing a `.wav` file to disk)
 * `[DONE]`: Support modern stable Python 3 versions, and update dependencies.
 * `[DONE]`: Versioned packages with `poetry` and `pyproject.toml`
+* `[DONE]`: Configuration handling with `click`.
+* `[DONE]`: Better logging with `loguru`
 * `[WIP]`: Python coding style and code quality improvements (proper handling of `file` object, improved logging..)
-* `[TODO]`: Support Home Assistant through the [`notify` integration](https://www.home-assistant.io/integrations/notify/)
-* `[TODO]`: Configuration handling with `click`.
-* `[TODO]`: Better logging with `loguru`
-* `[TODO]`: Using `waitress` as a WSGI-server for production-capable deployments
+* `[WIP]`: Switch to using ASGI with `uvicorn` and `fastapi` instead of Flask and WSGI, and support production-capable deployments as default.
 * `[TODO]`: Docker support
+* `[TODO]`: Support Home Assistant through the [`notify` integration](https://www.home-assistant.io/integrations/notify/)
 * `[TODO]`: see if its possible to avoid `espeak-ng` as a system package dependency (python bindings, buliding the C library, etc)
 
 No work on the speech model itself is expected.
 
-### Home Assistant `notify` service
-
-configuration in `configuration.yaml` (or a `package/`):
-
-```yaml
-# Enable rest api
-api:
-
-notify:
-  - name: glados
-    platform: rest
-    resource: http://${GLADOS}/notify
-```
-
-This is roughly how it would work (not done yet).
-
 ## Install
 
 First you need to [install the `espeak-ng` system

diff --git a/engine_old.py b/engine_old.py
@@ -0,0 +1,122 @@
+import sys
+import os
+import time
+
+import torch
+import soundfile
+
+from glados_tts.utils.tools import prepare_text
+
+
+print("\033[1;94mINFO:\033[;97m Initializing TTS Engine...")
+
+# Select the device
+if torch.is_vulkan_available():
+    device = 'vulkan'
+if torch.cuda.is_available():
+    device = 'cuda'
+else:
+    device = 'cpu'
+
+# Load models
+if __name__ == "__main__":
+    glados = torch.jit.load('models/glados.pt')
+    vocoder = torch.jit.load('models/vocoder-gpu.pt', map_location=device)
+else:
+    glados = torch.jit.load('glados_tts/models/glados.pt')
+    vocoder = torch.jit.load('glados_tts/models/vocoder-gpu.pt', map_location=device)
+
+# Prepare models in RAM
+for i in range(4):
+    init = glados.generate_jit(prepare_text(str(i)))
+    init_mel = init['mel_post'].to(device)
+    init_vo = vocoder(init_mel)
+
+
+def glados_tts(text, key=False):
+
+    # Tokenize, clean and phonemize input text
+    x = prepare_text(text).to('cpu')
+
+    with torch.no_grad():
+
+        # Generate generic TTS-output
+        old_time = time.time()
+        tts_output = glados.generate_jit(x)
+
+        # Use HiFiGAN as vocoder to make output sound like GLaDOS
+        mel = tts_output['mel_post'].to(device)
+        audio = vocoder(mel)
+        print("\033[1;94mINFO:\033[;97m The audio sample took " +
+              str(round((time.time() - old_time) * 1000)) + " ms to generate.")
+
+        # Normalize audio to fit in wav-file
+        audio = audio.squeeze()
+        audio = audio * 32768.0
+        audio = audio.cpu().numpy().astype('int16')
+        if (key):
+            output_file = ('audio/GLaDOS-tts-temp-output-'+key+'.wav')
+        else:
+            output_file = ('audio/GLaDOS-tts-temp-output.wav')
+
+        # Write audio file to disk
+        # 22,05 kHz sample rate
+        soundfile.write(output_file, audio, 22050)
+
+    return True
+
+
+def main():
+    # Remote Engine Veritables
+    PORT = 8124
+    CACHE = True
+
+    from flask import Flask, request, send_file
+    import urllib.parse
+    import shutil
+
+    print("\033[1;94mINFO:\033[;97m Initializing TTS Server...")
+
+    app = Flask(__name__)
+
+    @app.route('/synthesize/', defaults={'text': ''})
+    @app.route('/synthesize/<path:text>')
+    def synthesize(text):
+        if (text == ''):
+            return 'No input'
+
+        line = urllib.parse.unquote(request.url[request.url.find('synthesize/')+11:])
+        filename = "GLaDOS-tts-"+line.replace(" ", "-")
+        filename = filename.replace("!", "")
+        filename = filename.replace("°c", "degrees celcius")
+        filename = filename.replace(",", "")+".wav"
+        file = os.getcwd()+'/audio/'+filename
+
+        # Check for Local Cache
+        if (os.path.isfile(file)):
+
+            # Update access time. This will allow for routine cleanups
+            os.utime(file, None)
+            print("\033[1;94mINFO:\033[;97m The audio sample sent from cache.")
+            return send_file(file)
+
+        # Generate New Sample
+        key = str(time.time())[7:]
+        if (glados_tts(line, key)):
+            tempfile = os.getcwd()+'/audio/GLaDOS-tts-temp-output-'+key+'.wav'
+
+            # If the line isn't too long, store in cache
+            if (len(line) < 200 and CACHE):
+                shutil.move(tempfile, file)
+            else:
+                return send_file(tempfile)
+                os.remove(tempfile)
+
+            return send_file(file)
+
+        else:
+            return 'TTS Engine Failed'
+
+    cli = sys.modules['flask.cli']
+    cli.show_server_banner = lambda *x: None
+    app.run(host="0.0.0.0", port=PORT)
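
Review note on the device selection in `engine_old.py`: the first `if` (Vulkan) is followed by a separate `if`/`else` (CUDA or CPU), so any `'vulkan'` assignment is always overwritten. The sketch below shows one way to write a single chain. The function name `select_device` is hypothetical, and ranking CUDA above Vulkan is an assumption chosen because it matches what the existing code does whenever CUDA is available; the author's intended order is not stated in the commit.

```python
def select_device(vulkan_available: bool, cuda_available: bool) -> str:
    """Pick a torch device name from availability flags.

    Hypothetical helper, not part of the commit. The flags would come from
    torch.is_vulkan_available() and torch.cuda.is_available(); they are
    passed in here so the logic can be checked without importing torch.
    """
    if cuda_available:
        # Assumption: CUDA wins when present, as in the existing code.
        return "cuda"
    if vulkan_available:
        # With the separate if/else in the diff, this branch is never kept.
        return "vulkan"
    return "cpu"


if __name__ == "__main__":
    # Print the result for every combination of flags.
    for vulkan in (False, True):
        for cuda in (False, True):
            print(f"vulkan={vulkan}, cuda={cuda}: {select_device(vulkan, cuda)}")
```

In the module itself this could be called as `device = select_device(torch.is_vulkan_available(), torch.cuda.is_available())`.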
diff --git a/glados_tts/config.py b/glados_tts/config.py
@@ -0,0 +1,10 @@
+# restapi
+
+port = 8124
+cache = True
+base_url = "/glados/tts"
+
+# tts
+
+audio_dir = "audio/"
+fname_prefix = "GLaDOS-tts"
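
Review note on `glados_tts/config.py`: `audio_dir` and `fname_prefix` appear to replace the `'/audio/'` and `"GLaDOS-tts-"` literals hardcoded in `engine_old.py`. The code that consumes them is not part of this excerpt, so the following is only a sketch of how the cache path could be derived from these two settings. The function name `cache_path` is hypothetical; the character replacements mirror the ones in the old `synthesize` route, in the same order.

```python
from pathlib import Path

# Values copied from glados_tts/config.py in this commit.
audio_dir = "audio/"
fname_prefix = "GLaDOS-tts"


def cache_path(line: str, audio_dir: str = audio_dir,
               fname_prefix: str = fname_prefix) -> Path:
    """Build the cache file path for a line of text (hypothetical helper)."""
    # Same replacements, in the same order, as the old Flask route.
    name = f"{fname_prefix}-" + line.replace(" ", "-")
    name = name.replace("!", "")
    name = name.replace("°c", "degrees celcius")
    name = name.replace(",", "")
    return Path(audio_dir) / f"{name}.wav"


if __name__ == "__main__":
    print(cache_path("Hello, world!").as_posix())
```

Using `pathlib.Path` rather than string concatenation with `os.getcwd()` is a design choice made for this sketch, not something the commit is known to do.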