# Use VITS models in the browser powered by the [ONNX Runtime](https://onnxruntime.ai/)

A big shout-out goes to [Rhasspy Piper](https://github.com/rhasspy/piper), who open-sourced all the currently available models (MIT License), and to [@jozefchutka](https://github.com/jozefchutka), who came up with the WASM build steps.

## Usage
First, install the library:
```bash
npm i --save @diffusionstudio/vits-web
```

Then import the library (ES modules only):
```typescript
import * as tts from '@diffusionstudio/vits-web';

// Hint: onnxruntime-web is a peer dependency
```

Now you can start synthesizing speech!
```typescript
const wav = await tts.predict({
  text: 'Text to speech in the browser is amazing!',
  voiceId: 'en_US-hfc_female-medium',
});

// Hint: predict is also available in a Web Worker

const audio = new Audio();
audio.src = URL.createObjectURL(wav);
audio.play();
```

The first call to `predict` downloads the model and stores it in your [origin private file system](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API/Origin_private_file_system) (OPFS). You can also download the model manually in advance *(recommended)*:
```typescript
await tts.download('en_US-hfc_female-medium', (progress) => {
  console.log(`Downloading ${progress.url} - ${Math.round(progress.loaded * 100 / progress.total)}%`);
});
```
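
The percentage expression in the callback above divides by `progress.total`, so it would print `NaN%` or `Infinity%` if a total of `0` were ever reported. Whether the library can report that is not stated here, so the following is only a defensive sketch. The `DownloadProgress` interface and the `formatProgress` helper are hypothetical names; the fields `url`, `loaded`, and `total` are the ones used in the snippet above.

```typescript
// Shape of the progress object, inferred from the fields used in the snippet above (assumption).
interface DownloadProgress {
  url: string;
  loaded: number;
  total: number;
}

// Hypothetical helper: formats a progress update and guards against a missing or zero total.
function formatProgress(progress: DownloadProgress): string {
  if (!Number.isFinite(progress.total) || progress.total <= 0) {
    // No usable total, so no percentage can be computed.
    return `Downloading ${progress.url} - size unknown`;
  }
  // Clamp to 100 in case loaded ever exceeds total.
  const percent = Math.min(100, Math.round((progress.loaded * 100) / progress.total));
  return `Downloading ${progress.url} - ${percent}%`;
}
```

It could be passed along as `(progress) => console.log(formatProgress(progress))`.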

The `predict` function also accepts a download progress callback as its second argument (`tts.predict(..., console.log)`).

To list the models that are already stored, call:
```typescript
console.log(await tts.stored());

// will log ['en_US-hfc_female-medium']
```
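
Combining `stored` with `download` gives a simple "download only if missing" check. The sketch below is an illustration, not part of the library: `VoiceStore` and `ensureVoice` are hypothetical names, and the signatures are assumed from the calls shown in this README. Whether `download` already skips stored models on its own is not stated here.

```typescript
// Minimal subset of the API, typed from the calls shown in this README (assumed signatures).
interface VoiceStore {
  stored(): Promise<string[]>;
  download(voiceId: string, onProgress?: (progress: unknown) => void): Promise<unknown>;
}

// Hypothetical helper: downloads a voice only if it is not already stored.
// Resolves to true when a download was triggered, false otherwise.
async function ensureVoice(api: VoiceStore, voiceId: string): Promise<boolean> {
  const available = await api.stored();
  if (available.includes(voiceId)) {
    return false;
  }
  await api.download(voiceId);
  return true;
}
```

With the import from above, this would be called as `await ensureVoice(tts, 'en_US-hfc_female-medium')`.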

You can remove models from OPFS by calling:
```typescript
await tts.remove('en_US-hfc_female-medium');

// alternatively, delete all stored models

await tts.flush();
```

Finally, to retrieve all available voices:
```typescript
console.log(await tts.voices());

// Hint: the key can be used as voiceId
```
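
Once you have the keys, it can be handy to narrow them down to one language. The sketch below works on a plain list of voice IDs, so it makes no assumption about the exact shape `tts.voices()` returns. It does assume that IDs start with a locale followed by a hyphen, as in `en_US-hfc_female-medium`. The name `voiceIdsForLocale` is hypothetical.

```typescript
// Hypothetical helper: returns the voice IDs that belong to the given locale, sorted alphabetically.
// Assumes IDs follow the pattern "<locale>-<rest>", as in 'en_US-hfc_female-medium'.
function voiceIdsForLocale(voiceIds: string[], locale: string): string[] {
  return voiceIds
    .filter((id) => id.startsWith(`${locale}-`))
    .sort();
}
```

Any ID returned this way can then be passed as `voiceId` to `tts.predict`.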

### **That's it!** Happy coding :)