This simple example CLI tool generates speech from a text prompt and saves it locally to a .wav audio file.
Requirements:
- The CLI and library must be built
- A local GGUF file for Parler TTS mini (or large v1)
For a detailed breakdown of the currently available functionality, call the CLI with the --help
parameter, which lists all parameters:
./cli --help
--temperature (-t):
The temperature to use when generating outputs. Defaults to 0.9.
--repetition-penalty (-r):
The per-channel repetition penalty applied to the sampled output of the model. Defaults to 1.1.
--n-threads (-nt):
The number of CPU threads to run generation with. Defaults to 10.
--topk (-tk):
(OPTIONAL) When set to an integer greater than 0, sampling is restricted to the top-k most likely tokens. Defaults to 50.
--use-metal (-m):
(OPTIONAL) Whether to use Metal acceleration.
--no-cross-attn (-ca):
(OPTIONAL) Whether to exclude cross attention.
--model-path (-mp):
(REQUIRED) The local path of the GGUF model file for Parler TTS mini or large v1.
--prompt (-p):
(REQUIRED) The text prompt for which to generate audio, in quotation marks.
--save-path (-sp):
(REQUIRED) The path at which to save the audio output, in .wav format.
--conditional-prompt (-cp):
(OPTIONAL) A distinct conditional prompt to use for generation. If none is provided, the pre-encoded prompt is used. '--text-encoder-path' must be set to use conditional generation.
--text-encoder-path (-tep):
(OPTIONAL) The local path of the text encoder GGUF model for conditional generation.
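To illustrate what --temperature, --topk and --repetition-penalty control, here is a minimal, generic sketch of temperature-scaled top-k sampling with a CTRL-style repetition penalty. It is not taken from the CLI's source: the function names are hypothetical, and applying the penalty to each channel's own token history is an assumption based on the channel-wise penalty described above.

```python
import math
import random
from typing import List, Sequence


def apply_repetition_penalty(logits: Sequence[float], history: Sequence[int], penalty: float) -> List[float]:
    """Lower the logits of tokens that already appear in this channel's history (hypothetical helper)."""
    out = list(logits)
    for tok in set(history):
        # CTRL-style: divide positive logits and multiply negative ones, so a
        # penalty > 1 always pushes a repeated token's score down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def sample_top_k(logits: Sequence[float], temperature: float, top_k: int, rng: random.Random) -> int:
    """Sample one token index after temperature scaling and optional top-k filtering (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    candidates = list(range(len(logits)))
    if top_k > 0:
        # Keep only the top_k highest-scoring token indices.
        candidates = sorted(candidates, key=lambda i: logits[i], reverse=True)[:top_k]
    # Lower temperatures sharpen the distribution, higher ones flatten it.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    # Softmax weights, shifted by the maximum for numerical stability.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, history=[0], penalty=1.1)
token = sample_top_k(penalized, temperature=0.9, top_k=2, rng=rng)
print(token)  # 0 or 1: top-k keeps only the two highest-scoring tokens
```

With the defaults listed above (temperature 0.9, top-k 50, penalty 1.1), each sampling step would look roughly like this, just over a much larger vocabulary.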
General usage follows from these parameters. For example, the following command saves generated speech to the file
/tmp/test.wav:
./cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav
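After a run like the one above, one quick sanity check is to read the header of the saved file. The sketch below is a hypothetical helper, not part of the tool. It parses the RIFF header with Python's standard `struct` module and handles both integer PCM and IEEE-float WAV files, since the sample encoding the CLI writes is not stated here.

```python
import struct
from typing import Dict, Union


def read_wav_header(path: str) -> Dict[str, Union[int, float]]:
    """Return format details of a .wav file without decoding its samples (hypothetical helper)."""
    with open(path, "rb") as f:
        riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        fmt = None
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # First 16 bytes: format tag, channels, sample rate, byte rate, block align, bits.
                fmt = struct.unpack("<HHIIHH", f.read(chunk_size)[:16])
            elif chunk_id == b"data":
                if fmt is None:
                    raise ValueError("data chunk found before fmt chunk")
                audio_format, channels, rate, _byte_rate, block_align, bits = fmt
                return {
                    "audio_format": audio_format,  # 1 = integer PCM, 3 = IEEE float, 65534 = extensible
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                    "duration_s": (chunk_size // block_align) / rate,
                }
            else:
                f.seek(chunk_size, 1)  # skip chunks we don't need (e.g. LIST)
            if chunk_size % 2:
                f.seek(1, 1)  # RIFF chunks are padded to an even length
```

For example, `read_wav_header("/tmp/test.wav")` would report the sample rate and duration of the file written by the command above; a duration near zero suggests generation produced little or no audio.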
By default, the Parler TTS model is saved in GGUF format with a pre-encoded conditional prompt (i.e. a prompt that determines how the speech is generated). If the text encoder model (the T5 encoder) is also available in GGUF format (see the Python conversion scripts for how to prepare it), a new conditional prompt can be used for generation like so:
./cli --model-path /model/path/to/gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav --text-encoder-path /model/path/to/t5_encoder_file.gguf --conditional-prompt "deep voice"
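When scripting the CLI (for example, to generate audio for several prompts in a batch), a small wrapper can check the documented constraints before launching it. The helper below is a hypothetical sketch: `build_cli_args` and its `cli` parameter are illustrative names, it only emits flags listed in the --help output, it assumes the save path should end in .wav, and it rejects a conditional prompt supplied without a text encoder path.

```python
from typing import List, Optional


def build_cli_args(
    model_path: str,
    prompt: str,
    save_path: str,
    *,
    cli: str = "./cli",  # hypothetical: path to the built binary
    conditional_prompt: Optional[str] = None,
    text_encoder_path: Optional[str] = None,
    temperature: Optional[float] = None,
    top_k: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for the example CLI using only flags from --help."""
    if not save_path.endswith(".wav"):
        # Assumption: the output is expected to carry a .wav extension.
        raise ValueError("--save-path should point to a .wav file")
    if conditional_prompt is not None and text_encoder_path is None:
        # Per the --help text, conditional generation needs the T5 encoder GGUF.
        raise ValueError("--conditional-prompt requires --text-encoder-path")
    args = [cli, "--model-path", model_path, "--prompt", prompt, "--save-path", save_path]
    if text_encoder_path is not None:
        args += ["--text-encoder-path", text_encoder_path]
    if conditional_prompt is not None:
        args += ["--conditional-prompt", conditional_prompt]
    if temperature is not None:
        args += ["--temperature", str(temperature)]
    if top_k is not None:
        args += ["--topk", str(top_k)]
    return args
```

The returned list can be passed directly to `subprocess.run(args, check=True)`. Passing a list rather than a single shell string means prompts containing spaces or quotes need no extra escaping.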