Absolute fastest inference speed #574
Comments
From tutorials I've seen that adding a new preset with lower sampling settings can speed things up. If nothing else can speed it up, I would be very glad about any help regarding lowering the samples.
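(For illustration, a minimal sketch of that kind of tuning with the standard api.TextToSpeech: tts_with_preset accepts keyword overrides, so lowering num_autoregressive_samples and diffusion_iterations trades quality for speed. The exact values below are assumptions for the example, not settings recommended in this thread.)

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voices

tts = TextToSpeech(kv_cache=True, half=True)
voice_samples, conditioning_latents = load_voices(["random"])  # 'random' -> (None, None), i.e. a random voice

# Keyword arguments override the chosen preset's settings.
# These values are illustrative; lower them for more speed, raise them for quality.
audio = tts.tts_with_preset(
    "A short test sentence.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="ultra_fast",
    num_autoregressive_samples=8,
    diffusion_iterations=20,
)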
What about this? Some Polish students have improved Tortoise's speed by distilling the models into one and then distilling that one even further, but I can't find anything related to this. Could be interesting to contact them and try their distilled model, maybe?
That's awesome, thank you so much! I will try to contact them and hope to maybe achieve distillation myself, but I am not proficient enough to do so on my own, and I would still love to have some easy hyperparameter tuning for speeding Tortoise up even more :)
Please update if you find a way to improve inference speed. Quality is great, but speed is definitely a problem.
If you want to improve inference speed a bit, I created a fork that allows for a slight speedup in exchange for using more memory. It probably won't bring you under 1 second, but I've seen speeds of about 1.3 seconds on ultra_fast (I do have a 4090, though). You just need to pass 'device_only=True' when creating the TTS object.
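(A rough sketch of what that looks like; the device_only argument only exists in that fork, not in upstream tortoise-tts, so this is an assumption about the fork's interface.)

from tortoise.api import TextToSpeech  # the fork's api.py, not upstream

# device_only=True (fork-only flag) keeps more of the model resident on the GPU,
# trading extra memory for a small speedup, per the fork's description.
tts = TextToSpeech(device_only=True, kv_cache=True, half=True)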
Hey, did you check out the new api_fast?
@manmay-nakhashi I have tried it and got this error:
ValueError: The following `model_kwargs` are not used by the model: ['cond_free_k', 'diffusion_temperature', 'diffusion_iterations'] (note: typos in the generate arguments will also show up in this list)
Any suggestions?
Don't pass diffusion-related args, as it's not using diffusion.
I am not passing those, but I am using a preset. So I am using it like:

import io
import logging
import wave

import numpy as np

from tortoise.api_fast import TextToSpeech

# Initialize the TextToSpeech object
tts = TextToSpeech(kv_cache=True, use_deepspeed=True, half=True)

# Create an in-memory buffer to hold the WAV file data
buffer = io.BytesIO()

# Initialize the WAV file
wf = wave.open(buffer, 'wb')
wf.setnchannels(1)      # Mono
wf.setsampwidth(2)      # 16-bit audio
wf.setframerate(24000)  # Sample rate

# text_chunk and voice_samples are defined elsewhere in my script
for audio_frame in tts.tts_with_preset(
    text_chunk,
    voice_samples=voice_samples,
    preset="ultra_fast",
):
    if audio_frame is not None:
        audio_np = audio_frame.cpu().detach().numpy()
        audio_int16 = (audio_np * 32767).astype(np.int16)
        wf.writeframes(audio_int16.tobytes())
    else:
        logging.warning("No audio generated for the text chunk.")
No need to use presets here, as there are different configurations for the speed vs. quality balance.
Got it. Yeah switched to |
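(For readers following along, a sketch of calling the fast API without a preset. It assumes api_fast exposes the tts_stream generator mentioned later in this thread and that it yields audio tensors; the exact argument names may differ in your checkout.)

import numpy as np
from tortoise.api_fast import TextToSpeech
from tortoise.utils.audio import load_voices

tts = TextToSpeech(kv_cache=True, half=True)
voice_samples, _ = load_voices(["emma"])  # 'emma' is one of the bundled voices; substitute your own

# tts_stream (assumed interface) yields audio chunks as they are generated.
chunks = []
for chunk in tts.tts_stream("A quick latency test.", voice_samples=voice_samples):
    chunks.append(chunk.cpu().detach().numpy().reshape(-1))  # flatten to mono float samples

audio = np.concatenate(chunks) if chunks else np.zeros(0, dtype=np.float32)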
I'm still not able to get DeepSpeed to work. These are the exact steps I'm following:

conda create --name tortoise python=3.11
conda activate tortoise
conda install -c "nvidia/label/cuda-12.1.0" \
    cuda cuda-toolkit cuda-compiler
git clone https://github.com/neonbjb/tortoise-tts.git
cd tortoise-tts
pip install -r requirements.txt
python setup.py install
python tortoise/do_tts.py \
    --text "This is a test of the initial setup. This is only a test." \
    --use_deepspeed true \
    --voice random --preset fast
I'm hitting the same issue.
Run it without DeepSpeed. DeepSpeed works well with CUDA 11.8; CUDA should be compiled with nvcc.
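(A quick sanity check of the toolchain before retrying: DeepSpeed's JIT kernels are compiled with whatever nvcc is on PATH, so it should match the CUDA version PyTorch was built against; DeepSpeed also ships a ds_report command that prints similar information.)

import subprocess
import torch

# CUDA version PyTorch was built against (e.g. '11.8').
print("torch built with CUDA:", torch.version.cuda)

# nvcc on PATH; DeepSpeed compiles its kernels with this toolkit.
# Raises FileNotFoundError if the CUDA toolkit is not installed or not on PATH.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)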
Interesting to know. Using tts_stream, what sort of result did you get? What's the processing ratio? Thanks in advance.
Hello! Thank you so much for all the great work.
I would really love to break free from the 11labs API; the only thing Tortoise does not have is sub-1-second inference speed.
With deepspeed, half precision, kv_cache, 1 candidate, and a one-sentence prompt, the best I could get out of it is 2.4 seconds (without warmup, 4 seconds). DeepSpeed promises a 10x speedup; is that relative to base performance, or just not applicable for one sentence, and is that result expected for a 3090? I would love to know how to further increase inference speed to sub-1-second performance, thank you :)
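(For context, the setup described above roughly corresponds to the sketch below, using the standard api.TextToSpeech; timings will of course depend on GPU, warmup, and library versions.)

import time

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voices

# deepspeed + half precision + kv_cache, as described above
tts = TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
voice_samples, conditioning_latents = load_voices(["random"])

start = time.time()
audio = tts.tts_with_preset(
    "This is a single test sentence.",  # one-sentence prompt
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="ultra_fast",
    k=1,                                # 1 candidate
)
print(f"Inference took {time.time() - start:.2f} s")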