perf: preallocate tensor in semantic text generation #366

Open
wants to merge 1 commit into
base: main

no2chem (Contributor) commented Jun 21, 2023

This PR modifies generate_text_semantic so that it preallocates a tensor and fills it in place instead of calling torch.cat on every step, which causes extra allocations and copying overhead. It also removes the del line and lets the garbage collector manage things, hopefully asynchronously.
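For readers who want to see the difference, here is a minimal sketch of the two patterns. It is not the actual diff from this PR: the function names, the `next_token` callback, and the fixed step count are hypothetical stand-ins for the sampling loop.

```python
import torch


def grow_with_cat(prompt, next_token, n_steps):
    """Baseline pattern: every step allocates a new, longer tensor via torch.cat."""
    x = prompt
    for _ in range(n_steps):
        tok = next_token(x)                    # hypothetical sampler, returns shape (1,)
        x = torch.cat((x, tok[None]), dim=1)   # new allocation + full copy each step
    return x


def grow_preallocated(prompt, next_token, n_steps):
    """Preallocated pattern: one buffer sized for the worst case, filled in place."""
    n0 = prompt.shape[1]
    buf = torch.empty((1, n0 + n_steps), dtype=prompt.dtype, device=prompt.device)
    buf[:, :n0] = prompt
    length = n0
    for _ in range(n_steps):
        tok = next_token(buf[:, :length])      # a view of the filled part, no copy
        buf[:, length] = tok                   # write the new token in place
        length += 1
    return buf[:, :length]                     # slice also covers an early stop


# Hypothetical stand-in for the model: "predict" last token + 1.
def fake_next_token(x):
    return x[:, -1] + 1


prompt = torch.tensor([[1, 2, 3]])
a = grow_with_cat(prompt, fake_next_token, 5)
b = grow_preallocated(prompt, fake_next_token, 5)
```

Both functions produce the same sequence. The second one allocates once up front, at the cost of reserving memory for the maximum number of steps.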

Overall, on an H100, with some of the other patches, this lets me generate the example prompt:

"Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe. "

Slightly faster than realtime. On average, this prompt generates a 12-13s audio clip, and I can generate the clip in around 8-12s. On a good run that's approximately 130% realtime.

I'd note that the performance of the semantic model seems to be especially bimodal: sometimes I get lucky and get > 270it/s, which takes 2s, and other times it's slow, does ~150it/s, and takes 4s. It'd be nice to eliminate this variance, though I wonder if it has to do with the model.
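As a rough check on the numbers above, the arithmetic can be sketched as follows. The helper names are hypothetical, and the step counts (540 and 600) are only what the quoted rates and times imply, not measured values.

```python
def realtime_factor(audio_seconds, generation_seconds):
    """Ratio > 1.0 means the clip is generated faster than it plays back."""
    return audio_seconds / generation_seconds


def semantic_seconds(n_steps, its_per_second):
    """Wall-clock time for the semantic stage at a given iteration rate."""
    return n_steps / its_per_second


# Bounds implied by the quoted ranges (12-13s of audio, 8-12s to generate).
best_case = realtime_factor(13, 8)
worst_case = realtime_factor(12, 12)

# Step counts implied by "270it/s takes 2s" and "150it/s takes 4s".
fast_run = semantic_seconds(540, 270)
slow_run = semantic_seconds(600, 150)
```

The quoted figure of roughly 130% realtime falls between the two bounds. The slow semantic run costs about two extra seconds on a clip of this length.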

@no2chem no2chem force-pushed the preallocateTensor branch from e523c2e to 5456c7f Compare June 21, 2023 18:01

Ph0rk0z commented Jul 5, 2023

I merged all these speedups and they seem to have helped. Bark still only uses 30% of my GPU, so there has to be some kind of bottleneck. I get 60-70 it/s on the first part with a 3090, but the second part, where I think it runs coarse and fine, still only does 1.xx it/s.


Mradr commented Jul 10, 2023

Using the small model I can reach up to 140-150 it/s, but 270 on the large seems crazy o.o I don't seem to get anywhere near that even with this PR, though that is on an H100 and I'm using a 3090 as well.
