Implement voicefixer for audio enhancement #221
Hi @thieugiactu, that's an interesting idea. To do this in a streaming way we would need access to a pre-trained model for the enhancement task, then implement a [...] In order to make this compatible with [...]
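A minimal sketch of what such a streaming enhancement step could look like, assuming fixed-size NumPy waveform chunks. The `EnhancementBlock` name and the spectral-gating placeholder are purely illustrative, not diart's actual API; a real implementation would call the pre-trained enhancement model inside `__call__`:

```python
import numpy as np

class EnhancementBlock:
    """Illustrative streaming enhancement step: takes one waveform chunk,
    returns an enhanced chunk of the same shape and dtype. A pre-trained
    model (e.g. voicefixer) would replace the placeholder below."""

    def __init__(self, sample_rate: int = 16000):
        self.sample_rate = sample_rate

    def __call__(self, chunk: np.ndarray) -> np.ndarray:
        # Placeholder "enhancement": naive spectral gating that zeroes
        # frequency bins whose magnitude falls below a crude noise floor.
        spectrum = np.fft.rfft(chunk)
        magnitude = np.abs(spectrum)
        floor = 0.1 * magnitude.mean()
        spectrum[magnitude < floor] = 0.0
        return np.fft.irfft(spectrum, n=len(chunk)).astype(chunk.dtype)
```

Because the block is chunk-in/chunk-out, it could in principle be placed before the segmentation and embedding models without changing the rest of the pipeline.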
I will give it a try. If I have any questions regarding diart, can I ask them directly under this issue?
@thieugiactu sure! Feel free to open a PR too, I'd be glad to discuss possible solutions to this.
This is what I've been doing so far. I re-used your code but replaced the Whisper model with a wav2vec2 model for speech recognition, since my PC couldn't handle Whisper.
@thieugiactu something you could also do to reduce the inference time is to record audio directly at 44.1 kHz. This way you avoid having to upsample in the first place.
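To illustrate the cost being avoided: going from 16 kHz to 44.1 kHz is a rational 441/160 resampling that has to run on every chunk if the audio is not recorded at 44.1 kHz in the first place. A small sketch of that step using `scipy.signal.resample_poly` (the sample rates are the ones discussed above; the signal itself is synthetic):

```python
import numpy as np
from scipy.signal import resample_poly

# One second of 16 kHz audio that would need upsampling before enhancement.
audio_16k = np.random.randn(16000).astype(np.float32)

# 16000 -> 44100 Hz is a 441/160 rational resampling.
audio_44k = resample_poly(audio_16k, up=441, down=160)
print(len(audio_44k))  # 44100 samples, i.e. one second at 44.1 kHz
```

Recording at 44.1 kHz from the start removes this polyphase filtering step from the hot path entirely.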
@juanmc2005 thank you for your reply. Unfortunately, voicefixer is so unstable that I couldn't make it work properly. More often than not it would degrade the audio quality even further.
Is there any way to implement voicefixer to speaker diarization pipeline?
The package takes a wav file as input and produces an upsampled 44.1 kHz wav file as output, but it could easily be modified to take and return NumPy audio arrays.
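One way to bridge the gap without touching the package itself is a temp-file adapter: write the array to a wav, call the file-based API, read the result back. The sketch below uses a stand-in `file_based_enhance` (which just copies the file) where voicefixer's actual restore call would go; the function names here are hypothetical:

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

def file_based_enhance(in_path: str, out_path: str) -> None:
    """Stand-in for a file-based enhancer (e.g. voicefixer's restore);
    here it simply copies the audio so the adapter can be demonstrated."""
    rate, data = wavfile.read(in_path)
    wavfile.write(out_path, rate, data)

def enhance_array(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Array-in/array-out adapter around a file-based enhancement API."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.wav")
        out_path = os.path.join(tmp, "out.wav")
        wavfile.write(in_path, sample_rate, audio)
        file_based_enhance(in_path, out_path)
        _, enhanced = wavfile.read(out_path)
    return enhanced
```

This keeps the enhancement step usable from a pipeline that only deals in arrays, at the cost of disk round-trips; patching the package to operate on arrays directly would avoid that overhead.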
Since speaker embeddings depend greatly on the quality of the input audio, and in real-world environments many factors can degrade that quality (the recording device, a speaker's voice changing over time, etc.), I think some form of audio quality enhancement is a must.