
Is it suitable for other languages? #2

Open
philpav opened this issue Feb 23, 2024 · 1 comment

Comments


philpav commented Feb 23, 2024

Hi there,

This sounds like a cool project that I'd like to use to anonymize people in videos while keeping the voice natural sounding.
Before I go through the hassle of installing everything, I wanted to ask if it would work with other languages such as German?

Thanks in advance!

Member

pchampio commented Feb 23, 2024

Hello,
Thanks for your interest.
I have several points to address here:
Let me start by promoting the Hugging Face Space that lets you test the models I share without installing anything.
Link here: https://huggingface.co/spaces/Champion/SA-toolkit
No audio files will be saved by me; execution happens in the Hugging Face cloud, but I don't know whether they store the audio data (very unlikely IMO).
You can also use docker:

docker run -it -p 7860:7860 --platform=linux/amd64 registry.hf.space/champion-sa-toolkit:latest python app.py

but these are CPU-only options.
With them, you can check the generated audio yourself very easily.

Then, about natural-sounding voice: there are some tricks I didn't implement in the provided model that would make it generate more natural-sounding speech (I did those for my thesis, but this toolkit is a re-implementation).
For the sake of documentation, here are most of them:

  • F0 statistics speaker normalization
  • F0 quantization
  • ASR-BN extraction on subsamples during HiFi-GAN training to increase the amount of data (cache_functions = ["none"], or get_f0 only, in the HiFi-GAN config)

So the output speech is not the most natural sounding; I will let you be the judge.
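To illustrate the first trick, here is a minimal, hypothetical sketch of per-speaker F0 statistics normalization (not code from this toolkit, and the function names are my own): it re-scales log-F0 on voiced frames from the source speaker's statistics to a target speaker's statistics, which helps the synthesized voice keep a natural pitch range.

```python
import numpy as np

def speaker_log_f0_stats(f0):
    """Per-speaker (mean, std) of log-F0 computed over voiced frames.

    f0: array of F0 values in Hz; unvoiced frames are marked with 0.
    """
    voiced = f0[f0 > 0]
    log_f0 = np.log(voiced)
    return log_f0.mean(), log_f0.std()

def normalize_f0(f0, source_stats, target_stats):
    """Map log-F0 of voiced frames from the source speaker's statistics
    to the target speaker's statistics (z-score re-scaling).
    Unvoiced frames (f0 == 0) are left untouched.
    """
    out = f0.astype(float).copy()
    voiced = f0 > 0
    src_mean, src_std = source_stats
    tgt_mean, tgt_std = target_stats
    z = (np.log(f0[voiced]) - src_mean) / src_std   # normalize to N(0, 1)
    out[voiced] = np.exp(z * tgt_std + tgt_mean)    # re-scale to target stats
    return out
```

The same z-scores could then be bucketed into a small number of bins to get a rough F0 quantization, the second trick on the list.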

As the model is trained on English, it performs best in that language. It can work with other languages too, but the linguistic content (what someone says) will deteriorate, especially if the source audio quality is already poor.

About the models: the stronger they are at anonymizing the speaker, the more specific they are to the English language.
German is not that far from English, so it could work.

In a previous version of the toolkit, we trained an anonymization model for French using MLS (http://openslr.org/94/); see some old, no-longer-working code here: https://github.com/deep-privacy/SA-toolkit/tree/master/egs/asr/mls. MLS has a German section that could be adapted with the toolkit (a substantial amount of work).

Given the list here: https://huggingface.co/spaces/Champion/SA-toolkit, here are some comments:

  • 'hifigan_bn_tdnnf_wav2vec2_vq_48_v1': The best for privacy (harder to invert anonymization), good for clean speech (close mic) and English.
  • 'hifigan_bn_tdnnf_wav2vec2_100h_aug_v1': The best for natural speech generation, not a lot of privacy guarantee (easy to invert anonymization).
  • The others are more for research purposes, but 'hifigan_bn_tdnnf_600h_aug_v1' can be interesting (similar to 'hifigan_bn_tdnnf_wav2vec2_100h_aug_v1').

Depending on the threat level you are considering, a weak anonymization could be enough; otherwise, if your threat level is very high, 'hifigan_bn_tdnnf_wav2vec2_vq_48_v1' is the best model I can provide. (A number that does not mean anything: BIG estimate, ~70% anonymization, where 100% would be perfect anonymization but is not achievable with this toolkit, nor with other toolkits, without a significant loss of utility!)
Research is still active in this domain. If you are interested in the domain/(real) metrics, check out my thesis.
Best.
