
Is it suitable for other languages? #2

Open
philpav opened this issue Feb 23, 2024 · 1 comment

Comments


philpav commented Feb 23, 2024

Hi there,

This sounds like a cool project that I'd like to use to anonymize people in videos while keeping the voice natural sounding.
Before I go through the hassle of installing everything, I wanted to ask if it would work with other languages such as German?

Thanks in advance!

Member

pchampio commented Feb 23, 2024

Hello,
Thanks for your interest.
I have several points to address here:
Let me start by promoting the Hugging Face Space that lets you test the models I share without installing anything.
Link here: https://huggingface.co/spaces/Champion/SA-toolkit
No audio files will be saved by me; execution happens in the Hugging Face cloud, but I don't know whether they store the audio data (very unlikely IMO).
You can also use docker:

docker run -it -p 7860:7860 --platform=linux/amd64 registry.hf.space/champion-sa-toolkit:latest python app.py

but these are CPU-only options.
With them, you can check the generated audio yourself very easily.

Then, about natural-sounding voice: there are some tricks I didn't implement in the provided model that would make it generate more natural-sounding speech (I did those for my thesis, but this toolkit is a re-implementation).
For the sake of documentation, here are most of them:

  • F0 statistics speaker normalization
  • F0 quantization
  • ASR-BN extraction on subsamples during HiFi-GAN training to increase the amount of data (cache_functions = ["none"], or get_f0 only, in the HiFi-GAN config)

So the output speech is not the most natural sounding; I will let you be the judge.
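To illustrate the first trick, here is a minimal, hypothetical sketch of per-speaker F0 statistics normalization (not code from this toolkit, and the function names are my own): it re-scales log-F0 on voiced frames from the source speaker's statistics to a target speaker's statistics, which helps the synthesized voice keep a natural pitch range.

```python
import numpy as np

def speaker_log_f0_stats(f0):
    """Per-speaker (mean, std) of log-F0 computed over voiced frames.

    f0: array of F0 values in Hz; unvoiced frames are marked with 0.
    """
    voiced = f0[f0 > 0]
    log_f0 = np.log(voiced)
    return log_f0.mean(), log_f0.std()

def normalize_f0(f0, source_stats, target_stats):
    """Map log-F0 of voiced frames from the source speaker's statistics
    to the target speaker's statistics (z-score re-scaling).
    Unvoiced frames (f0 == 0) are left untouched.
    """
    out = f0.astype(float).copy()
    voiced = f0 > 0
    src_mean, src_std = source_stats
    tgt_mean, tgt_std = target_stats
    z = (np.log(f0[voiced]) - src_mean) / src_std   # normalize to N(0, 1)
    out[voiced] = np.exp(z * tgt_std + tgt_mean)    # re-scale to target stats
    return out
```

The same z-scores could then be bucketed into a small number of bins to get a rough F0 quantization, the second trick on the list.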

As the model is trained on English, it performs best in that language. It can work with other languages too, but the linguistic content (what someone says) will deteriorate, especially if the source audio quality is already poor.

About the models: the stronger they are at anonymizing the speaker, the more specific they are to the English language.
German is not that far from English, so it could work.

In a previous version of the toolkit, we trained an anonymization model for French using MLS (http://openslr.org/94/); see some old, no-longer-working code here: https://github.com/deep-privacy/SA-toolkit/tree/master/egs/asr/mls. MLS has a German section that could be adapted with the toolkit (a substantial amount of work).

Given the list here: https://huggingface.co/spaces/Champion/SA-toolkit, here are some comments:

  • 'hifigan_bn_tdnnf_wav2vec2_vq_48_v1': The best for privacy (harder to invert anonymization), good for clean speech (close mic) and English.
  • 'hifigan_bn_tdnnf_wav2vec2_100h_aug_v1': The best for natural speech generation, not a lot of privacy guarantee (easy to invert anonymization).
  • The others are more for research purposes, but 'hifigan_bn_tdnnf_600h_aug_v1' can be interesting (similar to 'hifigan_bn_tdnnf_wav2vec2_100h_aug_v1').

Depending on the threat level you are considering, a weak anonymization could be enough; otherwise, if your threat level is very high, 'hifigan_bn_tdnnf_wav2vec2_vq_48_v1' is the best model I can provide. (A number that does not mean anything: BIG estimate, ~70% anonymization, where 100% would be perfect anonymization but is not achievable with this toolkit, nor with other toolkits, without a significant loss of utility!)
Research is still active in this domain. If you are interested in the domain/(real) metrics, check out my thesis.
Best.
