Your README says that HuBERT is useful for non-English languages, but the HuBERT model used in the script was trained only on English audio. Do you mean that I need to replace it with a HuBERT model trained on my language (which doesn't exist, so I would have to train it myself)? Also, why do you use a model fine-tuned for ASR instead of the plain pre-trained one?
In our tests, the English-pretrained HuBERT generalizes well to most non-English languages, such as Chinese, French, and German, so you don't need to do anything extra: just use it as is unless you can find a more suitable model. The model fine-tuned for ASR is better at phoneme alignment, which is an important factor for talking heads, while the raw pre-trained model performs worse in this respect.
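For reference, here is a minimal sketch of what "just apply it" could look like with the Hugging Face `transformers` library. It is not taken from the repository's script: the checkpoint name `facebook/hubert-large-ls960-ft` (an English model fine-tuned for ASR) and the helper names `num_frames` and `extract_features` are assumptions made for illustration. The audio is expected to be mono at 16 kHz, regardless of the spoken language.

```python
# Sketch only: checkpoint and helper names are assumptions, not the repo's code.
ASSUMED_CHECKPOINT = "facebook/hubert-large-ls960-ft"  # English, fine-tuned for ASR

# (kernel, stride) of the 7 conv layers in the default HuBERT feature encoder.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Number of HuBERT output frames for a 16 kHz waveform of this length."""
    n = num_samples
    for kernel, stride in CONV_LAYERS:
        n = (n - kernel) // stride + 1
    return n


def extract_features(waveform_16k, checkpoint: str = ASSUMED_CHECKPOINT):
    """Return a (frames, hidden_size) tensor of HuBERT features.

    `waveform_16k` is a 1-D float array sampled at 16 kHz. The heavy imports
    live inside the function so the module loads without torch installed.
    """
    import torch
    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
    # Loading the fine-tuned checkpoint into HubertModel keeps the encoder
    # and drops the CTC head, which is not needed for feature extraction.
    model = HubertModel.from_pretrained(checkpoint).eval()

    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(inputs.input_values).last_hidden_state
    return hidden[0]


if __name__ == "__main__":
    import numpy as np

    audio = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of silence
    feats = extract_features(audio)
    print(feats.shape[0], num_frames(len(audio)))
```

The encoder emits roughly one feature vector every 20 ms, so the features usually need to be resampled to the video frame rate before driving a talking head. To try a language-specific model, pass a different `checkpoint` string.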