The details on HuBERT-General-Audio #7

Open
vican9000 opened this issue Sep 18, 2024 · 1 comment

Comments

@vican9000

Hey, first of all, great work!

Two things bug me though:

  1. What is the semantic value of the HuBERT model you trained if its targets come from the first RVQ layer of the acoustic tokenizer? The acoustic model is already exposed to that information.
  2. What was the sampling rate of the input audio for the semantic model? Is it the same as for the acoustic model?
@zhenye234
Owner

Thank you for your interest in our work.
1. Our training approach follows that of the HuBERT model, with one modification: the target produced by the acoustic unit discovery system. Instead of running k-means clustering on MFCCs, we use the first VQ (vector quantization) layer of the codec.
2. 16 kHz.
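
To make point 1 concrete, here is a minimal sketch of what "first VQ layer as the target" could look like. It is not the repository's code: the function names `first_vq_targets` and `masked_prediction_loss`, the array shapes, and the random stand-in data are all assumptions for illustration. Each frame gets the index of its nearest entry in a (hypothetical) first-layer codebook, and a HuBERT-style cross-entropy is then computed on the masked frames only.

```python
import numpy as np

def first_vq_targets(features, codebook):
    """Label each frame with the index of its nearest codebook entry.

    features: (T, D) frame-level codec encoder outputs (stand-in data here).
    codebook: (K, D) entries of the first VQ layer (hypothetical values).
    """
    # Squared Euclidean distance between every frame and every code: (T, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked frames only.

    logits: (T, K) model predictions, targets: (T,) code indices,
    mask: (T,) boolean, True where the input frame was masked.
    """
    # Numerically stable log-softmax over the K codes
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll[mask].mean()

# Random stand-ins for codec features, codebook, and model outputs (assumptions)
rng = np.random.default_rng(0)
T, D, K = 50, 8, 16
codebook = rng.normal(size=(K, D))
features = rng.normal(size=(T, D))

targets = first_vq_targets(features, codebook)  # replaces k-means labels on MFCCs
mask = rng.random(T) < 0.5
mask[0] = True                                  # guarantee at least one masked frame
logits = rng.normal(size=(T, K))                # would come from the HuBERT encoder
loss = masked_prediction_loss(logits, targets, mask)
```

In this sketch the only difference from a k-means-based setup is where `targets` comes from; the masked-prediction loss itself is unchanged.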
