Model Training #2

prothej227 · 2022-08-08T08:43:25Z

Hi! Can you add detailed steps on how to train your model using a custom dataset?

danomatika · 2022-08-08T09:03:43Z

If you need more info than what is in the README, @bytosaur can answer, but he is currently on vacation, so it may be a week or so until he can respond.

prothej227 · 2022-08-09T10:46:48Z

Hi, thanks for your reply! I'm planning to train your model on a custom dataset that differs from the Common Voice dataset used in the documentation. Can you elaborate or give specific, beginner-friendly steps on how I can retrain your model using my collated dataset?

bytosaur · 2022-08-14T14:31:49Z

hey @prothej227,

What does your dataset look like? Maybe it is not that different from my setup. You can always try the setup with an incomplete Common Voice dataset, i.e. two languages that have very few samples.

Collecting noise data is optional. The first step is to process the downloaded Common Voice folders into a structure that the training script can read. I used a couple of more advanced tricks to clean the data (voice activity detection, debiasing through sampling). In the end, however, you want folders named by class (language) that contain mono samples of equal length, sampled at the same frequency, normalized, etc. See this section.
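
As a rough illustration of that target format (not the repository's actual preprocessing script), the sketch below downmixes a clip to mono, peak-normalizes it, and crops or zero-pads it to a fixed length. The function names are hypothetical, and the sample rate and clip duration are assumptions you would set to match your training configuration. It uses only plain Python lists and assumes the audio is already decoded to floats at the target sample rate; resampling is not covered.

```python
def to_mono(frames):
    """Average the channels of each frame: [(left, right), ...] -> [mono, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def peak_normalize(samples, peak=1.0):
    """Scale the clip so its loudest sample has absolute value `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silent clip: nothing to scale
    return [s * peak / loudest for s in samples]


def fix_length(samples, target_len):
    """Crop or zero-pad so every clip has exactly `target_len` samples."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


def prepare_clip(frames, sample_rate, seconds):
    """Hypothetical helper: mono, normalized, fixed-length clip."""
    target_len = int(round(sample_rate * seconds))
    return fix_length(peak_normalize(to_mono(frames)), target_len)


# Tiny stereo example: 3 frames, padded to 4 samples (4 Hz * 1 s).
stereo = [(0.5, 0.0), (-0.25, -0.25), (0.125, 0.125)]
clip = prepare_clip(stereo, sample_rate=4, seconds=1.0)
```

Each prepared clip would then be written into a folder named after its class (language). Clips that are shorter than the chosen duration get zero-padded, and longer ones get cropped from the end; whether padding or cropping suits a given dataset better is a judgment call.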

Please let me know the sections of the README that are not understandable so I can improve them.

prothej227 · 2022-08-15T06:08:45Z

I have a dataset that contains wav files that vary in length (max = 5 seconds, min = 3 seconds).

danomatika added the documentation Improvements or additions to documentation label Aug 8, 2022

danomatika assigned bytosaur Aug 8, 2022

prothej227 closed this as completed Aug 15, 2022

prothej227 reopened this Aug 15, 2022
