Improving Performance on Shorter Audio Clips #5

shawnbzhang · 2020-09-06T23:40:13Z

Using your GPVAD/VADC, I wish to process smaller chunks of audio files (e.g., ~200 ms each). However, when the duration is this low, the performance of the VAD is poor. What can I do to improve the performance? I assume this must be done on the training side. Would you recommend downloading the datasets, slicing them into these smaller chunks, and retraining from scratch?

Curious to hear your thoughts. Thank you!

RicherMans · 2020-09-07T02:56:28Z

Hey there,
So far the proposed GPV is not "online", meaning that it does not directly output one probability per frame.
Performance depends on the utterance length, because the bidirectional GRU gets more information from longer inputs.

> What can I do to improve the performance? I assume this must be done on the training side. Would you recommend downloading the datasets, slicing them into these smaller chunks, and retraining from scratch?

Well, the point of the entire project is just to show that VAD can be trained at the clip level using weak (here inexact and noisy) supervision.
If you have labels for every, e.g., 200 ms, just train a standard VAD model.
However, in reality, I doubt that you have this type of supervision available; it's too costly.

> However, when the duration is this low, the performance of the VAD is poor.

Well, how about just splicing some short utterances together during testing?
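A minimal sketch of that test-time splicing idea, assuming the short chunks are 1-D NumPy waveforms at the same sample rate. Here `vad_fn` is a hypothetical stand-in for whatever inference call you use (it is not part of the GPVAD code), and it is assumed to return frame-level probabilities spread evenly over its input. The helper names are made up for illustration.

```python
import numpy as np

def splice_chunks(chunks):
    """Concatenate short waveforms and record each chunk's (start, end) sample range."""
    lengths = [len(c) for c in chunks]
    ends = np.cumsum(lengths)
    starts = ends - lengths
    bounds = list(zip(starts.tolist(), ends.tolist()))
    return np.concatenate(chunks), bounds

def split_frame_probs(frame_probs, bounds, total_samples):
    """Cut frame-level probabilities back into one slice per original chunk.

    Assumes the frames cover the spliced signal evenly (an assumption,
    check it against the model's actual frame hop).
    """
    n_frames = len(frame_probs)
    per_chunk = []
    for start, end in bounds:
        lo = (start * n_frames) // total_samples        # floor
        hi = -((-end * n_frames) // total_samples)      # ceil
        per_chunk.append(frame_probs[lo:hi])
    return per_chunk

def vad_over_spliced(chunks, vad_fn):
    """Run a VAD callable once on the spliced audio, then map results back to chunks."""
    audio, bounds = splice_chunks(chunks)
    frame_probs = np.asarray(vad_fn(audio))
    return split_frame_probs(frame_probs, bounds, len(audio))
```

The longer spliced input gives the bidirectional GRU more context than a single 200 ms chunk would. Note that joining unrelated chunks may introduce discontinuities at the boundaries, so predictions near the joins should be treated with some caution.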
