Full-libri default true? #154
BTW, Piotr, having used the setup after some of your refactoring changes, I do have a general reservation about the scripts being a little too 'structured'. Meaning, I'd prefer if more of the configuration were in the bottom-level script so we could more easily change things independently, and that we try to rely on more general utilities from packages outside of snowfall, rather than putting too much functionality into snowfall. But let's not stress about this too much right now.

I also tend to feel that your coding style is sometimes a little too interdependent, meaning you tend to create code where, in order to understand it, you have to understand quite a lot of other code (rather than having a fairly flat structure with many independent classes or functions needing to know about a relatively small set of common datatypes or abstractions). But I'm having a hard time finding a specific instance where I'd say "we should design this specific thing this other way". Anyway, just keep it in mind as something to watch out for.

Incidentally, we spoke the other day about plans/goals/etc., and our plan is to have the official release of Icefall ready for the tutorial at Interspeech (approx. Sep 1st). Hopefully in a couple of months' time we can start designing Icefall; I don't want to pin anything down too precisely right now, though (in terms of design).
Yeah, I changed full-libri to be true by default after you suggested that we move to it. I might have forgotten to announce it, though.

About coding style/abstractions/code structure: honestly, I am not sure either what the best way to organize snowfall/icefall is. I think there are conflicting design goals between modularity/reusable abstractions/well-defined structure and the flexibility to change things in an arbitrary manner to experiment with new ideas. We'll have to strike the right balance, and I'm open to adjusting my coding habits if that helps to achieve it ;) anyway, I don't think there's another way to get there than trial-and-error. One of my main aims is to avoid creating recipes for new corpora by copying large chunks of code; instead, we should be able to import a module (or a number of modules) and train the same setup on new data without too many adjustments. Maybe it would be helpful to have a distinction in the library between "stable" parts (things that you'd normally use when building a new system) and "experimental" parts (things that are promising/in-research but maybe a bit messy / leaking through the current abstractions).

Re Icefall: I was going to reach out to you and ask what your plan is :) let's talk in a couple of months about how to design it then.
FYI I checked how long it makes sense to train our conformer on full LibriSpeech (at least with the current settings). I ran it for 30 epochs; the non-averaged model keeps improving, but the averaged model results are stable after 20 epochs of training. This should give some indication of where we are w.r.t. SOTA, although I don't have a trained rescoring transformer LM which could improve things further.

Without averaging:
- Model at epoch 20 (no rescoring):
- Model at epoch 30 (no rescoring):

With averaging:
- Epoch 10 (rescoring with 4-gram LM):
- Epoch 20 (no rescoring):
- Epoch 20 (rescoring with 4-gram LM):
- Epoch 30 (no rescoring):
- Epoch 30 (rescoring with 4-gram LM):
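For reference, the averaging discussed here is element-wise averaging of the parameters from several saved checkpoints. Below is a minimal sketch of that idea in PyTorch; the function name and checkpoint paths are hypothetical, not snowfall's actual API, and it assumes each file stores a plain state_dict with identical keys and shapes.

```python
from pathlib import Path
from typing import List

import torch


def average_checkpoints(paths: List[Path]) -> dict:
    """Element-wise average of the parameter tensors in several checkpoints."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            # Copy the first checkpoint as the running sum (in float for accuracy).
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    for k in avg:
        avg[k] /= len(paths)
    return avg


# E.g. average the last 5 epochs ending at epoch 20 (paths are illustrative).
paths = [Path(f"exp/epoch-{i}.pt") for i in range(16, 21)]
torch.save(average_checkpoints(paths), "exp/avg-epoch-16-20.pt")
```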
I will share a trained rescoring LM in the following week. (The models are ready now and I am cleaning the code.)
@pzelasko The models and related code are currently located here.
@pzelasko How many epochs did you choose to average at epoch 20 and epoch 30?
I used 5 for both. These differences are so small that I'm not convinced it has a real effect. The last row you have shown is no averaging, right? BTW, your baseline numbers are a bit better than mine; is that with transformer LM rescoring?
No, the last row is averaging all 30 epochs, i.e. including the beginning models.
No, not a transformer LM; 4-gram lattice rescoring is used. With averaging of only the final 5 epochs, as you mentioned, the WER is:

As for why it's a bit better: the model structure is slightly modified, following ESPnet. The two differences between snowfall and ESPnet were identified by loading an ESPnet-trained model with snowfall, as introduced in PR #201.
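For context, loading one framework's checkpoint into another framework's model usually comes down to remapping state_dict keys and inspecting what fails to match. A rough, hypothetical sketch of that approach is below; the function name and the key prefixes are purely illustrative, not snowfall's or ESPnet's actual naming.

```python
import torch


def load_foreign_checkpoint(model: torch.nn.Module, ckpt_path: str) -> None:
    """Load a checkpoint whose parameter names follow a different convention."""
    src = torch.load(ckpt_path, map_location="cpu")
    # Hypothetical renaming, e.g. "encoder.encoders.0..." -> "encoder.layers.0...";
    # the real correspondence must be worked out by comparing both state_dicts.
    renamed = {
        k.replace("encoder.encoders.", "encoder.layers."): v for k, v in src.items()
    }
    missing, unexpected = model.load_state_dict(renamed, strict=False)
    # Keys that remain missing or unexpected point at structural differences
    # between the two model definitions.
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
```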
Ah, okay, thanks, that clears it up for me. I didn't think to try averaging with so many models; apparently it helps a bit.
I notice the --full-libri option defaults to true in mmi_att_transformer_train.py (it is specified in librispeech.py). This may be unexpected, as it's not the previous behavior. Just pointing it out (we'll see what people think about it).
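For illustration, a boolean flag with a flipped default like this typically looks as follows with argparse. This is a hedged sketch: the flag name matches the one discussed, but the str2bool helper, help text, and surrounding structure are assumptions, not necessarily the actual code in librispeech.py.

```python
import argparse


def str2bool(v: str) -> bool:
    # Accept common spellings so "--full-libri false" works on the command line.
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


parser = argparse.ArgumentParser()
parser.add_argument(
    "--full-libri",
    type=str2bool,
    default=True,  # previously defaulted to False, per the discussion above
    help="When true, train on the full 960h LibriSpeech training set "
         "instead of only train-clean-100.",
)
args = parser.parse_args()
print(args.full_libri)
```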