
Full-libri default true? #154

danpovey opened this issue Apr 12, 2021 · 9 comments

@danpovey (Contributor)

I notice the --full-libri option defaults to true in mmi_att_transformer_train.py (specified in librispeech.py). This may be unexpected, as it's not the previous behavior. Just pointing it out (we'll see what people think about it).
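For context, a boolean flag like this is usually wired up roughly as follows; this is a minimal argparse sketch, not necessarily the exact snowfall definition.

```python
import argparse

def str2bool(v: str) -> bool:
    """Parse 'true'/'false'-style strings into booleans."""
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
# default=True means the full 960h setup is used unless explicitly disabled,
# e.g. by passing --full-libri false
parser.add_argument("--full-libri", type=str2bool, default=True,
                    help="Train on full 960h LibriSpeech instead of train-clean-100.")

args = parser.parse_args(["--full-libri", "false"])
print(args.full_libri)  # -> False
```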

@danpovey (Contributor, Author)

BTW, Piotr, with some of your refactoring changes, after using the setup, I do have a general reservation about the scripts being a little too 'structured'. Meaning, I'd prefer if more of the configuration were in the bottom-level script so we could more easily change things independently, and to rely on more general utilities from packages outside of snowfall, rather than putting too much functionality into snowfall itself. But let's not stress about this too much right now.

I also tend to feel that your coding style is sometimes a little too interdependent, meaning you tend to create code where in order to understand it you have to understand quite a lot of other code (rather than having a fairly flat structure with many independent classes or functions needing to know about a relatively small set of common datatypes or abstractions). But I'm having a hard time finding a specific instance where I'd say "we should design this specific thing this other way". Anyway just keep it in mind as something to watch out for.

Incidentally, we spoke the other day about plans/goals/etc., and our plan is to have the official release of Icefall ready for the tutorial in Interspeech (approx. Sep 1st). Hopefully in a couple of months' time we can start designing Icefall; I don't want to pin anything down too precisely right now though (in terms of design).

@pzelasko (Collaborator)

Yeah, I changed full-libri to be true by default after you suggested that we move to it. I might have forgotten to announce it though.

About coding style/abstractions/code structure: honestly, I am not sure either what the best way to organize snowfall/icefall is. I think there are conflicting design goals: modularity, reusable abstractions, and a well-defined structure on one hand, and the flexibility to change things arbitrarily when experimenting with new ideas on the other. We'll have to strike the right balance, and I'm open to adjusting my coding habits if that helps to achieve it ;) Anyway, I don't think there's another way to get there than trial and error.

I guess one of my main aims is to avoid creating recipes for new corpora by copying large chunks of code; instead, you'd import a module (or a number of modules) and train the same setup on new data without too many adjustments. Maybe it would be helpful to have a distinction in the library between "stable" parts (things that you'd normally use when building a new system) and "experimental" parts (things that are promising/in-research but maybe a bit messy or leaking through the current abstractions).

Re Icefall: I was going to reach out to you and ask what your plan is :) Let's talk about how to design it in a couple of months then.

@pzelasko (Collaborator)

FYI, I checked how long it makes sense to train our conformer on full LibriSpeech (at least with the current settings) -- I ran it for 30 epochs; the non-averaged model keeps improving, but the averaged model's results are stable after 20 epochs of training.

This should give some indication of where we are w.r.t. SOTA, although I don't have a trained rescoring transformer LM, which could improve these results further.

Without averaging:

Model at epoch 20 (no rescoring):

2021-05-21 14:57:34,776 INFO [common.py:380] [test-clean] %WER 4.48% [2353 / 52576, 289 ins, 193 del, 1871 sub ]
2021-05-21 14:58:24,262 INFO [common.py:380] [test-other] %WER 9.64% [5047 / 52343, 681 ins, 397 del, 3969 sub ]
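(In these logs, %WER = (ins + del + sub) / reference words, e.g. (289 + 193 + 1871) / 52576 ≈ 4.48% for test-clean above.)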

Model at epoch 30 (no rescoring):

2021-05-22 14:41:38,359 INFO [common.py:380] [test-clean] %WER 4.36% [2292 / 52576, 292 ins, 179 del, 1821 sub ]
2021-05-22 14:43:30,392 INFO [common.py:380] [test-other] %WER 9.24% [4834 / 52343, 624 ins, 385 del, 3825 sub ]

With averaging:

Epoch 10 (rescoring 4-gram LM):

2021-05-20 12:54:51,879 INFO [common.py:380] [test-clean] %WER 4.07% [2140 / 52576, 354 ins, 123 del, 1663 sub ]
2021-05-20 13:02:43,636 INFO [common.py:380] [test-other] %WER 8.37% [4383 / 52343, 731 ins, 232 del, 3420 sub ]

Epoch 20 (no rescoring):

2021-05-21 09:34:55,569 INFO [common.py:380] [test-clean] %WER 4.33% [2274 / 52576, 268 ins, 183 del, 1823 sub ]
2021-05-21 09:35:43,453 INFO [common.py:380] [test-other] %WER 8.96% [4690 / 52343, 584 ins, 389 del, 3717 sub ]

Epoch 20 (rescoring 4-gram LM):

2021-05-21 09:46:26,814 INFO [common.py:380] [test-clean] %WER 3.87% [2036 / 52576, 334 ins, 116 del, 1586 sub ]
2021-05-21 09:53:26,347 INFO [common.py:380] [test-other] %WER 8.08% [4231 / 52343, 710 ins, 241 del, 3280 sub ]

Epoch 30 (no rescoring):

2021-05-22 14:45:39,709 INFO [common.py:380] [test-clean] %WER 4.31% [2267 / 52576, 293 ins, 182 del, 1792 sub ]
2021-05-22 14:46:36,179 INFO [common.py:380] [test-other] %WER 8.98% [4700 / 52343, 610 ins, 388 del, 3702 sub ]

Epoch 30 (rescoring 4-gram LM):

2021-05-22 14:53:36,527 INFO [common.py:380] [test-clean] %WER 3.86% [2030 / 52576, 345 ins, 114 del, 1571 sub ]
2021-05-22 15:00:10,075 INFO [common.py:380] [test-other] %WER 8.07% [4223 / 52343, 708 ins, 254 del, 3261 sub ]
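For reference, "averaging" here means an element-wise average of model parameters across epoch checkpoints. Below is a minimal sketch, assuming one state_dict per epoch saved as epoch-N.pt; the actual snowfall helper may differ in details.

```python
from pathlib import Path
import torch

def average_checkpoints(ckpt_dir: Path, start: int, end: int) -> dict:
    """Element-wise average of the state_dicts for epochs start..end (inclusive)."""
    avg = None
    n = end - start + 1
    for epoch in range(start, end + 1):
        # Assumes each file is a bare state_dict; adjust if it's wrapped,
        # e.g. {"state_dict": ...}.
        state = torch.load(ckpt_dir / f"epoch-{epoch}.pt", map_location="cpu")
        if avg is None:
            avg = {k: v.detach().clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    # Cast back to the original dtypes (integer buffers such as
    # num_batches_tracked get truncated -- fine for a sketch).
    return {k: (v / n).to(state[k].dtype) for k, v in avg.items()}

# Usage, e.g. averaging the last 5 of 30 epochs:
# model.load_state_dict(average_checkpoints(Path("exp"), 26, 30))
```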

@glynpu (Contributor) commented May 23, 2021

> This should give some indication of where we are w.r.t. SOTA, although I don't have a trained rescoring transformer LM, which could improve these results further.

I will share a trained rescoring LM in the following week. (The models are ready and I am cleaning up the code.)

@glynpu (Contributor) commented May 24, 2021

> This should give some indication of where we are w.r.t. SOTA, although I don't have a trained rescoring transformer LM, which could improve these results further.
>
> I will share a trained rescoring LM in the following week. (The models are ready and I am cleaning up the code.)

@pzelasko The models and related code are currently located here

@glynpu (Contributor) commented May 29, 2021

> FYI, I checked how long it makes sense to train our conformer on full LibriSpeech (at least with the current settings) -- I ran it for 30 epochs; the non-averaged model keeps improving, but the averaged model's results are stable after 20 epochs of training.

@pzelasko How many epochs did you choose to average at 20 and 30 epochs?
I find that excluding the first several epochs' models decreases the WER. Detailed results below:

avg(4-29)
2021-05-29 11:45:22,937 INFO [common.py:380] [test-clean] %WER 3.61% [1899 / 52576, 328 ins, 101 del, 1470 sub ]
avg(4-30)
2021-05-29 11:53:21,700 INFO [common.py:380] [test-clean] %WER 3.59% [1887 / 52576, 331 ins, 101 del, 1455 sub ]
avg(1-30)
2021-05-29 12:08:18,920 INFO [common.py:380] [test-clean] %WER 3.70% [1943 / 52576, 350 ins, 107 del, 1486 sub ]  

@pzelasko (Collaborator) commented May 29, 2021

I used 5 for both. These differences are so small that I'm not convinced it has a real effect. The last row you've shown is without averaging, right?

BTW, your baseline numbers are a bit better than mine; is that rescoring with a transformer LM?

@glynpu (Contributor) commented May 29, 2021

> I used 5 for both. These differences are so small that I'm not convinced it has a real effect. The last row you've shown is without averaging, right?

No, the last row is averaging all 30 epochs, i.e. including the beginning models.
Excluding the epoch (1, 2, 3) models decreases the WER from 3.70% (the last row, avg 1-30) to 3.59% (the second-to-last row, avg 4-30).

> BTW, your baseline numbers are a bit better than mine; is that rescoring with a transformer LM?

No, not a transformer LM; 4-gram lattice rescoring is used.

When averaging only the final 5 epochs, as you mentioned, the WER is:

avg 26-30
2021-05-29 19:38:58,381 INFO [common.py:380] [test-clean] %WER 3.69% [1938 / 52576, 386 ins, 96 del, 1456 sub ]

As for why it's a bit better: the model structure is slightly modified, following ESPnet:
a. changed the scaling when computing the attention score (see modification 1 and modification 2)
b. added an extra layer_norm between the encoder and CTC (see the code)

These two differences between snowfall and ESPnet were identified by loading an ESPnet-trained model with snowfall, as introduced in PR #201
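To illustrate (b), here is a minimal PyTorch sketch of an extra LayerNorm between the encoder output and the CTC projection; the class and attribute names are hypothetical, not snowfall's actual code.

```python
import torch
import torch.nn as nn

class CTCOutput(nn.Module):
    """Hypothetical sketch: normalize the encoder output before the CTC projection."""
    def __init__(self, encoder_dim: int, num_tokens: int):
        super().__init__()
        self.layer_norm = nn.LayerNorm(encoder_dim)  # the extra norm from (b)
        self.proj = nn.Linear(encoder_dim, num_tokens)

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, time, encoder_dim) -> (batch, time, num_tokens)
        return self.proj(self.layer_norm(encoder_out)).log_softmax(dim=-1)
```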

@pzelasko (Collaborator) commented May 29, 2021

Ah, okay -- thanks, that clears it up for me. I didn't think to try averaging with so many models; apparently it helps a bit.
