Initialization of last layer to zero #161

Guys,
I just remembered a trick that we used to use in Kaldi to help models converge early on, and I tried it on a setup that was not converging well; it had a huge effect. I want to remind you of this (I don't have time to try it on one of our standard setups just now).
It's just to set the last layer's parameters to zero.
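As a concrete illustration (this code is not from the thread), here is a minimal PyTorch sketch of the trick: zero-initialize the final projection layer so the model starts from all-zero logits. The model, attribute names, and dimensions below are made up for the example and are not snowfall's actual code.

```python
import torch
import torch.nn as nn


class TinyAcousticModel(nn.Module):
    """Toy model standing in for an acoustic model (illustrative only)."""

    def __init__(self, input_dim: int = 80, hidden_dim: int = 512, num_classes: int = 500):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.output_layer = nn.Linear(hidden_dim, num_classes)

        # The trick from the issue: start the last layer at zero so the model
        # initially emits all-zero logits (a uniform posterior after softmax).
        nn.init.zeros_(self.output_layer.weight)
        nn.init.zeros_(self.output_layer.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output_layer(self.encoder(x))


if __name__ == "__main__":
    model = TinyAcousticModel()
    feats = torch.randn(4, 80)
    logits = model(feats)
    # All logits are exactly zero at initialization because the last layer is zeroed.
    print(logits.abs().max())  # tensor(0.)
```

One possible rationale is that zero logits give a uniform posterior at the start, so the first updates are driven by the supervision rather than by large, essentially random initial predictions; the thread itself only reports the empirical effect.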
Comments
Mm, on the master branch with the transformer, this gives an OOM error. We need to have some code in LFMmiLoss to conditionally prune the lattices more if they are too large. @csukuangfj, can you point me to any code that does this?
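The thread itself doesn't show the pruning code being asked for; purely as an illustrative sketch, a conditional guard could look like the following. It assumes k2's `prune_on_arc_post` and the `Fsa.num_arcs` property (availability and signatures may vary across k2 versions), and `max_arcs` / `threshold_prob` are invented parameters, not snowfall's actual configuration.

```python
import k2


def maybe_prune_lattice(lats: k2.Fsa,
                        max_arcs: int = 2_000_000,
                        threshold_prob: float = 1e-5) -> k2.Fsa:
    """Prune `lats` only if it has more than `max_arcs` arcs.

    Illustrative sketch (not snowfall's actual LFMmiLoss code): small lattices
    are returned untouched, large ones are pruned by arc posterior so that the
    subsequent loss computation fits in memory.
    """
    if lats.num_arcs <= max_arcs:
        return lats
    # Drop arcs whose posterior probability falls below `threshold_prob`.
    return k2.prune_on_arc_post(lats,
                                threshold_prob=threshold_prob,
                                use_double_scores=True)
```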
@danpovey snowfall/snowfall/decoding/lm_rescore.py, lines 262 to 281 at commit ed4c74a.
It is from #147.
That's a cool trick. Why does it work?
Mm, actually in snowfall, now that I test it properly, it's not clear that it's working.