Label smoothing for LF-MMI #179
Will try this.
In snowfall/snowfall/objectives/mmi.py (line 94, commit c5ffa3f).
The results so far show no clear improvement.
If the shape of nnet_output is (N, T, C), should it be `-nnet_output.mean(2).sum()`? Also, the denominator
Yes, the shape is (N, T, C). I think mean(2) and mean(-1) are the same here. I added this smooth_score to the tot_score, so it will be normalized by (len(texts) * accum_grad) together with the original tot_score.
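For concreteness, here is a tiny self-contained check of the axis point above; the shapes and the 0.01 scale are placeholders rather than values from the experiments:

```python
import torch

# Fake (N, T, C) log-probs just to illustrate the axis question above.
nnet_output = torch.randn(4, 100, 500).log_softmax(dim=-1)

# For a 3-D tensor, dim 2 and dim -1 address the same (class) axis.
assert torch.allclose(nnet_output.mean(2), nnet_output.mean(-1))

# The smoothing term being discussed: a small constant times the mean
# log-prob over classes, summed over frames, added to tot_score.
smooth_score = 0.01 * nnet_output.mean(-1).sum()
```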
Hm.
Try 0.1, just to verify that it gets worse. If it does, we'll forget this.
The 18.5 is still too close to the margin of error.
And how much is the printed objective function affected by this?
Will try it.

If we print the original MMI loss (i.e. the total loss minus the weighted smooth loss) like before as the objf, the validation average objf is not affected much. Take the last validation result of training for example:

| smooth scale | validation average objf |
| --- | --- |
| 0 | 0.204217 |
| 0.01 | 0.199484 |
| 0.001 | 0.217441 |
| 0.0001 | 0.20684 |
| 0.00001 | 0.206871 |

With smooth scale 0.01, the validation objf is actually better.

As for the loss value, with smooth scale 0.01, the weighted smooth loss is about 30% ~ 150% of the original MMI loss (i.e. the total loss minus the weighted smooth loss).
So does the validation loss include that extra term, or do you disable it for validation?
I disabled it for validation.
Here are the results with smooth scales 1 and 0.1. They are clearly worse than the result without the smoothing loss.
Since we have an ali model, maybe another option is to add a frame-wise cross-entropy loss using that alignment, and apply the label smoothing there?
We previously tried regularizing with a cross-entropy loss based on the alignments of the currently-being-trained model, but we didn't see any improvements. Could try it again, of course. It's possible that the issue is that we are using relatively small models and the limiting factor is learning, not generalization.
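For reference, a minimal sketch of the frame-wise cross-entropy regularizer with label smoothing being suggested here; `nnet_output` (as (N, T, C) log-probs), `ali` (frame-level labels from the ali model), and the 0.1 smoothing value are assumptions for illustration, not snowfall code:

```python
import torch
import torch.nn.functional as F

def frame_ce_with_label_smoothing(nnet_output: torch.Tensor,
                                  ali: torch.Tensor,
                                  smoothing: float = 0.1) -> torch.Tensor:
    """nnet_output: (N, T, C) log-probs; ali: (N, T) frame-level labels
    taken from the ali model's alignment. Returns a scalar loss."""
    N, T, C = nnet_output.shape
    log_probs = nnet_output.reshape(N * T, C)
    targets = ali.reshape(N * T)
    # Standard cross-entropy against the aligned labels.
    ce = F.nll_loss(log_probs, targets, reduction="mean")
    # Uniform part of the smoothed target distribution.
    uniform = -log_probs.mean(dim=-1).mean()
    # Mixture: (1 - eps) on the aligned label plus eps spread uniformly.
    return (1.0 - smoothing) * ce + smoothing * uniform
```

In practice one would also mask padded frames and decide the label granularity (phones vs. pdf-ids), but that is outside this sketch.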
@zhu-han do you think you could read about "iterated loss" here?
I'll try it.
thanks!
Perhaps @zhu-han can try this.. we should be able to implement label smoothing for our LF-MMI system by adding some small constant times
`-nnet_output.mean(1).sum() / (len(texts) * accum_grad)`
to the loss function [assuming we are still normalizing by len(texts), which IMO is not as optimal as normalizing by total num-frames, but that's a separate issue].
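A minimal sketch of this proposal, assuming the training loss is the negated total LF-MMI score and that nnet_output holds (N, T, C) log-probabilities (so the class axis is -1, per the discussion above); this is an illustration of the idea, not a patch against snowfall:

```python
import torch

def mmi_loss_with_label_smoothing(tot_score: torch.Tensor,
                                  nnet_output: torch.Tensor,
                                  texts, accum_grad: int,
                                  smooth_scale: float = 0.01) -> torch.Tensor:
    # Baseline: negated LF-MMI score, normalized by the number of utterances
    # (len(texts)) and the gradient-accumulation factor, as in the thread.
    mmi_loss = -tot_score / (len(texts) * accum_grad)
    # Label-smoothing term: pulls the per-frame log-probs toward uniform.
    smooth_loss = -nnet_output.mean(-1).sum() / (len(texts) * accum_grad)
    return mmi_loss + smooth_scale * smooth_loss
```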