Classic attention baseline model #17
The VGG encoder without the 8-dimensional features also has results reported in the WAP paper, but VGG with classic (Bahdanau) attention is quite difficult to train, unless you replace the VGG encoder with a DenseNet encoder or use the coverage attention. It may be because the training set is too small and VGG has less generalization ability than DenseNet.
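For reference, the coverage attention mentioned above can be sketched as Bahdanau attention plus a coverage vector that accumulates the past attention maps (the WAP paper actually passes the accumulated map through a convolution first; a plain linear coverage feature is used here for brevity). All names and shapes below are illustrative, not this repository's API:

```python
import numpy as np

def coverage_attention_step(h, s_t, beta, Wa, Ua, uf, va):
    """One decoding step of coverage-based attention.

    h    : (L, D) encoder annotations (flattened CNN feature map)
    s_t  : (n,)   current decoder hidden state
    beta : (L,)   coverage vector = sum of all past attention maps
    """
    # Coverage feature: positions already attended to get a different
    # alignment energy, which discourages the over-/under-attending
    # that plain Bahdanau attention suffers from on long formulas.
    cov = np.outer(beta, uf)                    # (L, n)
    e = np.tanh(h @ Wa + s_t @ Ua + cov) @ va   # (L,) alignment energies
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                        # softmax -> attention weights
    c_t = alpha @ h                             # (D,) context vector
    return c_t, alpha, beta + alpha             # coverage is accumulated

# Toy usage with random weights.
rng = np.random.default_rng(0)
L, D, n = 6, 8, 4
h = rng.normal(size=(L, D))
s_t = rng.normal(size=n)
Wa, Ua = rng.normal(size=(D, n)), rng.normal(size=(n, n))
uf, va = rng.normal(size=n), rng.normal(size=n)
c_t, alpha, beta = coverage_attention_step(h, s_t, np.zeros(L), Wa, Ua, uf, va)
```

With `beta` reset to zeros at the start of each formula, the first step reduces to ordinary Bahdanau attention; the coverage term only starts to matter on later steps.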
---
OK, so I guess you have tried it as well. In my experience, I could overfit as much as I wanted, but the validation loss stops declining after about 5 epochs (because of this huge overfitting I did not even bother to check the expression rate with beam search on the validation set).
In general, I do not think the majority of the ML/DL community knows how hard this problem really is (especially with such a small dataset). If you do not mind, I'll ask you questions from time to time (similar to this one).
Wishing you all the best for the holidays.
---
Hi Jianshu, it's me again. Just one quick question to check that I've understood correctly: in your TAP work, classic attention almost without modification (I've seen in some of your comments that the second GRU used is just a minor improvement) trains correctly and produces reasonable results (~42% expression rate)?
I get that traces carry much more information than offline images, but this sounds very interesting to me, because RNNs are usually much harder to train than CNNs (this relates to the encoders used in TAP/WAP). Do you have some insight that I'm not aware of regarding this? :)

---
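The two-GRU decoder step being asked about can be sketched as: a first GRU absorbs the previous symbol, the resulting intermediate state queries attention, and a second GRU folds the context vector back into the state. This is a generic sketch with made-up shapes and a stubbed attention function, not the TAP code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """Minimal GRU cell. W: (3, in, n), U: (3, n, n), b: (3, n)."""
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])             # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])             # reset gate
    h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1.0 - z) * h + z * h_cand

def decoder_step(y_emb, s_prev, annotations, p1, p2, attend):
    # GRU 1: update the state from the previous symbol's embedding.
    s_hat = gru_cell(y_emb, s_prev, *p1)
    # Attention is queried with the intermediate state s_hat.
    c_t = attend(s_hat, annotations)
    # GRU 2: fold the context vector into the state (the "minor
    # improvement" second GRU discussed above).
    s_t = gru_cell(c_t, s_hat, *p2)
    return s_t, c_t

# Toy usage: uniform-pooling stand-in for attention, random parameters.
rng = np.random.default_rng(1)
m, n, D, L = 5, 4, 8, 6
p1 = (rng.normal(size=(3, m, n)), rng.normal(size=(3, n, n)), np.zeros((3, n)))
p2 = (rng.normal(size=(3, D, n)), rng.normal(size=(3, n, n)), np.zeros((3, n)))
attend = lambda s, h: h.mean(axis=0)  # stand-in for a real attention module
s_t, c_t = decoder_step(rng.normal(size=m), np.zeros(n),
                        rng.normal(size=(L, D)), p1, p2, attend)
```

The design point is that attention sees a state that already knows about the last emitted symbol, while the final state also conditions on the attended context.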
RNNs seem hard to train properly, and that's why we use weight noise when training TAP. When you use the DenseNet CNN encoder, even if the input of DenseNet is offline images, its performance is comparable with TAP. Overall, we need to make use of the complementarity between online and offline input, and we need to pay careful attention to training the encoder well.
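The weight-noise trick mentioned here can be sketched as: for each update, sample a Gaussian perturbation of the weights, run the forward/backward pass with the noisy copy, and apply the gradients to the clean weights. The `sigma` value and parameter names below are placeholders, not TAP's actual settings:

```python
import numpy as np

def with_weight_noise(params, sigma=0.05, rng=None):
    """Return a noisy copy of the parameters: w + N(0, sigma^2).

    The noisy copy is used for the forward/backward pass; the resulting
    gradients are applied to the clean parameters. This acts as a
    regularizer for hard-to-train RNNs.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in params.items()}

# Toy usage inside one training step.
rng = np.random.default_rng(42)
params = {"W_z": rng.normal(size=(4, 4)), "b_z": np.zeros(4)}
noisy = with_weight_noise(params, sigma=0.05, rng=rng)
# forward/backward would run on `noisy`; the optimizer updates `params`.
```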
---
Hi Jianshu, I must say I'm impressed with your work and with the WAP, DenseNet, and TAP papers and advances.
Have you done any experiments on CROHME using classic (Bahdanau-style) attention and a VGG-style encoder (similar to the one used in the WAP paper), without the 8-dimensional features?
I've tried to use this as a baseline model, but the model is not able to generalize well and overfits quickly and badly. My intention is to study the effect of each of the steps taken toward the state-of-the-art architecture.
I'm just checking with you, since I speculate it might have been one of the steps you took when you began your work.