Classic attention baseline model #17
The VGG encoder without the 8-dimensional features also has results reported in the WAP paper, but VGG with classic (Bahdanau) attention is quite difficult to train, unless you replace the VGG encoder with a DenseNet encoder or use the coverage attention. It may be because the training set is too small and VGG has less generalization ability than DenseNet.
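For reference, the coverage attention mentioned above can be sketched as Bahdanau attention plus a coverage vector that accumulates the past attention maps (the WAP paper actually passes the accumulated map through a convolution first; a plain linear coverage feature is used here for brevity). All names and shapes below are illustrative, not this repository's API:

```python
import numpy as np

def coverage_attention_step(h, s_t, beta, Wa, Ua, uf, va):
    """One decoding step of coverage-based attention.

    h    : (L, D) encoder annotations (flattened CNN feature map)
    s_t  : (n,)   current decoder hidden state
    beta : (L,)   coverage vector = sum of all past attention maps
    """
    # Coverage feature: positions already attended to get a different
    # alignment energy, which discourages the over-/under-attending
    # that plain Bahdanau attention suffers from on long formulas.
    cov = np.outer(beta, uf)                    # (L, n)
    e = np.tanh(h @ Wa + s_t @ Ua + cov) @ va   # (L,) alignment energies
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                        # softmax -> attention weights
    c_t = alpha @ h                             # (D,) context vector
    return c_t, alpha, beta + alpha             # coverage is accumulated

# Toy usage with random weights.
rng = np.random.default_rng(0)
L, D, n = 6, 8, 4
h = rng.normal(size=(L, D))
s_t = rng.normal(size=n)
Wa, Ua = rng.normal(size=(D, n)), rng.normal(size=(n, n))
uf, va = rng.normal(size=n), rng.normal(size=n)
c_t, alpha, beta = coverage_attention_step(h, s_t, np.zeros(L), Wa, Ua, uf, va)
```

With `beta` reset to zeros at the start of each formula, the first step reduces to ordinary Bahdanau attention; the coverage term only starts to matter on later steps.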
---
OK, so I guess you have tried it as well. In my experience, I could overfit as much as I wanted, but the validation loss stops declining after about 5 epochs (because of this huge overfitting I did not even bother to check the expression rate with beam search on the validation set).
In general, I do not think the majority of the ML/DL community knows how hard this problem really is (especially with such a small dataset). If you do not mind, I'll ask you questions from time to time (similar to this one).
Wishing you all the best for the holidays.
---
Hi Jianshu, it's me again. Just one quick question to check that I've understood correctly: in your TAP work, classic attention almost without modification (I've seen in some of your comments that the second GRU used is just a minor improvement) trains correctly and produces reasonable results (~42% expression rate)?
I get that traces carry much more information than offline images, but this sounds very interesting to me, because RNNs are usually much harder to train than CNNs (this relates to the encoders used in TAP/WAP). Do you have some insight that I'm not aware of regarding this? :)

---
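The two-GRU decoder step being asked about can be sketched as: a first GRU absorbs the previous symbol, the resulting intermediate state queries attention, and a second GRU folds the context vector back into the state. This is a generic sketch with made-up shapes and a stubbed attention function, not the TAP code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """Minimal GRU cell. W: (3, in, n), U: (3, n, n), b: (3, n)."""
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])             # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])             # reset gate
    h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1.0 - z) * h + z * h_cand

def decoder_step(y_emb, s_prev, annotations, p1, p2, attend):
    # GRU 1: update the state from the previous symbol's embedding.
    s_hat = gru_cell(y_emb, s_prev, *p1)
    # Attention is queried with the intermediate state s_hat.
    c_t = attend(s_hat, annotations)
    # GRU 2: fold the context vector into the state (the "minor
    # improvement" second GRU discussed above).
    s_t = gru_cell(c_t, s_hat, *p2)
    return s_t, c_t

# Toy usage: uniform-pooling stand-in for attention, random parameters.
rng = np.random.default_rng(1)
m, n, D, L = 5, 4, 8, 6
p1 = (rng.normal(size=(3, m, n)), rng.normal(size=(3, n, n)), np.zeros((3, n)))
p2 = (rng.normal(size=(3, D, n)), rng.normal(size=(3, n, n)), np.zeros((3, n)))
attend = lambda s, h: h.mean(axis=0)  # stand-in for a real attention module
s_t, c_t = decoder_step(rng.normal(size=m), np.zeros(n),
                        rng.normal(size=(L, D)), p1, p2, attend)
```

The design point is that attention sees a state that already knows about the last emitted symbol, while the final state also conditions on the attended context.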
RNNs seem hard to train properly, and that's why we use weight noise when training TAP. When you use the DenseNet CNN encoder, even if the input of DenseNet is offline images, its performance is comparable with TAP. Overall, we need to make use of the complementarity between online and offline input, and we need to pay careful attention to training the encoder well.
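The weight-noise trick mentioned here can be sketched as: for each update, sample a Gaussian perturbation of the weights, run the forward/backward pass with the noisy copy, and apply the gradients to the clean weights. The `sigma` value and parameter names below are placeholders, not TAP's actual settings:

```python
import numpy as np

def with_weight_noise(params, sigma=0.05, rng=None):
    """Return a noisy copy of the parameters: w + N(0, sigma^2).

    The noisy copy is used for the forward/backward pass; the resulting
    gradients are applied to the clean parameters. This acts as a
    regularizer for hard-to-train RNNs.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return {name: w + rng.normal(0.0, sigma, size=w.shape)
            for name, w in params.items()}

# Toy usage inside one training step.
rng = np.random.default_rng(42)
params = {"W_z": rng.normal(size=(4, 4)), "b_z": np.zeros(4)}
noisy = with_weight_noise(params, sigma=0.05, rng=rng)
# forward/backward would run on `noisy`; the optimizer updates `params`.
```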
---
Hi Jianshu, I must say I'm impressed with your work and with the WAP, DenseNet, and TAP papers and advances.
Have you done any experiments on CROHME using classic (Bahdanau-style) attention and a VGG-style encoder (similar to the one used in the WAP paper), without the 8-dimensional features?
I've tried to use this as a baseline model, but the model is not able to generalize well and overfits quickly and badly. My intention is to study the effect of each of the steps taken toward the state-of-the-art architecture.
I'm just checking with you, since I speculate it might have been one of the steps you took when you began your work.