Question about the _step_slice function in nmt.py file #10
Comments
Yes, correct.
Thanks!
Hi Jianshu! Recently I implemented the multi-scale attention branch based on your code. I trained the model on my GPU, and the training time increased greatly after adding the multi-scale attention branch (one epoch takes 5 min without the multi-scale branch and 15 min after adding it). I'm not sure whether my code is written correctly. Did your training also take much more time after adding the multi-scale branch? Besides, do I need to change hyper-parameters such as the learning rate, patience (your code sets it to 15), or the optimizer after adding the branch, since my code seems to converge more slowly than your original code? Waiting for your reply~ Thanks!
Yes, the time cost increases a lot; that's why I didn't use higher-resolution features, although they bring a pleasant performance improvement. I guess you used a Ti GPU? I used a Tesla K40 GPU: 10 min per epoch for a single branch and 16 min per epoch for two branches. Your time cost increases nearly three times, which is too much compared with mine, so it may need to be checked.
I didn't change the learning rate or other hyper-parameters; only the multi-scale branch CNN parameters are added.
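For reference, here is a hypothetical sketch (not the repo's actual code) of where the extra per-step cost typically comes from when a multi-scale branch is added: the same kind of attention is run a second time over the higher-resolution annotations, which have roughly four times as many positions, and the two contexts are combined for the decoder. The helper names, the `attend(h, annotations, params)` signature, and the concatenation are illustrative assumptions.

```python
import numpy as np

def multi_scale_context(h1, ann_low, ann_high, attend, p_low, p_high):
    """Run a given single-branch attention once per scale and combine the contexts.

    attend(h, annotations, params) -> (context, alpha) is an assumed interface,
    not the repo's API.
    """
    ctx_low, alpha_low = attend(h1, ann_low, p_low)
    ctx_high, alpha_high = attend(h1, ann_high, p_high)
    # the high-resolution map has ~4x more positions, so this second call is
    # where most of the additional per-step training cost comes from
    return np.concatenate([ctx_low, ctx_high]), (alpha_low, alpha_high)
```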
Thanks for your reply!
Hi Jianshu! I'm reading your papers and code now. I have a question about the _step_slice function in the nmt.py file. It seems that there are two GRU layers in this function, and they output h1 and h2. In your DenseNet paper, there are two GRU layers in the multi-scale attention model. So does h1 represent $\hat{s}_t$ in Eq. (12) and h2 represent $s_t$ in Eq. (16)? I'm confused because the VGG model code also has h1 and h2, yet the WAP paper doesn't use two GRU layers. Waiting for your reply~ Thanks! ^^
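For context, here is a minimal NumPy sketch of what such a two-GRU decoder step looks like: a first GRU produces the prediction state $\hat{s}_t$ (h1) from the previous symbol embedding and the previous state, coverage attention is computed with h1 as the query, and a second GRU consumes the attention context to produce the final state $s_t$ (h2). The parameter names, the simplified one-channel coverage feature, and the dimensions are all illustrative assumptions, not the repo's actual Theano code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step; p holds illustrative parameter matrices."""
    z = sigmoid(x @ p['Wz'] + h_prev @ p['Uz'])          # update gate
    r = sigmoid(x @ p['Wr'] + h_prev @ p['Ur'])          # reset gate
    h_tilde = np.tanh(x @ p['Wh'] + (r * h_prev) @ p['Uh'])
    return (1.0 - z) * h_prev + z * h_tilde

def coverage_attention(h, annotations, coverage, p):
    """Additive attention over the CNN annotations with a simplified coverage term."""
    e = np.tanh(h @ p['Wa'] + annotations @ p['Ua']
                + coverage[:, None] @ p['Uf'])            # (L, A)
    e = (e @ p['va']).ravel()                             # (L,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    context = alpha @ annotations                         # (D,)
    return context, alpha

def decoder_step(y_emb, s_prev, annotations, coverage, p):
    # h1: "prediction" GRU state, playing the role of \hat{s}_t
    h1 = gru_step(y_emb, s_prev, p['gru1'])
    # attention context computed with h1 as the query
    ctx, alpha = coverage_attention(h1, annotations, coverage, p['att'])
    # h2: final decoder state, playing the role of s_t
    h2 = gru_step(ctx, h1, p['gru2'])
    return h2, alpha, coverage + alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, D, L, A = 8, 16, 32, 10, 16   # embedding, hidden, annotation, positions, attention dims
    mat = lambda a, b: 0.1 * rng.standard_normal((a, b))
    p = {
        'gru1': {'Wz': mat(m, n), 'Uz': mat(n, n), 'Wr': mat(m, n),
                 'Ur': mat(n, n), 'Wh': mat(m, n), 'Uh': mat(n, n)},
        'gru2': {'Wz': mat(D, n), 'Uz': mat(n, n), 'Wr': mat(D, n),
                 'Ur': mat(n, n), 'Wh': mat(D, n), 'Uh': mat(n, n)},
        'att':  {'Wa': mat(n, A), 'Ua': mat(D, A), 'Uf': mat(1, A), 'va': mat(A, 1)},
    }
    s_t, alpha, coverage = decoder_step(rng.standard_normal(m), np.zeros(n),
                                        rng.standard_normal((L, D)), np.zeros(L), p)
    print(s_t.shape, alpha.shape)   # (16,) (10,)
```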