Question about the _step_slice function in nmt.py file #10
Comments
Yes, correct.
Thanks!
Hi Jianshu! Recently I implemented the multi-scale attention branch based on your code. I trained the model on my GPU, and the training time increased greatly after adding the multi-scale attention branch (one epoch takes 5 min without the multi-scale branch and 15 min after adding it). I'm not sure whether my code is written correctly. Did your training also take much more time after adding the multi-scale branch? Besides, do I need to change hyper-parameters such as the learning rate, patience (your code sets it to 15), or the optimizer after adding the branch, since my code seems to converge more slowly than your original code? Waiting for your reply~ Thanks!
Yes, the time cost increases a lot; that's why I didn't use higher-resolution features, although they bring a pleasant performance improvement. I guess you used a Ti GPU? I used a Tesla K40 GPU: 10 min per epoch for a single branch and 16 min per epoch for two branches. Your time cost increases nearly three times, which is too much compared with mine, so it may need to be checked.
I didn't change the learning rate or other hyper-parameters; only the multi-scale branch CNN parameters are added.
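For reference, here is a hypothetical sketch (not the repo's actual code) of where the extra per-step cost typically comes from when a multi-scale branch is added: the same kind of attention is run a second time over the higher-resolution annotations, which have roughly four times as many positions, and the two contexts are combined for the decoder. The helper names, the `attend(h, annotations, params)` signature, and the concatenation are illustrative assumptions.

```python
import numpy as np

def multi_scale_context(h1, ann_low, ann_high, attend, p_low, p_high):
    """Run a given single-branch attention once per scale and combine the contexts.

    attend(h, annotations, params) -> (context, alpha) is an assumed interface,
    not the repo's API.
    """
    ctx_low, alpha_low = attend(h1, ann_low, p_low)
    ctx_high, alpha_high = attend(h1, ann_high, p_high)
    # the high-resolution map has ~4x more positions, so this second call is
    # where most of the additional per-step training cost comes from
    return np.concatenate([ctx_low, ctx_high]), (alpha_low, alpha_high)
```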
Thanks for your reply!
Hi Jianshu! I'm reading your papers and code now. I have a question about the _step_slice function in the nmt.py file. It seems that there are two GRU layers in this function, and they output h1 and h2. In your DenseNet paper, there are two GRU layers in the multi-scale attention model. So does h1 represent $\hat{s}_t$ in Eq. (12) and h2 represent $s_t$ in Eq. (16)? I'm confused because the VGG model code also has h1 and h2, yet the WAP paper doesn't use two GRU layers. Waiting for your reply~ Thanks! ^^
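For context, here is a minimal NumPy sketch of what such a two-GRU decoder step looks like: a first GRU produces the prediction state $\hat{s}_t$ (h1) from the previous symbol embedding and the previous state, coverage attention is computed with h1 as the query, and a second GRU consumes the attention context to produce the final state $s_t$ (h2). The parameter names, the simplified one-channel coverage feature, and the dimensions are all illustrative assumptions, not the repo's actual Theano code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step; p holds illustrative parameter matrices."""
    z = sigmoid(x @ p['Wz'] + h_prev @ p['Uz'])          # update gate
    r = sigmoid(x @ p['Wr'] + h_prev @ p['Ur'])          # reset gate
    h_tilde = np.tanh(x @ p['Wh'] + (r * h_prev) @ p['Uh'])
    return (1.0 - z) * h_prev + z * h_tilde

def coverage_attention(h, annotations, coverage, p):
    """Additive attention over the CNN annotations with a simplified coverage term."""
    e = np.tanh(h @ p['Wa'] + annotations @ p['Ua']
                + coverage[:, None] @ p['Uf'])            # (L, A)
    e = (e @ p['va']).ravel()                             # (L,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    context = alpha @ annotations                         # (D,)
    return context, alpha

def decoder_step(y_emb, s_prev, annotations, coverage, p):
    # h1: "prediction" GRU state, playing the role of \hat{s}_t
    h1 = gru_step(y_emb, s_prev, p['gru1'])
    # attention context computed with h1 as the query
    ctx, alpha = coverage_attention(h1, annotations, coverage, p['att'])
    # h2: final decoder state, playing the role of s_t
    h2 = gru_step(ctx, h1, p['gru2'])
    return h2, alpha, coverage + alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, D, L, A = 8, 16, 32, 10, 16   # embedding, hidden, annotation, positions, attention dims
    mat = lambda a, b: 0.1 * rng.standard_normal((a, b))
    p = {
        'gru1': {'Wz': mat(m, n), 'Uz': mat(n, n), 'Wr': mat(m, n),
                 'Ur': mat(n, n), 'Wh': mat(m, n), 'Uh': mat(n, n)},
        'gru2': {'Wz': mat(D, n), 'Uz': mat(n, n), 'Wr': mat(D, n),
                 'Ur': mat(n, n), 'Wh': mat(D, n), 'Uh': mat(n, n)},
        'att':  {'Wa': mat(n, A), 'Ua': mat(D, A), 'Uf': mat(1, A), 'va': mat(A, 1)},
    }
    s_t, alpha, coverage = decoder_step(rng.standard_normal(m), np.zeros(n),
                                        rng.standard_normal((L, D)), np.zeros(L), p)
    print(s_t.shape, alpha.shape)   # (16,) (10,)
```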