Hi Jianshu! I'm reading your papers and code now, and I have a question about the function `_step_slice` in the nmt.py file. It seems that there are two GRU layers in this function, and they output h1 and h2. In your DenseNet paper, the multi-scale attention model also has two GRU layers. So does h1 represent $\hat{s}_t$ in Eq. (12) and h2 represent $s_t$ in Eq. (16)? I'm confused because the VGG-based model's code also has h1 and h2, but the WAP paper doesn't use two GRU layers. Waiting for your reply~ Thanks! ^^
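To make my reading concrete, here is a minimal NumPy sketch of how I understand a two-GRU decoder step. I'm assuming it follows the "conditional GRU" layout from the dl4mt tutorial code: the first GRU turns the previous symbol's embedding into a predicted state h1, attention is computed from h1, and the second GRU takes the resulting context and refines h1 into h2. The names `step_slice`, `gru`, `attend` and the parameter dict are hypothetical, and this is my assumption, not your actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru(x, h, W, U, Wx, Ux):
    # Plain GRU cell: W/U produce reset and update gates, Wx/Ux the candidate state.
    dim = h.shape[-1]
    gates = sigmoid(x @ W + h @ U)
    r, u = gates[:dim], gates[dim:]
    h_tilde = np.tanh(x @ Wx + r * (h @ Ux))
    return u * h + (1.0 - u) * h_tilde

def attend(query, ann, W_a, U_a, v):
    # Additive attention over annotations ann (L, ctx_dim), driven by query.
    e = np.tanh(query @ W_a + ann @ U_a) @ v   # scores, shape (L,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                        # softmax weights
    return alpha @ ann, alpha                   # context vector, weights

def step_slice(y_emb, h_prev, ann, p):
    # Hypothetical step; my guess is h1 ~ s_hat_t (Eq. 12), h2 ~ s_t (Eq. 16).
    h1 = gru(y_emb, h_prev, *p["gru1"])        # GRU 1: previous symbol embedding
    ctx, alpha = attend(h1, ann, *p["att"])     # attention queried by h1
    h2 = gru(ctx, h1, *p["gru2"])               # GRU 2: context vector refines h1
    return h2, ctx, alpha

rng = np.random.default_rng(0)
emb, dim, ctx_dim, att, L = 4, 5, 6, 3, 7

def init(*shape):
    return rng.standard_normal(shape) * 0.1

p = {
    "gru1": (init(emb, 2 * dim), init(dim, 2 * dim), init(emb, dim), init(dim, dim)),
    "att": (init(dim, att), init(ctx_dim, att), init(att)),
    "gru2": (init(ctx_dim, 2 * dim), init(dim, 2 * dim), init(ctx_dim, dim), init(dim, dim)),
}
h, ctx, alpha = step_slice(init(emb), np.zeros(dim), init(L, ctx_dim), p)
print(h.shape, ctx.shape, float(alpha.sum()))
```

If this layout is right, the second GRU would exist in the code even when a paper's equations only describe one, which is exactly the part I'm unsure about for WAP.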