Batched #15
Hi @soumith 😮 Thanks for taking the time to send this PR. Unfortunately, after your previous PR, I kind of went back and looked at the model and realised I did many things in a dumb way. Since this was the first model I had implemented in order to learn PyTorch, there was a bunch of unnecessary stuff, like As of the latest commit, since the model file has changed, I cannot directly merge this PR, and I am quite inexperienced with rebasing and resolving conflicts. If and when you have the time, could you take a look at the current
# FC for i, f, u, o gates (N, 4*C), from input to hidden
i2h = F.linear(inputs, i2h_weight, i2h_bias)
i2h_slices = torch.split(i2h, i2h.size(1) // 4, dim=1)  # (N, C)*4
i2h_iuo = torch.cat([i2h_slices[0], i2h_slices[2], i2h_slices[3]], dim=1)  # (N, C*3)
Why are the indices 0,2,3 and not 0,1,2? Why is i2h_f_slice = i2h_slices[1]?
Also why is it iuo and not iou? Is there some rationale behind this? I am using iou so wondering if I am making some mistake...
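For what it's worth, the comment in the diff says the fused projection packs the gates in the order i, f, u, o. Read that way, slot 1 is the forget gate and slots 0, 2, 3 are i, u, o, which would explain both the indices and the `iuo` name. A plausible reason for keeping f apart (an assumption here, not something confirmed in this thread) is that the forget gate is computed once per child while i, u, o are computed once per node. Under that reading, `iou` versus `iuo` is only a packing convention: either should work as long as the slices are read back in the same order the weights were stacked. A minimal sketch with made-up sizes and random weights:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C = 2, 5, 3  # batch size, input size, hidden size (illustrative values only)

inputs = torch.randn(N, D)
# Assumption: the rows of the fused weight are stacked in the order i, f, u, o,
# as the comment in the diff suggests.
i2h_weight = torch.randn(4 * C, D)
i2h_bias = torch.randn(4 * C)

i2h = F.linear(inputs, i2h_weight, i2h_bias)             # (N, 4*C)
i2h_slices = torch.split(i2h, i2h.size(1) // 4, dim=1)   # four chunks of (N, C)

# Slot 1 is the forget gate under the i, f, u, o layout, so it is kept separate.
i2h_f_slice = i2h_slices[1]                               # (N, C)
# Slots 0, 2, 3 are i, u, o, concatenated back into one tensor.
i2h_iuo = torch.cat([i2h_slices[0], i2h_slices[2], i2h_slices[3]], dim=1)  # (N, 3*C)
```

If the weights were instead stacked as i, o, u, f, the matching read-back would use slots 0, 1, 2 for `iou` and slot 3 for f.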
This brings the rewritten, more efficient model from https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter09_natural-language-processing
On CUDA you'll see decent speedups as well.
This PR has to be carefully reviewed to make sure that the model before and after are doing the same thing (I've verified that the input / output shapes are all the same).
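Matching shapes alone don't show that the two versions compute the same values, so one way to review this is to copy the weights across and compare outputs numerically. The sketch below is not the actual model code from this repo. It uses a hypothetical stand-in where "before" is one `nn.Linear` per gate and "after" is a single fused `F.linear`, only to illustrate the kind of check meant here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C = 4, 6, 3  # batch size, input size, hidden size (illustrative values only)

# Hypothetical "before": four separate projections, one per gate.
gates = [nn.Linear(D, C) for _ in range(4)]

# Hypothetical "after": the same parameters stacked into one fused projection.
fused_weight = torch.cat([g.weight for g in gates], dim=0)  # (4*C, D)
fused_bias = torch.cat([g.bias for g in gates], dim=0)      # (4*C,)

x = torch.randn(N, D)
with torch.no_grad():
    before = torch.cat([g(x) for g in gates], dim=1)   # (N, 4*C)
    after = F.linear(x, fused_weight, fused_bias)      # (N, 4*C)

same_shape = before.shape == after.shape
# Allow a small tolerance: fused and separate matmuls may round differently.
same_values = torch.allclose(before, after, atol=1e-6)
print(same_shape, same_values)
```

The same pattern (same seed, same weights, same inputs, then `torch.allclose` on outputs) could be applied to the real old and new model files.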