
Batched #15

Open · soumith wants to merge 2 commits into master
Conversation

@soumith (Contributor) commented Nov 6, 2017

This brings the rewritten, more efficient model from https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter09_natural-language-processing

  • On 0.2 this is about 50% faster on CPU (OMP_NUM_THREADS=1)
  • On master this is about 3x faster on CPU (OMP_NUM_THREADS=1)
  • On CUDA you'll see decent speedups as well.

This PR has to be carefully reviewed to make sure that the model before and after this change does the same thing (I've verified that the input/output shapes are all the same).

@dasguptar (Owner) commented

Hi @soumith 😮

Thanks for taking the time to send this PR. Unfortunately, after your previous PR, I kind of went back and looked at the model and realised I did many things in a dumb way.

Since this was the first model I had implemented in order to learn PyTorch, there was a bunch of unnecessary stuff, like F.torch.squeeze(tensor) instead of tensor.squeeze(). I went and tried to refactor the model myself yesterday, and optimised it a bit, reaching roughly a 2x speedup on CPU (from 5 minutes 30 seconds earlier to around 2 minutes 50 seconds now). I think some of the changes are similar to what you have done here, e.g. computing batched embeddings, combining linear layers, etc.
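For reference, a minimal sketch of what "combining linear layers" can mean here: fusing the four per-gate projections into one wider matmul and splitting afterwards. The dimensions and names below are hypothetical, not taken from either version of model.py.

```python
import torch
import torch.nn as nn

# Hypothetical shapes; in_dim/mem_dim are assumptions, not the repo's values.
in_dim, mem_dim, batch = 300, 150, 32
x = torch.randn(batch, in_dim)

# Before: one nn.Linear per gate -> four separate matmuls per forward pass.
ix, fx, ox, ux = (nn.Linear(in_dim, mem_dim) for _ in range(4))
gates_separate = [ix(x), fx(x), ox(x), ux(x)]   # 4 x (batch, mem_dim)

# After: a single fused nn.Linear computes all gate pre-activations in one
# matmul, then torch.split recovers the per-gate (batch, mem_dim) chunks.
fused = nn.Linear(in_dim, 4 * mem_dim)
gates_fused = torch.split(fused(x), mem_dim, dim=1)
```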

As of the latest commit, since the model file has changed, I cannot directly merge this PR, and I am quite inexperienced with rebasing and resolving conflicts. If and when you have the time, could you take a look at the current model.py and decide whether to rebase the PR onto current master, or whether current master is already good enough?

```python
# FC for i, f, u, o gates (N, 4*C), from input to hidden
i2h = F.linear(inputs, i2h_weight, i2h_bias)
i2h_slices = torch.split(i2h, i2h.size(1) // 4, dim=1)  # (N, C) * 4
i2h_iuo = torch.cat([i2h_slices[0], i2h_slices[2], i2h_slices[3]], dim=1)  # (N, C*3)
```

@dasguptar (Owner) commented on these lines:
Why are the indices 0,2,3 and not 0,1,2? Why is i2h_f_slice = i2h_slices[1]?

@dasguptar (Owner) commented on these lines:

Also, why is it iuo and not iou? Is there some rationale behind this? I am using iou, so I am wondering if I am making some mistake...
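One plausible reading, offered only as an assumption (the thread itself does not confirm it): in a Child-Sum Tree-LSTM the forget gate is applied once per child, so the f slice gets handled separately, while i, u, and o share the summed child hidden state; the 0, 2, 3 indices and the iuo order then simply mirror an assumed i, f, u, o layout of i2h_weight. A hypothetical sketch:

```python
import torch

# Assumed layout: i2h_weight stacks the gates in i, f, u, o order, so the
# four (N, C) slices come out as i, f, u, o. Any order works, as long as
# the downstream code uses the matching indices.
N, C = 8, 16
i2h = torch.randn(N, 4 * C)             # stand-in for F.linear(inputs, ...)
i, f, u, o = torch.split(i2h, C, dim=1)

# f (slice 1) is kept apart: in a Child-Sum Tree-LSTM it is combined with
# each child's hidden state individually, while i, u, o use the children's
# sum, so only those three are concatenated back together.
iuo = torch.cat([i, u, o], dim=1)       # indices 0, 2, 3 -> "iuo"
```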
