Clarifying last step of the 'transform_roc' function #26

Open

sharpsy opened this issue Jul 18, 2018 · 7 comments

sharpsy commented Jul 18, 2018

Thanks for the implementation, it is a big help!
I'm trying to wrap my head around the code and I had some questions here and there. One of them was about the last step in the transform_roc function. It was not clear to me why the input sequence is augmented with an extra dimension containing increasing embedding IDs, IDs that are greater than any token ID from the source data vocabulary. Namely:
xmb[:, :, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)

The explanation is in the paper itself: "We used learned position embeddings instead of the sinusoidal version proposed in the original work." These are the IDs of the learned position embeddings, and they are added to the input embeddings in the forward() method of the TransformerModel. It took me some time to find this, and a comment could help others who struggle with the same line.
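
For reference, here is a minimal standalone sketch of the mechanism (made-up sizes and a simplified input shape, not the repo's exact code): both channels of the input go through the same embedding matrix, and summing over the channel dimension adds the position embeddings to the token embeddings.

import torch
import torch.nn as nn

# Illustrative sizes only (hypothetical, not the actual training config)
n_vocab, n_special, n_ctx, n_embd = 100, 3, 8, 16

# A single embedding matrix holds token, special and position embeddings
embed = nn.Embedding(n_vocab + n_special + n_ctx, n_embd)

# x has shape (batch, n_ctx, 2): channel 0 = token ids, channel 1 = position ids
x = torch.zeros(4, n_ctx, 2, dtype=torch.long)
x[:, :, 0] = torch.randint(0, n_vocab, (4, n_ctx))                           # token ids
x[:, :, 1] = torch.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)  # position ids

e = embed(x)      # shape (4, n_ctx, 2, n_embd): one embedding per channel
h = e.sum(dim=2)  # summing over the channel dim adds position to token embeddings
print(h.shape)    # torch.Size([4, 8, 16])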

I would suggest adding comments to the code to help with understanding.

In the transform_roc function, before the last step:
# Position information that is added to input embeddings in the TransformerModel
And then in the forward() method of the TransformerModel, before the h = e.sum(dim=2) line:
# Add position embeddings to the input embeddings

@sharpsy sharpsy changed the title Clarifying last step of the transform_roc function Clarifying last step of the 'transform_roc' function Jul 18, 2018

sharpsy commented Jul 18, 2018

I just found #12 and openai/finetune-transformer-lm#9

Not sure how I missed the first issue; I did look through the open issues. But it shows that comments in this code would help with understanding.

rodgzilla (Contributor) commented

Hi @sharpsy,

I also think that this comment would help people understand this code. Do you want to add it yourself or would you like me to create the pull request?


sharpsy commented Jul 18, 2018

Ok, I'll create the pull request.


teucer commented Jul 18, 2018

What I don't understand is why we need to start from n_vocab + n_special and not, e.g., from 0. Any explanation?


sharpsy commented Jul 18, 2018

Indices from 0 to n_vocab are the tokens from the vocabulary (the data), indices from n_vocab to n_vocab + n_special are the special tokens: _start_, _delimiter_ and _classify_. Everything above that, up to n_vocab + n_special + n_ctx, is used for encoding the position.
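
A tiny sketch of that layout (sizes are made up for illustration; the real values come from the encoder and the training config):

# Hypothetical sizes for illustration only
n_vocab, n_special, n_ctx = 100, 3, 8

vocab_ids    = range(0, n_vocab)                                        # tokens from the data
special_ids  = range(n_vocab, n_vocab + n_special)                      # _start_, _delimiter_, _classify_
position_ids = range(n_vocab + n_special, n_vocab + n_special + n_ctx)  # one id per position in the context

# All three ranges index the same embedding matrix, so the position ids
# must start at n_vocab + n_special to avoid colliding with token/special ids.
assert len(vocab_ids) + len(special_ids) + len(position_ids) == n_vocab + n_special + n_ctx

Starting the arange at 0 would reuse rows of the embedding matrix that already belong to vocabulary tokens.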

rodgzilla (Contributor) commented

The indices from 0 to n_vocab + n_special are already taken in the embedding matrix by the vocabulary words (n_vocab) and the special tokens such as _start_, _delimiter_ and _classify_.


teucer commented Jul 18, 2018

thx!
