Refactor Transformer model #601

Open
hrzn opened this issue Dec 1, 2021 · 3 comments · May be fixed by #1915
Labels
improvement New feature or improvement

Comments

@hrzn
Contributor

hrzn commented Dec 1, 2021

Currently the Transformer is not really implemented as it should be. We should revisit it and implement it as in the original Transformer paper, including always training to predict the next sample (as language models do) and calling the encoder+decoder auto-regressively when producing forecasts. See: Attention Is All You Need
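To make "always training for predicting the next sample" concrete, here is a minimal, framework-agnostic sketch of how one training window could be split for teacher forcing. It assumes a univariate series stored as a plain list; the helper names `make_teacher_forcing_sample` and `causal_mask` are hypothetical and are not part of Darts or PyTorch. The decoder input is the ground-truth target shifted right by one step, and a causal mask stops each decoder position from attending to later positions.

```python
from typing import List, Sequence, Tuple


def make_teacher_forcing_sample(
    series: Sequence[float], t: int, input_len: int, output_len: int
) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: split a series around index t into (src, tgt_in, tgt_out).

    src     -- encoder input: series[t - input_len : t]
    tgt_in  -- decoder input: ground-truth targets shifted right by one step,
               starting with the last observed src value
    tgt_out -- values the decoder should predict at each position
    """
    if t - input_len < 0 or t + output_len > len(series):
        raise ValueError("window does not fit inside the series")
    src = list(series[t - input_len : t])
    tgt_out = list(series[t : t + output_len])
    # Shift right: position i of the decoder sees y_{i-1} and must predict y_i.
    tgt_in = [src[-1]] + tgt_out[:-1]
    return src, tgt_in, tgt_out


def causal_mask(size: int) -> List[List[bool]]:
    """True where decoder position i must NOT attend to position j (i.e. j > i)."""
    return [[j > i for j in range(size)] for i in range(size)]


src, tgt_in, tgt_out = make_teacher_forcing_sample(list(range(10)), t=5, input_len=3, output_len=3)
print(src, tgt_in, tgt_out)  # [2, 3, 4] [4, 5, 6] [5, 6, 7]
```

With this layout, a single forward pass over `tgt_in` (with the mask applied) yields a next-step prediction at every decoder position, and the loss can be computed against `tgt_out`. PyTorch uses the same boolean convention (True = not allowed to attend) for `tgt_mask`, and `nn.Transformer.generate_square_subsequent_mask` builds a float version of this mask.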

Note from @pennfranc:
This current implementation is fully functional and can already produce some good predictions. However, it is still limited in how it uses the Transformer architecture, because the tgt input of torch.nn.Transformer is not utilized to its full extent. Currently, we simply pass the last value of the src input to tgt. To get closer to the way the Transformer is usually used in language models, we should allow the model to consume its own output as part of the tgt argument, so that when predicting a sequence of values, the tgt input grows as the model's outputs are appended to it. Of course, the training of the model would have to be adapted accordingly.
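The growing-tgt idea above can be sketched as a simple decoding loop. This is a minimal, framework-agnostic illustration, not the Darts implementation: `autoregressive_forecast` and `drift_step` are hypothetical names, and `drift_step` is a toy stand-in for the Transformer that only continues the average slope of `src`. The loop starts tgt from the last src value (as the current implementation does) and then appends each prediction so the next step sees it.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: given the encoder input and the decoder input
# decoded so far, return the prediction for the step after the last tgt element.
StepFn = Callable[[Sequence[float], Sequence[float]], float]


def autoregressive_forecast(src: Sequence[float], n: int, step: StepFn) -> List[float]:
    """Produce n forecasts by feeding each prediction back into tgt."""
    tgt: List[float] = [src[-1]]  # decoding starts from the last observed value
    preds: List[float] = []
    for _ in range(n):
        y_next = step(src, tgt)  # model sees src and everything decoded so far
        preds.append(y_next)
        tgt.append(y_next)  # tgt grows by one element per step
    return preds


def drift_step(src: Sequence[float], tgt: Sequence[float]) -> float:
    # Toy stand-in for the Transformer: continue the average slope of src.
    slope = (src[-1] - src[0]) / (len(src) - 1)
    return tgt[-1] + slope


print(autoregressive_forecast([1.0, 2.0, 3.0], 3, drift_step))  # [4.0, 5.0, 6.0]
```

With a real `torch.nn.Transformer`, each call to `step` would run the decoder over the whole current tgt (using a causal mask) and keep only the output at the last position. This costs one forward pass per forecast step, which is the price of letting the model condition on its own predictions.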

@hrzn hrzn added the improvement New feature or improvement label Dec 1, 2021
@hrzn
Contributor Author

hrzn commented Aug 18, 2022

See also: #672

@JanFidor
Contributor

JanFidor commented Jul 20, 2023

Hi @dennisbader @madtoinou, while working on the RWKV PR I realized that I'm not using teacher forcing during training, which would hinder training quite a bit. Since it's a big part of this issue, I wanted to ask if I could pick it up, so that I'd have a point of reference for how its final implementation should look if I get it merged (+ the issue looks really cool ;) )

@dennisbader
Collaborator

Hi @JanFidor, of course :) We would be happy about your contribution to improve the Transformer model 🚀

Let us know if you need any assistance.

@JanFidor JanFidor linked a pull request Jul 23, 2023 that will close this issue
3 participants