
The effect of torch2.0 on results #79

Open
2021270902001sc opened this issue Oct 28, 2023 · 1 comment

Comments

@2021270902001sc

Do your results include an ablation of PyTorch's operator implementations? I have noticed that transformer models run significantly faster on torch 2.0 than on torch 1.x.

@Phil26AT
Collaborator

The results in the paper are with torch 1.x and no FlashAttention (unless explicitly mentioned). The released code has plenty of performance optimizations and is significantly faster than what is reported in the paper.
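For context on why FlashAttention matters here: it computes exactly the same softmax attention as the naive formulation, but with an online (streaming) softmax that never materializes the full score matrix, which is the main source of the torch 2.0 speedup via fused attention kernels. A minimal pure-Python sketch of that equivalence (illustrative only, not the repository's code; the function names are made up for this example):

```python
import math

def naive_attention(Q, K, V):
    # Standard softmax attention: materialize the full score row per query.
    # Q, K, V are lists of vectors (list[list[float]]).
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        Z = sum(exps)
        out.append([sum(e / Z * v[j] for e, v in zip(exps, V))
                    for j in range(len(V[0]))])
    return out

def streaming_attention(Q, K, V):
    # FlashAttention-style online softmax: process one key/value at a time,
    # keeping only a running max, normalizer, and weighted sum per query.
    d = len(Q[0])
    out = []
    for q in Q:
        m = float("-inf")          # running max of scores seen so far
        Z = 0.0                    # running softmax normalizer
        acc = [0.0] * len(V[0])    # running (unnormalized) weighted sum
        for k, v in zip(K, V):
            s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            m_new = max(m, s)
            # Rescale previous accumulators to the new max for stability.
            scale = math.exp(m - m_new) if m != float("-inf") else 0.0
            w = math.exp(s - m_new)
            Z = Z * scale + w
            acc = [a * scale + w * vj for a, vj in zip(acc, v)]
            m = m_new
        out.append([a / Z for a in acc])
    return out
```

Both functions return identical results up to floating-point error; the streaming form is what fused kernels implement in O(1) extra memory per query.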
