Update cached models and benchmarks #705
Conversation
A minimum version is required by torch_neuronx (but not enforced).
Force-pushed from 37e7222 to 7a8e99c.
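As a side note on the commit message above, here is a minimal sketch of how the torch_neuronx minimum-version requirement could be enforced explicitly; the version number and helper name are assumptions, not part of this PR:

```python
# Hedged sketch: an explicit minimum-version guard for torch_neuronx.
# The minimum version below is a placeholder assumption; the commit message
# only notes that torch_neuronx requires one without enforcing it.
from importlib.metadata import version as installed_version

from packaging.version import Version

MIN_TORCH_NEURONX = Version("2.1.0")  # hypothetical minimum


def check_torch_neuronx_version() -> None:
    found = Version(installed_version("torch-neuronx"))
    if found < MIN_TORCH_NEURONX:
        raise ImportError(
            f"torch_neuronx >= {MIN_TORCH_NEURONX} is required, found {found}."
        )
```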
LGTM, thanks for the updates!
```diff
@@ -27,7 +25,7 @@ def main():
         export=True,
         batch_size=batch_size,
         sequence_length=seq_length,
-        auto_cast_type="fp16",
+        auto_cast_type="bf16",
```
So it's now proven that bf16 is better?
It actually does not make any difference as far as I can tell, and it seems to allow the weights to load faster (because they are not converted).
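For reference, a minimal sketch of the kind of export call the diff above modifies, using optimum-neuron's NeuronModelForCausalLM; the model id, batch size, and sequence length are placeholder assumptions, not the exact values from the script touched by this PR. With auto_cast_type="bf16", checkpoints stored in bf16 can be loaded as-is rather than being converted to fp16 first:

```python
# Minimal sketch (model id and shapes are assumptions, not the exact values
# used in the benchmark script touched by this PR).
from optimum.neuron import NeuronModelForCausalLM

batch_size = 1
seq_length = 2048

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    export=True,                 # compile for Neuron when loading
    batch_size=batch_size,
    sequence_length=seq_length,
    auto_cast_type="bf16",       # bf16 weights are loaded without conversion
)

# The compiled model can be saved and later reloaded without re-exporting.
model.save_pretrained("llama-2-7b-neuron")
```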
LGTM
I have made a small modification in the export code to always set …
Force-pushed from d514284 to 337d882.
What does this PR do?
This modifies the cache population workflow to use the files corresponding to the new models. It also updates the benchmarks in the documentation.