Use AWS Neuron sdk 2.21 #754
@dacorvo The failing vision model tests all seem to fail whenever batch_size != 1 (our tests use batch_size = 2). I will open a ticket in the Neuron SDK repo. I could set the batch size to 1 to make the CI green, but I doubt we should bump to that version...
Note that we don't use the latest TnX version, since it has a pinned transformers version that is lower than the one we require.
Note that we must now use the multi-framework base AMI, as there is no PyTorch-specific AMI.
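To illustrate the TnX constraint noted above, here is a minimal sketch of the kind of check that rules out a release whose pinned transformers version is below the project's minimum. The helper names and both version numbers are hypothetical placeholders, not the actual pins involved in this PR.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted release string such as '4.43.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pin_satisfies(pinned: str, required_minimum: str) -> bool:
    """Return True if an exactly pinned version meets the required minimum."""
    return parse_version(pinned) >= parse_version(required_minimum)


# Hypothetical numbers, chosen only to illustrate the conflict.
TNX_PINNED_TRANSFORMERS = "4.40.0"        # assumed pin in the latest TnX release
REQUIRED_TRANSFORMERS_MINIMUM = "4.43.2"  # assumed minimum required by this project

# A pin below the required minimum means that TnX release cannot be used.
usable = pin_satisfies(TNX_PINNED_TRANSFORMERS, REQUIRED_TRANSFORMERS_MINIMUM)
print(f"latest TnX usable: {usable}")
```

This only handles plain dotted numeric versions; pre-release or local version suffixes would need a real version parser.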
After version 0.29 some error codes have changed.
LGTM
What does this PR do?
This bumps the AWS Neuron SDK to version 2.21, the first SDK release compatible with trn2 instances.
The underlying `pytorch` version is now 2.5.1, which implies significant changes in the XLA stack. This leads to compilation errors in:
[TEN404] (_divide.1146) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[4],+,0}[2],+,4}[16],+,0}[2]
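When triaging failures like the one above across many CI jobs, it can help to split the error line into its code, operation, and message. The sketch below is an assumption-laden illustration: the `[CODE] (operation) message` layout is inferred only from the single line quoted in this PR, not from documented compiler output, and `CompilerError` and `parse_compiler_error` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class CompilerError(NamedTuple):
    code: str       # e.g. "TEN404"
    operation: str  # e.g. "_divide.1146"
    message: str    # remainder of the line


# Layout assumed from the one error line quoted above: "[CODE] (operation) message".
_ERROR_PATTERN = re.compile(
    r"^\[(?P<code>[A-Z]+\d+)\]\s+\((?P<operation>[^)]+)\)\s+(?P<message>.+)$"
)


def parse_compiler_error(line: str) -> Optional[CompilerError]:
    """Split an error line into its parts, or return None if it does not match."""
    match = _ERROR_PATTERN.match(line.strip())
    if match is None:
        return None
    return CompilerError(**match.groupdict())


line = (
    "[TEN404] (_divide.1146) Internal tensorizer error: BirCodeGenLoop:"
    "Too many strides! {{{{0,+,1}[4],+,0}[2],+,4}[16],+,0}[2]"
)
error = parse_compiler_error(line)
if error is not None:
    print(error.code, error.operation)
```

Grouping parsed errors by `code` would make it easier to tell whether the failing models hit the same tensorizer issue or several distinct ones.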