Replies: 1 comment
Same issue
(On TPU v3.)
Any suggestions for figuring out memory use for errors like:
Usually we get the more helpful (if still somewhat cryptic) errors showing allocated buffers sorted by max size, but sometimes we just get this trace with nothing to help guide us.
What are the best strategies for understanding the memory use here? The model is only 125M params (and we're using FSDP), so memory shouldn't be an issue, but for some reason it is.
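For context, here is the rough arithmetic behind "memory shouldn't be an issue". This is only a sketch: the float32 storage, the Adam-style optimizer with two extra values per parameter, and the 8-way even sharding are assumptions for illustration, not our exact setup, and `state_bytes_per_device` is a made-up helper name.

```python
# Back-of-envelope estimate of per-device parameter and optimizer-state memory.
# Assumptions (illustrative only): float32 storage, gradients the same size as
# the parameters, an Adam-style optimizer holding two extra values per
# parameter, and 8 devices with even FSDP sharding.

GIB = 1024 ** 3


def state_bytes_per_device(n_params, bytes_per_value=4, optimizer_slots=2, n_devices=8):
    """Bytes per device for params + grads + optimizer slots, evenly sharded."""
    copies = 1 + 1 + optimizer_slots  # params, grads, optimizer slots
    total = n_params * bytes_per_value * copies
    return total / n_devices


n_params = 125_000_000  # model size from the question

unsharded = state_bytes_per_device(n_params, n_devices=1)
per_device = state_bytes_per_device(n_params)

print(f"unsharded state: {unsharded / GIB:.2f} GiB")
print(f"per-device state with 8-way sharding: {per_device / GIB:.3f} GiB")
```

Under those assumptions the model state is about 2 GB in total before sharding, which is small next to the 16 GiB of HBM on a TPU v3 core. That suggests the memory is going to something other than parameters and optimizer state, such as activations or compiler temporaries, which this estimate does not cover.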