I've added BigBird's attention to my model, but I'm not seeing a decrease in memory #33
To clarify, you are using `bigbird/bigbird/core/attention.py` (line 637 at commit 5f2a5aa) with `attention_type = 'block_sparse'`?
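(For anyone reading along, a minimal sketch of what such an instantiation might look like. The class name `MultiHeadedAttentionLayer` and every constructor argument below are assumptions for illustration only; check `bigbird/core/attention.py` for the actual signature.)

```python
# Illustrative only: argument names/values are assumptions, not the verified
# signature from bigbird/core/attention.py.
from bigbird.core import attention

sparse_attn = attention.MultiHeadedAttentionLayer(
    attention_type="block_sparse",  # vs. "original_full" for dense attention
    num_attention_heads=8,          # assumed hyperparameters, for illustration
    size_per_head=64,
    num_rand_blocks=3,
    from_block_size=64,
    to_block_size=64,
)
```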
What's your sequence length?
Correct, I'm using that class with `attention_type = 'block_sparse'`.
I see. Does the memory used change with sequence length? I don't suppose you are using XLA? BigBird can be as much as 30% faster with XLA.
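(A minimal sketch of enabling XLA in TF 2.x; the model, optimizer, and loss below are placeholders, not the project's code.)

```python
import tensorflow as tf

# Option 1: compile a training step with XLA via jit_compile.
@tf.function(jit_compile=True)
def train_step(model, optimizer, x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.keras.losses.mse(y, model(x, training=True)))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Option 2: with Keras, request XLA at compile time (recent TF 2.x releases).
# model.compile(optimizer="adam", loss="mse", jit_compile=True)
```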
Yes, the memory used increases with sequence length. I'm not using XLA, and thanks for the tip!
https://www.tensorflow.org/guide/profiler#memory_profile_tool may also be useful. The XLA memory viewer (https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#memory_viewer) is better, but both are useful.
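(A rough sketch of capturing a profile that the memory_profile tool can read; the log directory, step count, and `run_one_training_step()` are placeholders.)

```python
import tensorflow as tf

logdir = "logs/bigbird_profile"  # arbitrary path; point TensorBoard at it

tf.profiler.experimental.start(logdir)
for step in range(5):                      # profile a handful of steps
    with tf.profiler.experimental.Trace("train", step_num=step, _r=1):
        run_one_training_step()            # placeholder for your training step
tf.profiler.experimental.stop()

# Then: tensorboard --logdir logs/bigbird_profile
# and open the Profile tab -> memory_profile tool.
```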
I've replaced the attention layers in Enformer with those from BigBird, but the memory usage reported by tf.get_memory_info is still basically the same (within 1%). I'm wondering if I also need to include code from the encoder or decoder to see a decrease in memory usage?
Thanks!
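(In case it helps the comparison, a rough sketch of measuring peak device memory around a single forward pass, so the block-sparse and full-attention variants can be compared at the same sequence length. The device string, model, and inputs are placeholders; `tf.config.experimental.get_memory_info` / `reset_memory_stats` are the APIs I assume "tf.get_memory_info" above refers to.)

```python
import tensorflow as tf

def peak_memory_mb(model, inputs, device="GPU:0"):
    """Peak memory (MB) for one forward pass; model and inputs are placeholders."""
    tf.config.experimental.reset_memory_stats(device)    # clear the peak counter
    _ = model(inputs, training=False)                    # run the pass to measure
    info = tf.config.experimental.get_memory_info(device)
    return info["peak"] / 1e6

# Compare both attention settings on the same batch, e.g.:
# print(peak_memory_mb(model_block_sparse, batch))
# print(peak_memory_mb(model_full_attention, batch))
```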