I've added bigbird's attention to my model, but not seeing a decrease in memory #33

Open
Currie32 opened this issue May 9, 2022 · 5 comments

Comments


Currie32 commented May 9, 2022

I've replaced the attention layers in Enformer with those from BigBird, but the memory usage reported by tf.config.experimental.get_memory_info is still basically the same (within 1%). Do I also need to include code from BigBird's encoder or decoder to see a decrease in memory usage?

Thanks!
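
For reference, a minimal sketch of how peak memory per forward pass could be measured with that API; `model`, `inputs`, and the `"GPU:0"` device are placeholder assumptions, not the actual Enformer setup:

```python
# Minimal sketch (not from this thread): compare peak GPU memory for one
# forward pass before and after swapping in the sparse attention layers.
import tensorflow as tf

def peak_forward_memory_mb(model, inputs, device="GPU:0"):
    """Reset TF's memory counters, run one forward pass, and return peak usage in MB."""
    tf.config.experimental.reset_memory_stats(device)
    _ = model(inputs, training=False)
    return tf.config.experimental.get_memory_info(device)["peak"] / 1e6
```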

ppham27 commented May 10, 2022

To clarify, you are using

`class MultiHeadedAttentionLayer(tf.keras.layers.Layer):`
with `attention_type = 'block_sparse'`?

What's your sequence length?

Currie32 (Author)

Correct, I'm using that class with block_sparse attention.
When the sequence enters the attention layer, its length is 1536.
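
For rough scale, a back-of-envelope sketch (assuming the BigBird paper defaults of 64-token blocks, a 3-block sliding window, 3 random blocks, and 2 global blocks per query block) of the per-head attention-score counts at this length:

```python
# Back-of-envelope sketch, assuming BigBird's default block size of 64 with a
# 3-block window, 3 random blocks, and 2 global blocks per query block.
seq_len, block_size = 1536, 64
dense_scores = seq_len * seq_len                    # full attention: n^2
sparse_scores = seq_len * block_size * (3 + 3 + 2)  # window + random + global
print(dense_scores, sparse_scores, dense_scores / sparse_scores)
# -> 2359296 786432 3.0
```

So at 1536 tokens the score tensor itself only shrinks by roughly 3x, and it may be a small share of the model's total activation memory, which could mask the saving in an end-to-end measurement.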

ppham27 commented May 10, 2022

I see. Does the memory used change with sequence length?

I don't suppose you are using XLA? BigBird can be as much as 30% faster with tf.function(jit_compile=True). It also produces better memory profiles, which makes debugging easier.
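
A minimal sketch of enabling XLA on a training step via `jit_compile` (the `model`, `optimizer`, and `loss_fn` names here are placeholders):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the whole step with XLA
def train_step(model, optimizer, loss_fn, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```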

Currie32 (Author)

Yes, the memory used increases with sequence length.

I'm not using XLA, and thanks for the tip!

ppham27 commented May 11, 2022
