I've added bigbird's attention to my model, but not seeing a decrease in memory #33

Open
Currie32 opened this issue May 9, 2022 · 5 comments

Comments


Currie32 commented May 9, 2022

I've replaced the attention layers in Enformer with those from BigBird, but the memory usage reported by tf.config.experimental.get_memory_info is still basically the same (within 1%). Do I also need to include code from BigBird's encoder or decoder to see a decrease in memory usage?

Thanks!
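
For reference, a minimal sketch of how peak memory per forward pass could be measured with that API; `model`, `inputs`, and the `"GPU:0"` device are placeholder assumptions, not the actual Enformer setup:

```python
# Minimal sketch (not from this thread): compare peak GPU memory for one
# forward pass before and after swapping in the sparse attention layers.
import tensorflow as tf

def peak_forward_memory_mb(model, inputs, device="GPU:0"):
    """Reset TF's memory counters, run one forward pass, and return peak usage in MB."""
    tf.config.experimental.reset_memory_stats(device)
    _ = model(inputs, training=False)
    return tf.config.experimental.get_memory_info(device)["peak"] / 1e6
```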

ppham27 commented May 10, 2022

To clarify, you are using

`class MultiHeadedAttentionLayer(tf.keras.layers.Layer):`
with `attention_type = 'block_sparse'`?

What's your sequence length?

Currie32 (Author)

Correct, I'm using that class with block_sparse attention.
When the sequence enters the attention layer, its length is 1536.
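
For rough scale, a back-of-envelope sketch (assuming the BigBird paper defaults of 64-token blocks, a 3-block sliding window, 3 random blocks, and 2 global blocks per query block) of the per-head attention-score counts at this length:

```python
# Back-of-envelope sketch, assuming BigBird's default block size of 64 with a
# 3-block window, 3 random blocks, and 2 global blocks per query block.
seq_len, block_size = 1536, 64
dense_scores = seq_len * seq_len                    # full attention: n^2
sparse_scores = seq_len * block_size * (3 + 3 + 2)  # window + random + global
print(dense_scores, sparse_scores, dense_scores / sparse_scores)
# -> 2359296 786432 3.0
```

So at 1536 tokens the score tensor itself only shrinks by roughly 3x, and it may be a small share of the model's total activation memory, which could mask the saving in an end-to-end measurement.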

ppham27 commented May 10, 2022

I see. Does the memory used change with sequence length?

I don't suppose you are using XLA? BigBird can be as much as 30% faster with tf.function(jit_compile=True). It also produces better memory profiles, which makes debugging easier.
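
A minimal sketch of enabling XLA on a training step via `jit_compile` (the `model`, `optimizer`, and `loss_fn` names here are placeholders):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the whole step with XLA
def train_step(model, optimizer, loss_fn, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```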

Currie32 (Author)

Yes, the memory used increases with sequence length.

I'm not using XLA, and thanks for the tip!

ppham27 commented May 11, 2022
