How to reduce memory when decoding on CPU? #1672
Could you tell us which script you are using? Have you changed any code, or are you using our original code unmodified? Also, please tell us whether you are using a streaming or a non-streaming model, and what the typical duration of your test files is. It would be great if you could post the complete decoding command.
Hi @csukuangfj, I have exported the model using torch.jit.export and wrote my own decoding code, which is used in an offline scenario. The core decoding code is from icefall, though. My wave files are usually longer than 1 minute, but the maximum duration of a wave file sent to ASR is 20 s (it is force-cut by an energy-based VAD). I have attached the code below.
I see. Please use @torch.no_grad(), as we do in our decoding code.
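The advice above works because disabling autograd prevents PyTorch from recording a computation graph (and holding on to the intermediate activations it needs for backward) during inference. A minimal sketch, where `decode_one` is a hypothetical helper and not an icefall function:

```python
import torch

# Hypothetical decode helper: the decorator turns off autograd for the
# whole call, so no graph or activations are retained after it returns.
@torch.no_grad()
def decode_one(model, features):
    return model(features)

# The effect is observable on any tensor op: inside no_grad, results of
# operations on requires_grad tensors do not track gradients.
x = torch.randn(4, requires_grad=True)
with torch.no_grad():
    y = x * 2
assert y.requires_grad is False  # no graph was recorded for y
```

Either the decorator form or the `with torch.no_grad():` context manager achieves the same thing; the decorator is convenient when the whole function is inference-only.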
Hi @csukuangfj, yes, adding @torch.no_grad() seems to work; the memory decreased from 2.5 GB to 2.1 GB. I'm not sure whether using ~2 GB of memory is normal, or whether there is any other way to decrease it?
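One further option, not mentioned in the thread, is dynamic int8 quantization of the Linear layers before export, which typically reduces CPU inference memory and time. This is a hedged sketch on a toy module (the real model and any accuracy impact would have to be evaluated separately):

```python
import torch

# Toy stand-in for the real eager-mode model; dynamic quantization
# replaces nn.Linear weights with int8 and dequantizes on the fly.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())

qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

y = qmodel(torch.randn(1, 256))  # forward pass works as before
```

Note this applies to the eager-mode nn.Module before torch.jit export; whether it degrades WER noticeably has to be measured on your test set.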
Could you post your updated code?
Does the memory grow linearly from 0 to 2.1 GB and then stay at 2.1 GB?
Could you give the output of the above log? What is the maximum value of duration?
The max duration is 20 s in my wave files. The memory first grows to around 1.5 GB for the first few waves, then grows slowly to 2.1 GB and stays there. Decoding the first few waves (around 5) is extremely slow and may take 1–2 minutes to finish. I'm not sure whether it is normal for this ASR model to need that much warm-up time.
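Slow first calls on a torch.jit model are often the TorchScript profiling executor optimizing the graph during the first few invocations. If that is the cause here, the cost can be moved to startup by running a few dummy forward passes before serving real audio. A sketch, where the function name and the input shapes (80-dim features, up to 2000 frames) are assumptions rather than icefall API:

```python
import torch

def warm_up(model, num_runs=3, num_frames=2000, feat_dim=80):
    # Hypothetical warm-up: run dummy inputs so any JIT optimization
    # happens before the first real utterance arrives.
    with torch.no_grad():
        for _ in range(num_runs):
            dummy = torch.zeros(1, num_frames, feat_dim)
            model(dummy)
```

If the real encoder takes extra arguments (e.g. feature lengths), the dummy call would need to match its signature.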
With zipformer I can get good performance.
Currently, when I decode on CPU one file at a time (not in batches), the memory usage goes up to 2.5 GB. The token (vocabulary) size is 5000 and I use greedy_search to decode. I tried reducing it to 4000, but the memory usage did not decrease much.
Any idea how to reduce it without an obvious performance degradation?
Some model configurations below:
num-encoder-layers=2,2,2,3,2,2
downsampling-factor=1,2,4,8,4,2
feedforward-dim=256,384,512,768,512,384
num-heads=4,4,4,4,4,4
encoder-dim=192,256,256,384,256,256
query-head-dim=24
value-head-dim=8
pos-head-dim=4
pos-dim=24
encoder-unmasked-dim=192,192,256,256,256,192
cnn-module-kernel=31,31,15,15,15,31
decoder-dim=256
joiner-dim=256
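A rough back-of-the-envelope check explains why shrinking the token set barely helps: with decoder-dim=256 and joiner-dim=256 as configured above, the vocabulary-dependent layers (assuming fp32 weights and the usual transducer layout of a decoder embedding plus a joiner output projection) hold only on the order of 10 MB, a tiny fraction of 2 GB:

```python
# Rough parameter-count arithmetic. Assumptions: fp32 weights, one
# decoder embedding table and one joiner output projection (with bias)
# over the vocabulary; other vocabulary-dependent layers are ignored.
vocab = 5000
decoder_dim = 256
joiner_dim = 256

decoder_embedding = vocab * decoder_dim       # 1,280,000 params
joiner_output = joiner_dim * vocab + vocab    # 1,285,000 params

bytes_fp32 = 4 * (decoder_embedding + joiner_output)
print(f"{bytes_fp32 / 1e6:.1f} MB")  # ~10.3 MB
```

Cutting the vocabulary from 5000 to 4000 therefore saves only a couple of megabytes; the bulk of the footprint comes from the encoder weights plus runtime buffers and allocator caches.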