Thanks for the great work. I am trying to evaluate KVQuant at longer context lengths (128k) for Llama 3.1. However, I am running out of memory when using seqlen=131072 in run-fisher.py on multiple A100s (it goes OOM even at a seqlen of 32k).
I notice that you used seqlen=2048 in the pre-processing steps but evaluate at longer context lengths, up to 32k, in eval_passkey_simquant.py. Given that, would it be correct to use the same pre-processing for a 128k context length? If not, could you please help me with the OOM error described above?
Thanks!
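For context, Fisher-based calibration of this kind typically estimates per-parameter sensitivity as the mean squared gradient of the loss over calibration samples, which is why the calibration seqlen can differ from the evaluation context length. A minimal sketch of that idea (a hypothetical `fisher_diagonal` helper, not KVQuant's actual run-fisher.py code):

```python
import torch
import torch.nn as nn

def fisher_diagonal(model, batches, loss_fn):
    """Estimate the diagonal Fisher information as the mean squared
    gradient of the loss, accumulated over calibration batches.
    Memory scales with the calibration seqlen, which is why shorter
    calibration sequences (e.g. 2048) are commonly used even when
    evaluation later runs at much longer contexts."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    # Average the squared gradients over the calibration set.
    return {n: f / len(batches) for n, f in fisher.items()}
```

This is only a sketch under the assumption that the Fisher estimate is a diagonal squared-gradient accumulation; it illustrates why OOM at seqlen=131072 comes from the forward/backward activation memory at that length rather than from the Fisher statistics themselves.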