
Evaluating KVQuant for 128k sequence length #17

Open
md-hassan opened this issue Nov 24, 2024 · 0 comments

md-hassan commented Nov 24, 2024

Hi,

Thanks for the great work. I am trying to evaluate KVQuant at longer context lengths (128k) for Llama 3.1. However, I am running out of memory when using seqlen=131072 in run-fisher.py on multiple A100s (it goes OOM even at a seqlen of 32k).
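For reference, here is a minimal standalone sketch of how I am triggering the OOM (this is my own repro, not the actual run-fisher.py code path; the model id and fp16 dtype are assumptions on my side):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # assumed model variant

# Shard the model across the available A100s.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.gradient_checkpointing_enable()  # still OOMs well before seqlen=131072

tokenizer = AutoTokenizer.from_pretrained(model_id)
input_ids = torch.randint(
    0, tokenizer.vocab_size, (1, 131072), device=model.device
)

# A loss is needed so squared gradients (diagonal Fisher) can be collected.
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()  # OOM happens here (or already in the forward pass)
```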

I do notice that you use seqlen=2048 in the pre-processing steps but evaluate on longer context lengths, up to 32k, in eval_passkey_simquant.py. Given that, would it be right to use the same pre-processing for a 128k context length? If not, could you please help me with the OOM error described above?
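If the 2048-token pre-processing is expected to transfer, one thing I could try is accumulating the Fisher information over 2048-token chunks of the long samples instead of a single 128k pass. A rough sketch of what I mean (fisher_chunks is a hypothetical helper of my own, and it assumes per-chunk squared gradients are an acceptable approximation at long context):

```python
import torch

def fisher_chunks(model, input_ids, chunk_len=2048):
    """Accumulate squared gradients (diagonal Fisher) over fixed-length
    chunks of one long sequence. Sketch only -- drops any trailing
    remainder shorter than chunk_len."""
    fisher = {
        n: torch.zeros_like(p, dtype=torch.float32)
        for n, p in model.named_parameters() if p.requires_grad
    }
    for start in range(0, input_ids.shape[1] - chunk_len + 1, chunk_len):
        chunk = input_ids[:, start:start + chunk_len]
        model.zero_grad(set_to_none=False)  # reset grads between chunks
        out = model(input_ids=chunk, labels=chunk)
        out.loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach().float() ** 2
    return fisher
```

Would this be consistent with how the 2048-token calibration is meant to work, or does the Fisher pass need the full-length sequences?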

Thanks!
