
Evaluating KVQuant for 128k sequence length #17

Open
md-hassan opened this issue Nov 24, 2024 · 0 comments

md-hassan commented Nov 24, 2024

Hi,

Thanks for the great work. I am trying to evaluate KVQuant at longer context lengths (128k) for Llama 3.1. However, I am running out of memory when using seqlen=131072 in run-fisher.py on multiple A100s (it goes OOM even at a seqlen of 32k).
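For reference, here is a minimal standalone sketch of how I am triggering the OOM (this is my own repro, not the actual run-fisher.py code path; the model id and fp16 dtype are assumptions on my side):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # assumed model variant

# Shard the model across the available A100s.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.gradient_checkpointing_enable()  # still OOMs well before seqlen=131072

tokenizer = AutoTokenizer.from_pretrained(model_id)
input_ids = torch.randint(
    0, tokenizer.vocab_size, (1, 131072), device=model.device
)

# A loss is needed so squared gradients (diagonal Fisher) can be collected.
out = model(input_ids=input_ids, labels=input_ids)
out.loss.backward()  # OOM happens here (or already in the forward pass)
```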

I do notice that you use seqlen=2048 in the pre-processing steps but evaluate on longer context lengths, up to 32k, in eval_passkey_simquant.py. Given that, would it be right to use the same pre-processing for a 128k context length? If not, could you please help me with the OOM error described above?
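If the 2048-token pre-processing is expected to transfer, one thing I could try is accumulating the Fisher information over 2048-token chunks of the long samples instead of a single 128k pass. A rough sketch of what I mean (fisher_chunks is a hypothetical helper of my own, and it assumes per-chunk squared gradients are an acceptable approximation at long context):

```python
import torch

def fisher_chunks(model, input_ids, chunk_len=2048):
    """Accumulate squared gradients (diagonal Fisher) over fixed-length
    chunks of one long sequence. Sketch only -- drops any trailing
    remainder shorter than chunk_len."""
    fisher = {
        n: torch.zeros_like(p, dtype=torch.float32)
        for n, p in model.named_parameters() if p.requires_grad
    }
    for start in range(0, input_ids.shape[1] - chunk_len + 1, chunk_len):
        chunk = input_ids[:, start:start + chunk_len]
        model.zero_grad(set_to_none=False)  # reset grads between chunks
        out = model(input_ids=chunk, labels=chunk)
        out.loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach().float() ** 2
    return fisher
```

Would this be consistent with how the 2048-token calibration is meant to work, or does the Fisher pass need the full-length sequences?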

Thanks!
