SqueezeAILab / KVQuant Public

Issues: SqueezeAILab/KVQuant

13 Open 4 Closed

Issues list

could it support LLAMA-3.1-8B-INSTRUCT?

#18 opened Dec 19, 2024 by LiMa-cas

Evaluating KVQuant for 128k sequence length

#17 opened Nov 24, 2024 by md-hassan

why are the fp16 baseline perplexity values little different?

#16 opened Oct 2, 2024 by shahaamirbader

Problems when reproducing the method on Qwen2-7b-instruct

#15 opened Sep 14, 2024 by jiangshimiao

How to reproduce the table 19 (kvquant vs kivi)

#14 opened Aug 21, 2024 by condy0919

Coupled Channel-wise Quantization

#12 opened Jun 30, 2024 by naston

Would the current implementation of Fisher Information work out of the box with Multi-head Latent Attention

#11 opened Jun 30, 2024 by naston

[Question] How to run HF model with 1m-length tokens in your exp?

#10 opened May 17, 2024 by 1649759610

CUDA error: an illegal memory access was encountered

#9 opened May 5, 2024 by CUHKSZzxy

Question about storage

#8 opened May 5, 2024 by mlxht990720

Where is the code of "ATOM-4bit"in the KVQuant codebase?

#7 opened Apr 17, 2024 by leoliu1979

The value of self.include_sparse being 0 causes the assert (False) error

#6 opened Apr 16, 2024 by ascendpoet

AttributeError: 'LlamaModel' object has no attribute 'split_gpus'

#4 opened Mar 5, 2024 by seeyourcell
