[Usage]: KVarN KV cache dtype kvarn_k4v2_g128 only works with head_dim=128? #10

Your current environment

Description: I'm trying to serve GLM-4.7-Flash with KVarN's kvarn_k4v2_g128 KV cache dtype, but config validation fails with a head_dim error. The model has head_dim=576, while this dtype requires head_dim=128.

Question: Is kvarn_k4v2_g128 designed to work only with models having head_dim=128? Or am I missing a configuration step?

Command

python -m vllm.entrypoints.openai.api_server \
  --model /home/mani/workspace/hfModels/GLM-4.7-Flash \
  --host 0.0.0.0 \
  --port 8001 \
  --tensor-parallel-size 2 \
  --block-size 128 \
  --kv-cache-dtype kvarn_k4v2_g128

Error

(APIServer pid=406685) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=406685)   Value error, kvarn_k4v2_g128 requires head_dim=128, but this model has head_dim=576. KVarN currently supports head_dim=128 only; use a different --kv-cache-dtype for this model. [type=value_error, input_value=ArgsKwargs((), {'model_co... 'shutdown_timeout': 0}), input_type=ArgsKwargs]
(APIServer pid=406685)     For further information visit https://errors.pydantic.dev/2.12/v/value_error

Model Info

  • Model: GLM-4.7-Flash
  • Architecture: Glm4MoeLiteForCausalLM
  • head_dim: 576

Additional Question: Is there a workaround, or should KVarN dtypes be documented as head_dim-specific?
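Until this is clarified, a small pre-flight check can avoid the startup failure by falling back to another dtype whenever head_dim is not 128. The sketch below is only an illustration: `pick_kv_cache_dtype` and `build_server_args` are hypothetical helpers, not part of vLLM or KVarN. It assumes that every dtype whose name starts with `kvarn_` shares the head_dim=128 restriction quoted in the error message, and that `auto` is an acceptable fallback for `--kv-cache-dtype`.

```python
from typing import List

# Taken from the error message: "KVarN currently supports head_dim=128 only".
KVARN_SUPPORTED_HEAD_DIM = 128


def pick_kv_cache_dtype(
    head_dim: int,
    preferred: str = "kvarn_k4v2_g128",
    fallback: str = "auto",
) -> str:
    """Return `preferred` unless it is a KVarN dtype and head_dim is unsupported.

    Assumption: all dtypes prefixed with "kvarn_" have the same restriction.
    """
    if preferred.startswith("kvarn_") and head_dim != KVARN_SUPPORTED_HEAD_DIM:
        return fallback
    return preferred


def build_server_args(model_path: str, head_dim: int) -> List[str]:
    """Build the server command line with a dtype that passes the head_dim check."""
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_path,
        "--kv-cache-dtype", pick_kv_cache_dtype(head_dim),
    ]
```

With head_dim=576 as reported for this model, `pick_kv_cache_dtype(576)` returns the fallback, while a model with head_dim=128 keeps `kvarn_k4v2_g128`. This only sidesteps the validation error; it does not make KVarN work with this model.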
