[Usage]: KVarN KV cache dtype kvarn_k4v2_g128 only works with head_dim=128?

### Your current environment

**Description:** I'm trying to serve GLM-4.7-Flash with KVarN's `kvarn_k4v2_g128` KV cache dtype, but startup fails during `VllmConfig` validation. The model has `head_dim=576`, while the error says this dtype requires `head_dim=128`.

**Question:** Is `kvarn_k4v2_g128` designed to work only with models that have `head_dim=128`, or am I missing a configuration step?

**Command**
```bash
python -m vllm.entrypoints.openai.api_server \
  --model /home/mani/workspace/hfModels/GLM-4.7-Flash \
  --host 0.0.0.0 \
  --port 8001 \
  --tensor-parallel-size 2 \
  --block-size 128 \
  --kv-cache-dtype kvarn_k4v2_g128
```

**Error**
```text
(APIServer pid=406685) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=406685)   Value error, kvarn_k4v2_g128 requires head_dim=128, but this model has head_dim=576. KVarN currently supports head_dim=128 only; use a different --kv-cache-dtype for this model. [type=value_error, input_value=ArgsKwargs((), {'model_co... 'shutdown_timeout': 0}), input_type=ArgsKwargs]
(APIServer pid=406685)     For further information visit https://errors.pydantic.dev/2.12/v/value_error
```

**Model Info**
- **Model:** GLM-4.7-Flash
- **Architecture:** `Glm4MoeLiteForCausalLM`
- **head_dim:** `576`

**Additional Question:** Is there a workaround, or should KVarN dtypes be documented as head_dim-specific?
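
For reference, this is the kind of pre-flight check I could add to my launch script in the meantime. It is only a sketch based on the error message above: the `128` constraint is copied from that message, `choose_kv_cache_dtype` is a hypothetical helper of my own (not a vLLM or KVarN API), and treating every dtype whose name starts with `kvarn` as head_dim-restricted is an assumption. The fallback `auto` is the standard `--kv-cache-dtype` value that keeps the model's own dtype.

```python
# Hypothetical helper, not part of vLLM or KVarN.
# The supported head_dim is taken from the validation error message.
KVARN_SUPPORTED_HEAD_DIM = 128


def choose_kv_cache_dtype(
    head_dim: int,
    preferred: str = "kvarn_k4v2_g128",
    fallback: str = "auto",
) -> str:
    """Return `preferred` unless it is a KVarN dtype and head_dim is unsupported."""
    # Assumption: all dtypes prefixed with "kvarn" share the head_dim=128 limit.
    is_kvarn = preferred.startswith("kvarn")
    if is_kvarn and head_dim != KVARN_SUPPORTED_HEAD_DIM:
        return fallback
    return preferred


if __name__ == "__main__":
    # head_dim=576 is the value reported for GLM-4.7-Flash in the error above.
    print(choose_kv_cache_dtype(576))
    print(choose_kv_cache_dtype(128))
```

The returned string would then be passed to `--kv-cache-dtype`, so the server at least starts with an unquantized KV cache instead of failing validation.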

