vk::DeviceLostError in llama_decode on Mali-G720 MC10 with multimodal model (GLM-OCR) #1

@mattip123

Description

llama-server crashes with vk::DeviceLostError during the first llama_decode call after multimodal image processing (mmproj) on Mali-G720 MC10 using the panvk Vulkan driver. The vision encoder (mmproj) completes successfully on CPU (--no-mmproj-offload), but the subsequent LLM decode step on GPU triggers the crash. Pure text models (Qwen3-4B Q4_K_M) load and run on GPU without crashing, confirming the issue is specific to the multimodal decode path.

Hardware & Software

| Component | Details |
|---|---|
| Device | Radxa Orion O6n |
| SoC | CIX P1 |
| GPU | Mali-G720 MC10 |
| RAM | 48 GB (UMA, shared with GPU) |
| OS | Debian 12 (aarch64) |
| Mesa | 26.0.0-1sky1.2 (Sky1-Linux apt repo) |
| Vulkan driver | panvk (DRIVER_ID_MESA_PANVK) |
| Vulkan API | 1.4.335 |
| llama.cpp build | 8208 (b5ed0e058), GNU 15.2.0, aarch64 |
| llama.cpp build flags | -DGGML_VULKAN=ON -DGGML_NATIVE=ON |

The pure text model (Qwen3-4B) running on GPU confirms that panvk and llama.cpp are correctly built and that basic GPU inference works. The crash is specific to the multimodal decode path – specifically the first llama_decode call after image token injection.
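The comparison above (text-only run works, multimodal run crashes) can be sketched as two llama-server invocations that differ only in the mmproj arguments. This is a minimal sketch, not the exact commands used: the model file names are hypothetical placeholders, and `-ngl 99` is an assumption standing in for "LLM layers on GPU"; only `--no-mmproj-offload` is taken from the report.

```python
from typing import List, Optional


def llama_server_argv(model: str,
                      mmproj: Optional[str] = None,
                      gpu_layers: int = 99) -> List[str]:
    """Build an argv list for llama-server.

    gpu_layers=99 is an assumed value meaning "offload all layers";
    the report does not state the value actually used.
    """
    argv = ["llama-server", "-m", model, "-ngl", str(gpu_layers)]
    if mmproj is not None:
        # Vision encoder stays on the CPU, as described in the report.
        argv += ["--mmproj", mmproj, "--no-mmproj-offload"]
    return argv


# Hypothetical file names, for illustration only.
text_only = llama_server_argv("qwen3-4b-q4_k_m.gguf")
multimodal = llama_server_argv("glm-ocr.gguf", mmproj="glm-ocr-mmproj.gguf")

print(" ".join(text_only))
print(" ".join(multimodal))
```

Keeping every other argument identical between the two runs makes it easier to attribute the crash to the multimodal path rather than to a difference in offload settings.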


Hypothesis

The image tokens (456 tokens from the vision encoder) are injected into the KV cache, and the first decode batch contains 3 tokens (batch.n_tokens = 3 in the logs). This mixed prompt, processed partly on the CPU (vision) and partly on the GPU (LLM), may trigger a memory-barrier or synchronization issue in panvk when the GPU begins processing the combined context. This is potentially related to the WLS race condition fixed in:

panvk/csf: serialize WLS dispatches to prevent pipelining race
31525ee

However, the crash here is DeviceLostError / waitForFences, not TRANSLATION_FAULT, suggesting a different (possibly related) synchronization failure in the CSF command stream when handling large batches after cross-backend (CPU→GPU) token handoff.
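To keep the two failure signatures apart when comparing runs, a small helper can tag captured log text by which marker it contains. This is only a sketch: the function name and the sample log line are hypothetical, and the only strings it relies on are the ones named in this report (`DeviceLostError`, `waitForFences`, `TRANSLATION_FAULT`). `TRANSLATION_FAULT` is checked first on the assumption that a GPU fault may also surface as a lost device.

```python
def classify_failure(log_text: str) -> str:
    """Tag a log excerpt by the failure signature it contains."""
    # Checked first: assumed to be the more specific signature.
    if "TRANSLATION_FAULT" in log_text:
        return "translation_fault"
    if "DeviceLostError" in log_text or "waitForFences" in log_text:
        return "device_lost"
    return "unknown"


# Hypothetical excerpt for illustration, not copied from the actual logs.
sample = "terminate called after throwing vk::DeviceLostError (waitForFences)"
print(classify_failure(sample))
```

Running the same check over the combined application and kernel logs of each run would show whether a multimodal crash ever carries the `TRANSLATION_FAULT` signature associated with the WLS race.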

