llama-server crashes with vk::DeviceLostError during the first llama_decode call after multimodal image processing (mmproj) on Mali-G720 MC10 using the panvk Vulkan driver. The vision encoder (mmproj) completes successfully on CPU (--no-mmproj-offload), but the subsequent LLM decode step on GPU triggers the crash. Pure text models (Qwen3-4B Q4_K_M) load and run on GPU without crashing, confirming the issue is specific to the multimodal decode path.
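For reference, an invocation of the kind that triggers the crash might look like the following sketch. Only `--no-mmproj-offload` is taken from the report above; the model filenames are placeholders and `-ngl 99` is an assumed full-offload setting:

```shell
# Hypothetical reproduction sketch: model paths are placeholders.
# --no-mmproj-offload keeps the vision encoder (mmproj) on the CPU, as
# described above; -ngl 99 offloads all LLM layers to the panvk Vulkan device.
./llama-server \
    -m models/vl-model.gguf \
    --mmproj models/vl-model-mmproj.gguf \
    --no-mmproj-offload \
    -ngl 99
# Sending a chat request with an attached image then crashes the first
# llama_decode on the GPU with vk::DeviceLostError.
```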
## Hardware & Software
| Component | Details |
| --- | --- |
| Device | Radxa Orion O6n |
| SoC | CIX P1 |
| GPU | Mali-G720 MC10 |
| RAM | 48 GB (UMA – shared with GPU) |
| OS | Debian 12 (aarch64) |
| Mesa | 26.0.0-1sky1.2 (Sky1-Linux apt repo) |
| Vulkan driver | panvk (DRIVER_ID_MESA_PANVK) |
| Vulkan API | 1.4.335 |
| llama.cpp | build 8208 (b5ed0e058), GNU 15.2.0, aarch64 |
| llama.cpp build flags | -DGGML_VULKAN=ON -DGGML_NATIVE=ON |
The pure text model (Qwen3-4B) running on GPU confirms that panvk and llama.cpp are correctly built and that basic GPU inference works. The crash is specific to the multimodal decode path – specifically the first llama_decode call after image token injection.
## Hypothesis
The image tokens (456 tokens from the vision encoder) are injected into the KV cache and the first decode batch contains 3 tokens (batch.n_tokens = 3 in logs). This mixed prompt — partly processed on CPU (vision), partly on GPU (LLM) — may trigger a memory barrier or synchronization issue in panvk when the GPU begins processing the combined context. This is potentially related to the WLS race condition fixed in:
Mesa commit 31525ee ("panvk/csf: serialize WLS dispatches to prevent pipelining race")
However, the crash here is DeviceLostError / waitForFences, not TRANSLATION_FAULT, suggesting a different (possibly related) synchronization failure in the CSF command stream when handling large batches after cross-backend (CPU→GPU) token handoff.
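Kernel-side fault information might help distinguish the two failure modes. Assuming the panthor DRM driver backs panvk on this CSF GPU, one way to capture any fault messages logged around the time of the DeviceLostError (commands are a suggestion, not from the original report):

```shell
# Check the kernel log for panthor/GPU fault messages emitted around the
# time of the DeviceLostError (panthor is the DRM driver behind panvk on
# CSF-based Mali GPUs such as the G720).
sudo dmesg --ctime | grep -iE 'panthor|gpu fault|csf'
```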