@DajanaV commented Nov 18, 2025

Mirrored from ggml-org/llama.cpp#17344

Fixes some failures in the Hexagon swiglu/silu ops and improves the correctness of the GGML Hexagon backend.

Summary of changes

  • Add overflow-guarded HVX primitives (hvx_vec_exp_fp32_guard, hvx_vec_inverse_fp32_guard) and use them in the Hexagon backend.
  • Improve NaN/Inf handling in exp, inverse, and related helper functions.
  • Fix mistakes in the Hexagon silu / swiglu implementations (including handling of src1 and swapped/split variants).

Before

[SILU] NMSE = 3.457965465 > 0.000000100   SILU(type=f32,ne_a=[128,2,2,2],v=0): FAIL
[SILU] NMSE = 0.496767445 > 0.000000100   SILU(type=f32,ne_a=[5,7,11,13],v=0): FAIL
[SWIGLU] NMSE = 3894.832699597 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 2.032236263 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 531.844516626 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 1.988691331 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 1040.190893229 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): FAIL
[SWIGLU] NMSE = 0.493175916 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): FAIL
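To read the NMSE column, a minimal sketch of the metric, assuming the usual definition: the sum of squared differences divided by the sum of squares of one of the two buffers. The names `nmse` and `nmse_ok` are illustrative, and which backend's output the test harness uses for normalization is an assumption not shown in these logs.

```c
#include <stddef.h>

/* Normalized mean squared error: sum((a_i - b_i)^2) / sum(a_i^2),
 * accumulated in double to limit rounding in long sums. */
static double nmse(const float *a, const float *b, size_t n) {
    double err = 0.0, energy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i];
        err += d * d;
        energy += (double)a[i] * (double)a[i];
    }
    return err / energy;
}

/* Pass/fail against the 0.000000100 (1e-7) threshold printed in the logs. */
static int nmse_ok(const float *a, const float *b, size_t n) {
    return nmse(a, b, n) <= 1e-7;
}
```

Under this definition an output with every sign flipped relative to the reference scores exactly 4, so values such as 3894.8 or 1040.2 mean the error energy was hundreds of times larger than the signal itself.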

After

[SILU] NaN at index 231 (HTP0=-nan CPU=88.449669)   SILU(type=f32,ne_a=[128,2,2,2],v=0): FAIL
  SILU(type=f32,ne_a=[5,7,11,13],v=0): OK
[SWIGLU] NaN at index 122 (HTP0=nan CPU=-11446.431641)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
[SWIGLU] NMSE = 3.835742624 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
[SWIGLU] NaN at index 216 (HTP0=nan CPU=-8444.154297)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK
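Several of the remaining failures report "NaN at index" instead of an NMSE value. A hypothetical sketch of such a check follows, assuming (as the log format suggests) that the harness scans both outputs for NaN before it computes NMSE. `first_nan_index` is an illustrative name, not a test-backend-ops function.

```c
#include <math.h>
#include <stddef.h>

/* Return the first index where either buffer holds a NaN, or -1 if none.
 * A single NaN is enough to fail the comparison, so the log reports the
 * offending index and both values rather than an NMSE. */
static long first_nan_index(const float *out, const float *ref, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (isnan(out[i]) || isnan(ref[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

In IEEE 754 arithmetic `Inf * 0` and `Inf - Inf` are NaN, which is one common way an exponential or reciprocal that overflows to Inf ends up producing NaN downstream.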
