
Conversation

@chraac (Contributor) commented Nov 18, 2025

Fixes some failures in the Hexagon swiglu/silu ops and improves the correctness of the GGML Hexagon backend.

Summary of changes

  • Add overflow-guarded HVX primitives (hvx_vec_exp_fp32_guard, hvx_vec_inverse_fp32_guard) and use them in the Hexagon backend (a scalar sketch of the guard idea follows this list).
  • Improve NaN/Inf handling in exp, inverse, and related helper functions.
  • Fix mistakes in the Hexagon silu / swiglu implementations (including handling of src1 and the swapped/split variants).
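For illustration, here is a scalar sketch of the kind of guard these helpers add. This is hypothetical C, not the actual HVX implementation; the names mirror the new primitives and the silu formula is the standard x / (1 + exp(-x)).

#include <math.h>

// Hypothetical scalar equivalents of the guarded HVX helpers.
// kMaxExp is roughly log(FLT_MAX): beyond this, expf() overflows fp32.
static const float kMaxExp = 88.02f;

static inline float exp_fp32_guard(float x) {
    // Saturate to +inf instead of letting an out-of-range input fall through
    // the range-reduction/polynomial path and come back as garbage or NaN.
    return (x >= kMaxExp) ? INFINITY : expf(x);
}

static inline float inverse_fp32_guard(float x) {
    // Map 1/inf to 0 so the reciprocal of a saturated exp stays finite.
    return isinf(x) ? 0.0f : 1.0f / x;
}

static inline float silu_fp32_ref(float x) {
    // silu(x) = x / (1 + exp(-x)); with the guards, large |x| yields x or 0
    // instead of NaN.
    return x * inverse_fp32_guard(1.0f + exp_fp32_guard(-x));
}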

Before

[SILU] NMSE = 3.457965465 > 0.000000100   SILU(type=f32,ne_a=[128,2,2,2],v=0): FAIL
[SILU] NMSE = 0.496767445 > 0.000000100   SILU(type=f32,ne_a=[5,7,11,13],v=0): FAIL
[SWIGLU] NMSE = 3894.832699597 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 2.032236263 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): FAIL
[SWIGLU] NMSE = 531.844516626 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 1.988691331 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): FAIL
[SWIGLU] NMSE = 1040.190893229 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): FAIL
[SWIGLU] NMSE = 0.493175916 > 0.000000100   SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): FAIL

After

[SILU] NaN at index 231 (HTP0=-nan CPU=88.449669)   SILU(type=f32,ne_a=[128,2,2,2],v=0): FAIL
  SILU(type=f32,ne_a=[5,7,11,13],v=0): OK
[SWIGLU] NaN at index 122 (HTP0=nan CPU=-11446.431641)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=0): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=0): OK
[SWIGLU] NMSE = 3.835742624 > 0.000000100   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,swapped=1): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,swapped=1): OK
[SWIGLU] NaN at index 216 (HTP0=nan CPU=-8444.154297)   SWIGLU(type=f32,ne_a=[128,2,2,2],v=0,split): FAIL
  SWIGLU(type=f32,ne_a=[5,7,11,13],v=0,split): OK

@chraac marked this pull request as draft November 18, 2025 03:24
static const float kMaxExp = 88.02f; // log(INF)

const HVX_Vector max_exp = hvx_vec_splat_fp32(kMaxExp);
const HVX_Vector inf = hvx_vec_splat_fp32(kInf);
@chraac (Contributor Author) commented Nov 18, 2025

Thought we could move this init out of the for loop below.
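A rough sketch of what the hoisting looks like (the loop bound nvec and the loop body are assumptions, not taken from the kernel):

#include <math.h>   // INFINITY

static const float kMaxExp = 88.02f;   // ~log(FLT_MAX)
static const float kInf    = INFINITY;

// The splats are loop-invariant, so build them once before the element loop
// instead of re-splatting on every iteration.
const HVX_Vector max_exp = hvx_vec_splat_fp32(kMaxExp);
const HVX_Vector inf     = hvx_vec_splat_fp32(kInf);

for (uint32_t i = 0; i < nvec; i++) {   // nvec: hypothetical vector count
    // ... per-vector exp with the max_exp / inf guard ...
}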

uint32_t w[VLEN_FP32];
__fp16 fp16[VLEN_FP16];
float fp32[VLEN_FP32];
} __attribute__((aligned(VLEN), packed)) HVX_VectorAlias;
@chraac (Contributor Author) commented Nov 18, 2025

It's safe to use the GCC extension here since on HTP we're building with clang.
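For reference, the full alias presumably looks roughly like this; the typedef opener and the HVX_Vector member are assumptions, since the snippet above only shows the tail. The extensions in question are the __fp16 type and the aligned/packed attribute on the union, both of which Clang accepts on HTP:

typedef union {
    HVX_Vector v;                 // assumed: raw vector view of the same bytes
    uint32_t   w[VLEN_FP32];
    __fp16     fp16[VLEN_FP16];
    float      fp32[VLEN_FP32];
} __attribute__((aligned(VLEN), packed)) HVX_VectorAlias;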

  const float limit = ((const float *) (op_params))[3];

- const int nc = (src1_valid) ? ne0 : ne0 / 2;
+ const int nc = (src1_valid) ? ne00 : ne00 / 2;
@chraac (Contributor Author)

Looks like we should use src0->ne[0] here instead of dst->ne[0].
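A short sketch of the assumed shapes behind this fix (standard ggml GLU layout, not taken from the diff):

// split variant (src1_valid): gate in src0 and value in src1, each row is
//                             ne00 wide, so nc = ne00 columns per row.
// fused variant:              a single src0 row packs [gate | value] and is
//                             ne00 wide, so nc = ne00 / 2 columns per row.
// In the fused case dst->ne[0] (ne0) already equals ne00 / 2, so deriving nc
// from ne0 halved the extent a second time.
const int nc = src1_valid ? ne00 : ne00 / 2;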

  // neg by setting the fp32 sign bit
  HVX_Vector mask = Q6_V_vsplat_R(0x80000000);
- return Q6_V_vor_VV(v, mask);
+ return Q6_V_vxor_VV(v, mask);
@chraac (Contributor Author) commented Nov 18, 2025

Using xor here to fix the negation when x < 0.
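A scalar illustration of why xor is the right bit trick (hypothetical helper, not the backend code):

#include <stdint.h>
#include <string.h>

static inline float neg_via_sign_bit(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    // or with 0x80000000 forces the sign bit on, so a negative input stays
    // negative and the "negation" is a no-op for x < 0;
    // xor flips the bit, so the result is -x for both signs.
    bits ^= 0x80000000u;
    memcpy(&x, &bits, sizeof(x));
    return x;
}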

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Nov 18, 2025
@chraac marked this pull request as ready for review November 19, 2025 01:55
@max-krasnyansky (Collaborator) left a comment

Looks good!

Thanks for the updates.
This PR also fixes #16854

@max-krasnyansky merged commit 21d31e0 into ggml-org:master Nov 20, 2025
72 of 74 checks passed
@chraac (Contributor Author) commented Nov 21, 2025

This PR also fixes #16854

Just a heads-up: I've only fixed the opt_path == false path here. hvx_fast_sigmoid_f32 is still tricky because of the manual float exponent math causing NaNs, so I'll tackle that in a separate PR.
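For context, a sketch of the kind of manual exponent math that makes the fast path fragile. This is illustrative only; the real hvx_fast_sigmoid_f32 may differ:

#include <stdint.h>
#include <string.h>

// Classic fast-exp trick: build 2^n by writing n + 127 straight into the fp32
// exponent field. Nothing clamps n, so for large |x| the biased exponent
// leaves [1, 254], the shift spills into the sign bit or wraps, and the
// reconstructed value is garbage that can surface as NaN downstream.
static inline float fast_exp2_int(int32_t n) {
    uint32_t bits = (uint32_t) (n + 127) << 23;   // no range check
    float out;
    memcpy(&out, &bits, sizeof(out));
    return out;
}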

@chraac deleted the dev-fix-swiglu branch November 21, 2025 06:02
@max-krasnyansky (Collaborator)

Yep. I saw that other tests are still failing.
But the Qwen3-0.6B output is definitely "fixed", as in it's actually sensible now :)
