

mediouni-m commented on Nov 20, 2025

Runs models, but really slowly.

Performance numbers on Llama-3.2-1B-Instruct-q4_0.gguf with ./llama-cli -ngl 999 on Makena:

common_perf_print:    sampling time =      27.44 ms
common_perf_print:    samplers time =      12.94 ms /   137 tokens
common_perf_print:        load time =     625.88 ms
common_perf_print: prompt eval time =    1561.50 ms /    28 tokens (   55.77 ms per token,    17.93 tokens per second)
common_perf_print:        eval time =   14988.70 ms /   108 runs   (  138.78 ms per token,     7.21 tokens per second)
common_perf_print:       total time =   25197.05 ms /   136 tokens
common_perf_print: unaccounted time =    8619.42 ms /  34.2 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        108
llama_memory_breakdown_print: | memory breakdown [MiB] | total   free    self   model   context   compute       unaccounted |
llama_memory_breakdown_print: |   - HTP0 (Hexagon)     |  2048 = 2048 + ( 522 =   522 +       0 +       0) + 17592186043894 |
llama_memory_breakdown_print: |   - Host               |                 1186 =   266 +     128 +     792                   |

CPU for comparison:

common_perf_print:    sampling time =      51.48 ms
common_perf_print:    samplers time =      24.49 ms /   306 tokens
common_perf_print:        load time =     468.88 ms
common_perf_print: prompt eval time =     183.36 ms /    14 tokens (   13.10 ms per token,    76.35 tokens per second)
common_perf_print:        eval time =   10807.84 ms /   291 runs   (   37.14 ms per token,    26.92 tokens per second)
common_perf_print:       total time =   15182.34 ms /   305 tokens
common_perf_print: unaccounted time =    4139.66 ms /  27.3 %      (total - sampling - prompt eval - eval) / (total)
common_perf_print:    graphs reused =        289
llama_memory_breakdown_print: | memory breakdown [MiB] | total   free    self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - HTP0 (Hexagon)     |  2048 = 2048 + (   0 =     0 +       0 +       0) +           0 |
llama_memory_breakdown_print: |   - Host               |                 1165 =   779 +     128 +     258                |
llama_memory_breakdown_print: |   - CPU_REPACK         |                  522 =   522 +       0 +       0                |

And with FA (flash attention) enabled on the NPU:

common_perf_print: prompt eval time =     478.30 ms /    16 tokens (   29.89 ms per token,    33.45 tokens per second)
common_perf_print:        eval time =   18251.43 ms /   162 runs   (  112.66 ms per token,     8.88 tokens per second)

And on the CPU:

common_perf_print: prompt eval time =     197.30 ms /    16 tokens (   12.33 ms per token,    81.10 tokens per second)
common_perf_print:        eval time =    5058.76 ms /    91 runs   (   55.59 ms per token,    17.99 tokens per second)
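
For reference, roughly how these four runs can be reproduced; the model path, the -ngl values, and the flash-attention flag spelling are assumptions rather than the exact commands used (the PR only quotes ./llama-cli -ngl 999):

./llama-cli -m Llama-3.2-1B-Instruct-q4_0.gguf -ngl 999           # offload to the NPU (HTP0)
./llama-cli -m Llama-3.2-1B-Instruct-q4_0.gguf -ngl 0             # CPU baseline
./llama-cli -m Llama-3.2-1B-Instruct-q4_0.gguf -ngl 999 -fa on    # NPU with flash attention
./llama-cli -m Llama-3.2-1B-Instruct-q4_0.gguf -ngl 0   -fa on    # CPU with flash attention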

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Nov 20, 2025
mediouni-m changed the title from "Initial Hexagon v68 support boilerplate" to "ggml-hexagon: Initial Hexagon v68 support boilerplate" on Nov 20, 2025
mediouni-m (Author) commented on Nov 20, 2025

edit: worked around

for reference, the observed crash traces:

adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: ############################### Process on cDSP CRASHED!!!!!!! ########################################
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: --------------------- Crash Details are furnished below ------------------------------------
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Process "/frpc/f0491850 test-backend-op" crashed in thread "" for unknown reason
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Crashed Shared Object "./libggml-htp-v68.so" load address : 0x20010000 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: fastrpc_shell_unsigned_3 load address : D00000  and size : FD434 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Fault PC   :    0x20020044 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: LR         :    0x2001FFB0 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: SP         :    0xE32D20 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Bad VA     :    0xFEC02000 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: FP         :    0xE32D28 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: SSR        :    0x1970428 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Error code :    0x428 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Call trace : 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<2001FFB0>] op_matmul_id+0x4148:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<2001FFB0>] op_matmul_id+0x4148:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<20020390>] op_matmul_id+0x4528:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<2001BB20>] op_matmul+0xC30:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<20015EB0>] worker_pool_run_jobs+0x250:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<20015FF4>] worker_pool_run_func+0xF4:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<2001B97C>] op_matmul+0xA8C:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<200139C4>] htp_iface_stop+0x710:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<20012B14>] htp_iface_start+0x708:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: [<20020044>] op_matmul_id+0x41DC:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: ----------------------------- End of Crash Report --------------------------------------------------
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x191:1: Please refer to Hexagon SDK documentation "<HEXAGON_SDK_ROOT>/docs/tools/debug.html" for debugging the user PD exceptions.
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: ############################### Process on cDSP CRASHED!!!!!!! ########################################
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: --------------------- Crash Details are furnished below ------------------------------------
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Process "/frpc/f0491850 test-backend-op" crashed in thread "0x  e54b30:work" because Application called qurt_exit()
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Crashed Shared Object "./libggml-htp-v68.so" load address : 0x20010000 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: fastrpc_shell_unsigned_3 load address : D00000  and size : FD434 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Fault PC   :    0xD270FC 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: LR         :    0x20026758 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: SP         :    0xE54700 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Bad VA     :    0x0 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: FP         :    0xE548A8 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: SSR        :    0x1970427 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Error code :    0x108 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: Call trace : 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: [<20026758>] op_binary+0x1D58:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: [<200259BC>] op_binary+0xFBC:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: [<2002521C>] op_binary+0x81C:     (./libggml-htp-v68.so) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: [<00D270FC>] qurt_exception_raise_nonfatal+0x4:     (fastrpc_shell_unsigned_3) 
adsprpc:dsp: CDSP:platform_qdi_driver.c:792:0x192:1: ----------------------------- End of Crash Report --------------------------------------------------

The latter fault goes away when running with GGML_HEXAGON_NHVX=1 ... maybe it's related to VTCM memory management?
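
For reference, an approximate repro for that fault; the test-backend-ops flag names and reading GGML_HEXAGON_NHVX as the HVX thread count are assumptions, and the backend name is taken from the memory breakdown above:

./test-backend-ops test -b HTP0                        # faults in op_binary as in the trace above
GGML_HEXAGON_NHVX=1 ./test-backend-ops test -b HTP0    # the op_binary fault goes away with a single HVX thread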

For the former, it's an issue with how the DMA descriptors are currently filled on v68. There's a dearth of public documentation about those.

A question (cc @max-krasnyansky): are there DMA docs around somewhere? And is there anything about the v1 DMA descriptors that isn't supported on v68? Should I use the v0 ones there instead? The v1 descriptors do work, but I had to disable bypass on the destination.

mediouni-m changed the title from "ggml-hexagon: Initial Hexagon v68 support boilerplate" to "ggml-hexagon: Initial Hexagon v68 support" on Nov 21, 2025
Add stdexcept include to fix GCC build errors

Signed-off-by: Mohamed Mediouni <[email protected]>
v68 is the Hexagon revision notably used on the Snapdragon 8cx
Gen 3 and the QCM6490.

Signed-off-by: Mohamed Mediouni <[email protected]>
It turns out that the reason why HAP_compute_res_attr_set_vtcm_param_v2
errored out is that 8MB isn't a supported page size.

Signed-off-by: Mohamed Mediouni <[email protected]>
At least on v68, this made things actually work... not a proper fix though, so something to look at later...

Signed-off-by: Mohamed Mediouni <[email protected]>
max-krasnyansky (Collaborator) commented:
@mediouni-m It's very cool that you got v68 to work!
Originally, we decided not to support it because the performance is not going to be that good.
Those missing int32 -> fp32 conversion instructions and other things do add up.
The changes you added are clean and small though. I don't mind merging this since it's functional.
Perhaps it'll be useful for running tiny models, and there are more general optimizations coming that would help.
