ggml-hexagon: Initial Hexagon v68 support #17394
edit: worked around. For reference, the observed crash traces: The latter fault goes away when GGML_HEXAGON_NHVX=1 is set ... maybe it's related to VTCM memory management? The former is an issue with the DMA descriptors as currently filled on v68. There's a dearth of public documentation about those. A question (cc @max-krasnyansky): are there DMA docs around somewhere? And is there anything about the v1 DMA descriptors that isn't supported on v68?
58c10ff to b0c9a9f
Add stdexcept include to fix GCC build errors Signed-off-by: Mohamed Mediouni <[email protected]>
v68 is the Hexagon revision notably used on the Snapdragon 8cx Gen 3 and the QCM6490. Signed-off-by: Mohamed Mediouni <[email protected]>
It turns out that the reason why HAP_compute_res_attr_set_vtcm_param_v2 errored out is that 8MB isn't a supported page size. Signed-off-by: Mohamed Mediouni <[email protected]>
Signed-off-by: Mohamed Mediouni <[email protected]>
At least on v68 this made things actually work... not a proper fix though, so to look at later... Signed-off-by: Mohamed Mediouni <[email protected]>
b0c9a9f to f1d38e5
@mediouni-m It's very cool that you got v68 to work!
Runs models, but really slowly
Performance numbers on Llama-3.2-1B-Instruct-q4_0.gguf with ./llama-cli -ngl 999 on Makena:
CPU for comparison:
And with FA on NPU
And on CPU