Releases: ngxson/llama.cpp

b6684

03 Oct 16:37
606a73f

metal : fix loop bound in ggml_mem_ranges (#16412)

b6683

03 Oct 13:02
946f71e

llama : fix shapes for bert/mpt q/k norm (#16409)

b6682

03 Oct 12:07
638d330

ggml : fix graph reallocation with multiple chunks (#16396)

Reallocation is needed if any single chunk grows in size, even if the
total allocation size stays the same or decreases.
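
A minimal sketch of why the per-chunk check matters, using hypothetical names rather than ggml's actual allocator structures: comparing only the totals would miss one chunk growing while another shrinks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical chunk bookkeeping for illustration only.
struct Chunk {
    size_t allocated; // bytes currently reserved for this chunk
    size_t required;  // bytes the new graph needs in this chunk
};

// Reallocate if any single chunk outgrows its reservation, even when the
// sum over all chunks stays the same or shrinks.
bool needs_realloc(const std::vector<Chunk> & chunks) {
    for (const Chunk & c : chunks) {
        if (c.required > c.allocated) {
            return true;
        }
    }
    return false;
}
```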

b6681

03 Oct 11:11
84c8e30

Fix missing messages on sibling navigation (#16408)

* fix: resolve messages disappearing when navigating between regenerated siblings by using the current leaf nodes instead of cached sibling IDs (sketched after this entry)

* chore: update webui build output

* chore: update webui build output
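
The webui itself is TypeScript; the sketch below is a purely illustrative C++ rendering with hypothetical names. It shows the idea the fix describes: derive the visible branch from the live message tree instead of from sibling IDs cached before regeneration.

```cpp
#include <memory>
#include <vector>

// Hypothetical message-tree node; the real webui state differs.
struct Message {
    int id = 0;
    std::vector<std::unique_ptr<Message>> children;
    size_t active_child = 0; // which sibling branch is currently shown
};

// Rebuild the visible conversation by walking active branches down to the
// current leaf; nothing is read from a cache that regeneration could
// invalidate.
std::vector<const Message *> current_leaf_path(const Message & root) {
    std::vector<const Message *> path;
    const Message * node = &root;
    while (true) {
        path.push_back(node);
        if (node->children.empty()) {
            break;
        }
        node = node->children[node->active_child].get();
    }
    return path;
}
```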

b6679

03 Oct 10:54
0e1f838

vulkan: Fix FA coopmat1 invalid array indexing (#16365)

When computing sinks, the cm1 shader was looping r from 0 to Br rather than
to rows_per_thread. I must have copied this from the scalar path (where it is
correct), and somehow it wasn't causing failures on current drivers.
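
The real code is a GLSL coopmat1 shader; the C++ sketch below only illustrates the bug class, reusing the names from the message with made-up sizes.

```cpp
#include <array>

void accumulate_sinks() {
    constexpr int rows_per_thread = 4; // rows owned by one thread (illustrative)
    std::array<float, rows_per_thread> sink{};

    // Buggy bound: for (int r = 0; r < Br; ++r), where Br is the block's
    // total row count (e.g. 32), indexes far past the end of sink.
    for (int r = 0; r < rows_per_thread; ++r) { // corrected bound
        sink[r] += 0.0f; // per-row sink contribution would accumulate here
    }
}
```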

b6678

03 Oct 10:19
ad12647

ci : change macos-13 to macos-15-intel (#16401)

This commit updates the macos-13 runners to macos-15-intel.

The motivation for this change is that the macos-13 runners are scheduled
to be retired on 2025-12-04.

Refs: https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/

b6676

03 Oct 09:18
e308efd

vulkan: in flash attention, bounds check against nem1 (don't rely on …

b6673

02 Oct 19:59
d64c810

test-barrier : do not use more threads than physically available (#16…
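
A minimal sketch of such a clamp in standard C++, assuming the test picks its thread count at runtime; the actual test's variables and helpers differ.

```cpp
#include <algorithm>
#include <thread>

// Clamp a requested thread count to what the machine can actually run;
// an oversubscribed barrier test mostly measures scheduler noise.
int pick_n_threads(int requested) {
    unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    if (hw == 0) {
        hw = 1;
    }
    return std::min(requested, (int) hw);
}
```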

b6672

02 Oct 19:37
ef07a40

ggml webgpu: add support for soft_max, optimize rms_norm (#16357)

* Add inplace softmax

* Move rms_norm to a split-row approach (sketched after this entry)

* Update debug for supports_op

* Clean up debug statements

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
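
The change itself is a WebGPU shader; the rough CPU-side sketch below only shows the split-row structure with illustrative names. Each row normalizes independently, so on the GPU rows can map to separate workgroups and the inner sum becomes a parallel reduction.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RMS-normalize each row of a row-major n_rows x n_cols matrix in place.
void rms_norm_rows(std::vector<float> & x, int n_rows, int n_cols,
                   float eps = 1e-6f) {
    for (int r = 0; r < n_rows; ++r) {          // one workgroup per row on GPU
        float * row = x.data() + (size_t) r * n_cols;
        float sum_sq = 0.0f;
        for (int c = 0; c < n_cols; ++c) {      // parallel reduction on GPU
            sum_sq += row[c] * row[c];
        }
        const float scale = 1.0f / std::sqrt(sum_sq / n_cols + eps);
        for (int c = 0; c < n_cols; ++c) {
            row[c] *= scale;
        }
    }
}
```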

b6671

02 Oct 19:15
34fcc5a

model : Apertus model implementation (#15852)

* First attempt

* No permute during convert (fixes qk tensors), proper norm application.

* RoPE = NeoX

* Coherence!

* Migrate xielu params from tensors to hyperparameters

* Simple CUDA kernel

* Revert stupid LLM refactorings

* Chat template support

* configchecker / flake8 errors

* Reorder unary.cu

* I do conclude that LLMs are, in fact, stupid.

* Fix after merge

* Final newline

* Make xIELU a UNARY_OP

* Final newline

* Correctly account for parameter shift

* Argh.

* Update ggml/src/ggml-cpu/unary-ops.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Refactor: remove unused methods, inline and factorize softplus, add const modifiers

* Revert CUDA changes, implement xIELU as a separate OP

* Pesky newline

* Add float2half / half2float for F16 inputs/outputs

* CUDA variants, attempt 2

* Actually, attempt 3

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <[email protected]>

* Missing convert header

* Proper formula and reference for xIELU in the comments (a sketch follows this entry).

* Modify unary-ops.cpp to add the functor-based logic alongside the template system to retain optimizations

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Add tensor mappings for Apertus to global list instead

* Fix lazy on scalars

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <[email protected]>

* Add comment about the constraints on positive/negative alpha

* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
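
For context, a sketch of the xIELU activation assuming the usual ELU-integral piecewise form; the authoritative formula, reference, and parameter handling live in the commit's comments and code, so treat every name here as illustrative.

```cpp
#include <algorithm>
#include <cmath>

// alpha_p and alpha_n are assumed positive (the commit notes constraints on
// the positive/negative alphas and factors out a softplus reparameterization).
float xielu(float x, float alpha_n, float alpha_p, float beta, float eps) {
    if (x > 0.0f) {
        return alpha_p * x * x + beta * x; // quadratic positive branch
    }
    // Negative branch has the integral-of-ELU shape; expm1 keeps precision
    // near zero.
    return alpha_n * (std::expm1(std::min(x, eps)) - x) + beta * x;
}
```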