Releases: ngxson/llama.cpp

b6732

11 Oct 11:29
97870e6

cuda : avoid initializing unused devices (#16510)

b6730

10 Oct 19:44
e60f01d

server : fix division by zero when reporting stats (#16501)

b6729

10 Oct 14:55
81086cd

vocab : mark EOT token for Granite models (#16499)

* vocab : mark EOT token for Granite models

* sampling : fallback to EOS when EOT is not found

b6728

10 Oct 14:45
68ee98a

server : return HTTP 400 if prompt exceeds context length (#16486)

In streaming mode, when the prompt exceeds the context length, the server
returns an HTTP 200 status code with a JSON error in the body. This is
confusing and inconsistent with other inference engines, which return an
HTTP 4xx error in this case.

This patch fixes the problem and makes the server return HTTP 400 in
such cases.
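A client can observe the difference by checking the status code before reading the stream. The following sketch is illustrative only; it assumes a local llama-server on the default port 8080 and sends an over-long prompt to the OpenAI-compatible /v1/completions endpoint, with typical (not exhaustive) payload fields.

```python
# Illustrative client-side check, not part of the patch. Assumes a local
# llama-server listening on port 8080 (the default) and an OpenAI-compatible
# /v1/completions endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "word " * 100_000,  # deliberately longer than the context window
        "stream": True,
        "max_tokens": 16,
    },
    stream=True,
)

if resp.status_code == 400:
    # New behavior: the failure is visible in the status code itself.
    print("prompt rejected:", resp.text)
else:
    # Old behavior: HTTP 200 with a JSON error embedded in the streamed body.
    print("unexpected status:", resp.status_code)
```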

b6727

10 Oct 10:48
cdb6da4

server : log requests to /v1/completions (#16495)

b6726

10 Oct 08:39
6d69ab3

cmake : Don't define XOPENSOURCE on AIX (#16481)

b6724

09 Oct 19:36
1deee0f

cpu : optimize the ggml NORM operation (#15953)

* ggml-cpu: optimize norm operation to use intrinsics or Accelerate
  - rename function
  - add endif macro comment

* implement s390x SIMD suggested by @taronaeo

* add TODO comment

* tidy up spaces

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>
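For reference, the GGML NORM operation normalizes each row of its input to zero mean and unit variance, with no learned scale or bias. The snippet below is a plain NumPy reference of that computation, useful for sanity-checking a vectorized path against; it is not the ggml implementation itself, and the eps default shown is an illustrative assumption.

```python
# Scalar reference that a SIMD/Accelerate NORM kernel should reproduce.
# The eps default is illustrative; ggml takes eps as an operator parameter.
import numpy as np

def norm_rows(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(4, 128).astype(np.float32)
y = norm_rows(x)
print(y.mean(axis=-1), y.std(axis=-1))  # each row is ~0 mean, ~1 std
```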

b6721

09 Oct 13:01
56b4795

model-conversion : add support for SentenceTransformers (#16387)

* model-conversion : add support for SentenceTransformers

This commit adds support for models that use SentenceTransformer layers.

The motivation for this is that if the converted model includes any of the
numbered layers specified in the original model's repository, these changes
enable those models to be used and verified. Previously, the model-conversion
example only supported the base model output, without any of the additional
transformation layers.

Usage:
Convert the model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the produced embeddings from the converted model against the
original model embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using the SentenceTransformer layers, which
enable pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```

* add model-conversion example requirements

* add support for -st flag in embedding model conversion

This commit adds support for the -st flag in the embedding model
conversion script. This enables models to be converted using
SentenceTransformers Dense layers.
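For context, running the original model through SentenceTransformers, as the embedding-run-original-model-st target does, looks roughly like the sketch below. The model path and prompt are placeholders taken from the example above, and the sketch is an assumption about what the make target wraps, not its actual contents.

```python
# Rough sketch of the SentenceTransformers path; the model path is a
# placeholder matching EMBEDDING_MODEL_PATH from the example above.
import os
from sentence_transformers import SentenceTransformer

model_path = os.path.expanduser("~/google/embeddinggemma-300M")
model = SentenceTransformer(model_path)

# encode() runs the model's configured modules (Transformer, Pooling, and
# any Dense/Normalize layers), which is what the converted model must match.
embeddings = model.encode(["Hello world"])
print(embeddings.shape)
```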

b6719

09 Oct 08:14
aa4711d

CANN: Improve ACL graph matching (#16166)

* CANN: improve ACL graph matching

Record `ne` and `nb` information for src tensors and include them in the
graph matching check. This enhances the robustness of ACL graph matching
by preventing incorrect matches when src tensors share the same data
address but differ in shape or stride.

* CANN: add op_params match
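Conceptually, the stricter check means two recorded graph nodes only match when their source tensors agree on data address, shape (`ne`), strides (`nb`), and the node's operator parameters. The sketch below only illustrates that idea using ggml-like field names; it is not the CANN backend's actual data structures or code.

```python
# Conceptual illustration only: a node "matches" a previously recorded one
# if op, op_params, and every src's (data address, ne, nb) all agree.
# Field names mirror ggml tensor fields; this is not the real CANN code.
from dataclasses import dataclass

@dataclass(frozen=True)
class SrcInfo:
    data_addr: int   # tensor data pointer
    ne: tuple        # shape (number of elements per dimension)
    nb: tuple        # strides in bytes per dimension

@dataclass(frozen=True)
class NodeInfo:
    op: str
    op_params: tuple
    srcs: tuple      # tuple of SrcInfo

def nodes_match(a: NodeInfo, b: NodeInfo) -> bool:
    # A comparison on data_addr alone could wrongly match two nodes whose
    # srcs reuse the same buffer with a different shape or stride.
    return a.op == b.op and a.op_params == b.op_params and a.srcs == b.srcs
```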

b6718

09 Oct 07:50
d80d6d2

kleidiai: kernel interface refactoring (#16460)