Releases: ngxson/llama.cpp
b6732
cuda : avoid initializing unused devices (#16510)
b6730
server : fix division by zero when reporting stats (#16501)
b6729
vocab : mark EOT token for Granite models (#16499)
* vocab : mark EOT token for Granite models
* sampling : fall back to EOS when EOT is not found
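The fallback described in the second commit can be pictured with a minimal sketch. This is not the llama.cpp implementation: the function name, the `NULL_TOKEN` sentinel, and the token IDs are hypothetical and only illustrate the "prefer EOT, otherwise EOS" rule.

```python
from typing import Optional

# Hypothetical sentinel meaning "this special token is not defined in the vocab".
NULL_TOKEN = -1


def end_of_turn_token(eot_id: int, eos_id: int) -> Optional[int]:
    """Pick the token that ends a turn: EOT if present, otherwise EOS."""
    if eot_id != NULL_TOKEN:
        return eot_id
    if eos_id != NULL_TOKEN:
        # Fallback path: the vocab has no EOT token, so use EOS instead.
        return eos_id
    # Neither token is defined; the caller has to handle this case.
    return None
```

For example, `end_of_turn_token(NULL_TOKEN, 2)` selects the EOS ID `2`, while a vocab that defines both tokens keeps using EOT.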
b6728
server : return HTTP 400 if prompt exceeds context length (#16486)
In streaming mode, when the prompt exceeds the context length, the server returns an HTTP 200 status code with a JSON error in the body. This is confusing and inconsistent with other inference engines, which return an HTTP 4xx error in this case. This patch makes the server return HTTP 400 in such cases.
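As a rough illustration of the behavior after this change, the sketch below maps a prompt length to a status code. It is a sketch under assumptions, not the server's code: the function name and the JSON field names are hypothetical, and only the "HTTP 400 when the prompt exceeds the context length" rule comes from the commit message.

```python
import json
from typing import Tuple


def check_prompt_fits(n_prompt_tokens: int, n_ctx: int) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) for a prompt of the given length."""
    if n_prompt_tokens > n_ctx:
        # Reject up front with a 4xx status instead of sending
        # HTTP 200 with an error object in the body.
        body = {
            "error": {  # hypothetical error shape
                "code": 400,
                "message": "prompt exceeds the context length",
                "n_prompt_tokens": n_prompt_tokens,
                "n_ctx": n_ctx,
            }
        }
        return 400, json.dumps(body)
    return 200, json.dumps({"ok": True})
```

A client can then branch on the status code alone, without parsing the body of a 200 response to look for an error.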
b6727
server : log requests to /v1/completions (#16495)
b6726
cmake : Don't define _XOPEN_SOURCE on AIX (#16481)
b6724
cpu : optimize the ggml NORM operation (#15953)
* ggml-cpu: optimize norm operation to use intrinsics or Accelerate
* rename function
* add endif macro comment
* implement s390x SIMD suggested by @taronaeo
* add TODO comment
* tidy up spaces
---------
Co-authored-by: Georgi Gerganov
Co-authored-by: Aaron Teo
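For orientation, a scalar reference for what a per-row norm operation computes is sketched below: subtract the mean, then scale by the inverse standard deviation. This is plain Python for illustration only, assuming the usual mean/variance normalization with an epsilon term. The function name and default `eps` are hypothetical, and the SIMD and Accelerate paths from the commit are not shown.

```python
import math
from typing import List


def norm_row(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one row to zero mean and (approximately) unit variance."""
    n = len(x)
    mean = sum(x) / n
    # Population variance of the row.
    var = sum((v - mean) ** 2 for v in x) / n
    # eps keeps the division well-defined for constant rows.
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in x]
```

The three passes over the row (sum, sum of squared differences, scale) are the kind of loops that vector intrinsics typically accelerate.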
b6721
model-conversion : add support for SentenceTransformers (#16387)
* model-conversion : add support for SentenceTransformers

This commit adds support for models that use SentenceTransformer layers. The motivation is that if the converted model includes any of the numbered layers specified in the original model's repository, these changes enable such models to be used and verified. Currently, model-conversion only supports the base model output, without any of the additional transformation layers.

Usage:

Convert the model so that it also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH=~/google/embeddinggemma-300M
(venv) $ make embedding-convert-model
```

Verify the embeddings produced by the converted model against the original model embeddings:
```console
(venv) $ make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) $ make embedding-run-original-model-st
```

Run the converted model using "SentenceTransformer" layers, which enables pooling and normalization:
```console
(venv) $ make embedding-run-converted-model-st
```

* add model-conversion example requirements
* add support for -st flag in embedding model conversion

This commit adds support for the -st flag in the embedding model conversion script. This enables models to be converted using Sentence Transformers dense layers.
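The "pooling and normalization" step mentioned for the SentenceTransformer layers can be pictured with the small sketch below. It assumes mean pooling followed by L2 normalization, which is a common choice but is not stated in the commit message. The function names are hypothetical and no real model or library is involved.

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Average per-token vectors into one sentence vector (assumed pooling type)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; leave zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


# Two token vectors of dimension 2, pooled and then normalized.
sentence_embedding = l2_normalize(mean_pool([[3.0, 0.0], [3.0, 8.0]]))
```

Comparing embeddings from the original and converted models only makes sense when both apply the same post-processing, which is why the verification target runs with these layers enabled.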
b6719
CANN: Improve ACL graph matching (#16166)
* CANN: improve ACL graph matching

Record `ne` and `nb` information for src tensors and include them in the graph matching check. This enhances the robustness of ACL graph matching by preventing incorrect matches when src tensors share the same data address but differ in shape or stride.

* CANN: add op_params match
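The failure mode this entry describes can be shown with a small sketch. It is illustrative only and not the CANN backend's code: the record type and both function names are hypothetical, with `ne` taken as the per-dimension element counts and `nb` as the per-dimension byte strides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SrcTensorRecord:
    """Hypothetical per-source-tensor record kept for graph matching."""
    data_addr: int
    ne: Tuple[int, ...]  # number of elements per dimension (shape)
    nb: Tuple[int, ...]  # stride in bytes per dimension


def matches_by_address(a: SrcTensorRecord, b: SrcTensorRecord) -> bool:
    """Weaker check: only the data address is compared."""
    return a.data_addr == b.data_addr


def matches_with_layout(a: SrcTensorRecord, b: SrcTensorRecord) -> bool:
    """Stricter check: address, shape and stride must all agree."""
    return a == b


# Same buffer viewed as a 4x8 and as an 8x4 float32 tensor.
view_a = SrcTensorRecord(0x1000, (4, 8, 1, 1), (4, 16, 128, 128))
view_b = SrcTensorRecord(0x1000, (8, 4, 1, 1), (4, 32, 128, 128))
```

Here `matches_by_address(view_a, view_b)` is true even though the two views have different layouts, while `matches_with_layout(view_a, view_b)` is false, so a cached graph would not be reused for the wrong layout.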
b6718
kleidiai: kernel interface refactoring (#16460)