Releases: syther-labs/llama.cpp

b4611

02 Feb 00:27
53debe6
ci: use sccache on windows HIP jobs (#11553)

b4609

01 Feb 12:34
ecef206
Implement s3:// protocol (#11511)

For those who want to pull models from s3.

Signed-off-by: Eric Curtin <[email protected]>
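
As a hedged usage sketch (the bucket and path are made up, and which binary accepts s3:// URLs should be confirmed against PR #11511):

```console
# Hypothetical invocation: pull a GGUF straight from an S3 bucket.
# Assumes s3:// is accepted wherever other download protocols (hf://, https://)
# already are, e.g. by llama-run.
llama-run s3://my-bucket/models/model.gguf
```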

b4607

31 Jan 18:37
aa6fb13
`ci`: use sccache on windows instead of ccache (#11545)

* Use sccache on ci for windows

* Detect sccache in cmake
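
As a general illustration of the pattern (not the exact CI change), sccache is typically wired into a CMake build via the compiler-launcher variables:

```console
# Route C/C++ compilation through sccache; the flags are illustrative and not
# copied from the llama.cpp CI scripts.
cmake -B build \
  -DCMAKE_C_COMPILER_LAUNCHER=sccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
cmake --build build --config Release
```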

b4604

31 Jan 12:35
5783575
Fix chatml fallback for unsupported builtin templates (when --jinja n…
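
A hedged sketch of the two paths involved, assuming the truncated title refers to the case where `--jinja` is not enabled (the model path is a placeholder):

```console
# Without --jinja, an unsupported builtin chat template should now fall back to chatml.
llama-server -m ./model.gguf

# With --jinja, Jinja-based templating is used instead.
llama-server -m ./model.gguf --jinja
```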

b4601

31 Jan 06:18
a2df278
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the `requests_processing`
and `requests_deferred` metrics so that it is grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```
This is also consistent with the description of the metrics in the
server example's [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
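
A quick way to inspect these metrics locally, assuming a server started with the `--metrics` flag on the default host and port (the model path is a placeholder):

```console
# Enable the Prometheus-compatible /metrics endpoint.
llama-server -m ./model.gguf --metrics

# Scrape it; the HELP/TYPE lines shown above come from this output.
curl http://localhost:8080/metrics
```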

b4600

31 Jan 00:40
553f1e4
`ci`: ccache for all github workflows (#11516)

b4598

30 Jan 18:38
HIP: require at least HIP 5.5
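
To check whether a local ROCm install meets the new minimum (a generic check, not part of this commit):

```console
# Print the installed HIP version; builds now require at least HIP 5.5.
hipconfig --version
```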

b4595

30 Jan 12:40
3d804de
sync: minja (#11499)

b4589

30 Jan 00:32
eb7cf15
server : add /apply-template endpoint for additional use cases of Min…
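
A minimal sketch of calling the new endpoint, assuming it accepts the same `messages` array as `/v1/chat/completions` and that the server runs on the default host and port:

```console
curl http://localhost:8080/apply-template \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```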

b4588

29 Jan 18:39
66ee4f2
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>
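
To try the newly supported IQ formats on the Vulkan backend, a hedged sketch (model filenames are placeholders; this assumes a build configured with `-DGGML_VULKAN=ON`):

```console
# Quantize an F16 GGUF to one of the IQ types this release adds to Vulkan
# (IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S).
llama-quantize ./model-f16.gguf ./model-iq3_s.gguf IQ3_S

# Run with layers offloaded to the Vulkan device.
llama-cli -m ./model-iq3_s.gguf -ngl 99 -p "Hello"
```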