Releases: syther-labs/llama.cpp
b4611
b4609
Implement s3:// protocol (#11511)

For those that want to pull from s3

Signed-off-by: Eric Curtin <[email protected]>
b4607
`ci`: use sccache on windows instead of ccache (#11545)

* Use sccache on ci for windows
* Detect sccache in cmake
b4604
Fix chatml fallback for unsupported builtin templates (when --jinja n…
b4601
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the metrics `requests_processing` and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

This is also consistent with the description of the metrics in the server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
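
For readers who consume these metrics programmatically, the following is a minimal sketch of how the `# HELP` / `# TYPE` / sample lines quoted in this commit message could be read into a dictionary. The function name `parse_metrics` and the `SAMPLE` string are illustrative assumptions, not part of llama.cpp, and the parser deliberately ignores labels and timestamps, which the full Prometheus text format allows.

```python
def parse_metrics(text):
    """Parse a simplified Prometheus text exposition into a dict.

    Hypothetical helper for illustration only: it handles unlabeled
    samples of the form "<name> <value>" plus HELP and TYPE comments.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            # "# HELP <name> <free-form help text>"
            name, _, help_text = line[len("# HELP "):].partition(" ")
            metrics.setdefault(name, {})["help"] = help_text
        elif line.startswith("# TYPE "):
            # "# TYPE <name> <metric type>"
            name, _, mtype = line[len("# TYPE "):].partition(" ")
            metrics.setdefault(name, {})["type"] = mtype
        elif line.startswith("#"):
            # Any other comment line is skipped.
            continue
        else:
            # Sample line: the value is the last space-separated token.
            name, _, value = line.rpartition(" ")
            metrics.setdefault(name, {})["value"] = float(value)
    return metrics


# Sample copied from the updated output described in the commit message.
SAMPLE = """\
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
"""

parsed = parse_metrics(SAMPLE)
for name, info in parsed.items():
    print(name, info["type"], info["value"], info["help"])
```

Since only the help strings changed, a consumer that keys on metric names and values, as this sketch does, would presumably be unaffected by the commit.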
b4600
`ci`: ccache for all github workflows (#11516)
b4598
HIP: require at least HIP 5.5
b4595
sync: minja (#11499)
b4589
server : add /apply-template endpoint for additional use cases of Min…
b4588
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>