
Conversation

@reznej commented May 5, 2025

Motivation

I use mlx-lm in combination with Open WebUI. One issue I've encountered is that once a model is loaded into memory, it stays resident indefinitely—even after it's no longer in use. Since I also rely on this machine for other productive tasks, I wanted to enable automatic memory release after inactivity, similar to the keep_alive behavior in Ollama.

During my research, I came across Llama-Swap, a lightweight proxy to manage different models. One of its features is a ttl config parameter, which unloads the model after a specified idle timeout—freeing up system resources automatically.

Problem

However, while integrating llama-swap into my setup, I discovered that mlx-lm (as of v0.24.0) is not compatible with `Transfer-Encoding: chunked`, which llama-swap apparently uses when proxying HTTP requests. This causes inference requests to fail with a 502 because the `Content-Length` header is missing.
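For reference, this is roughly what a chunked request looks like on the wire (the path and payload here are illustrative, not captured from llama-swap): the body is sent as hex-sized chunks terminated by a zero-length chunk, and no `Content-Length` header is present, so a server that only reads `Content-Length` bytes sees an empty or malformed body.

```
POST /v1/chat/completions HTTP/1.1
Host: localhost:8080
Content-Type: application/json
Transfer-Encoding: chunked

1b
{"prompt": "Hello, world!"}
0

```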

Solution

This PR introduces a minimal addition to the `do_POST` method in `server.py`, enabling mlx-lm to decode chunked HTTP request bodies. It preserves the existing behavior for requests that carry a `Content-Length` header, so this change should have no effect on current usage patterns while enabling broader compatibility with toolchains that rely on chunked streaming.
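A minimal sketch of the idea (simplified, and not the exact code in this PR): read the body chunk by chunk when `Transfer-Encoding: chunked` is set, otherwise fall back to the usual `Content-Length` path. The handler below uses Python's `http.server.BaseHTTPRequestHandler`, which mlx-lm's server builds on; trailer headers after the final chunk are not handled here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json


class APIHandler(BaseHTTPRequestHandler):
    def _read_chunked_body(self) -> bytes:
        """Decode a Transfer-Encoding: chunked request body (no trailer support)."""
        body = b""
        while True:
            # Each chunk starts with its size in hex, optionally followed by
            # extensions after ';', terminated by CRLF.
            size_line = self.rfile.readline().strip()
            chunk_size = int(size_line.split(b";")[0], 16)
            if chunk_size == 0:
                self.rfile.readline()  # consume the CRLF after the last chunk
                break
            body += self.rfile.read(chunk_size)
            self.rfile.readline()  # discard the CRLF that follows each chunk
        return body

    def do_POST(self):
        if self.headers.get("Transfer-Encoding", "").lower() == "chunked":
            body = self._read_chunked_body()
        else:
            # Existing behavior: rely on Content-Length (defaults to 0).
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)

        payload = json.loads(body or b"{}")
        response = json.dumps({"received": payload}).encode()

        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), APIHandler).serve_forever()
```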

@reznej reznej changed the title handle chunked requests (llama-swap compatibility) Add support for chunked request bodies (llama-swap compatibility) May 5, 2025
@yihongang (Contributor) commented

I think llama-swap wants to use `Content-Length`, but it's just broken at the moment. There's an open PR to fix it, but it may take some time to agree on an approach.
