Model query times out in the middle of it #196

kursad-k · 2026-02-21T01:54:08Z


Hi

I run the LemonadeSDK server on Windows and use it from a variety of local and remote apps as an OpenAI-compatible provider. The server is on the same wired network. For some reason, Moltis reaches the server and starts querying it, but provider setup times out partway through, so I can't get to the next stage and save the provider. It worked once, but after I changed the local IP for better access it no longer gets through.

To be honest, validation shouldn't force the server to unload models just to query them; it generates a huge amount of disk I/O. Most of my models are large, so testing 8 of them on my Lemonade server probably means loading around 100GB during validation, which seems unnecessary. Why not just query the server directly instead of forcing the currently loaded model to be unloaded?
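One way to act on this, sketched under assumptions: most OpenAI-compatible servers expose `GET /models`, which returns the model catalogue as `{"object": "list", "data": [{"id": ...}, ...]}` without running inference. Listing models that way confirms the base URL and key without making Lemonade swap models. I'm assuming Lemonade answers `GET /api/v1/models` in that standard shape, and `parse_model_ids` / `list_models` are hypothetical helpers, not Moltis code.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style /models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_models(base_url: str, api_key: str = "", timeout_s: float = 10.0) -> list:
    """Fetch the model catalogue; no completion is run, so nothing gets loaded."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models", method="GET")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        # json.load accepts the binary response object directly.
        return parse_model_ids(json.load(resp))
```

Usage would be `list_models("http://IP:8000/api/v1")`. The trade-off is that a listing only proves the server is reachable; it says nothing about whether a given model actually loads or supports tool calls, which is presumably what a per-model probe is meant to check.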

I use http://IP:8000/api/v1 as the base URL for the OpenAI-compatible server. My guess is that Moltis's probe timeout is too short, so it gives up before the server has responded.
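For reference, a minimal sketch of a per-model probe against this endpoint with a longer, configurable timeout. Assumptions: the probe sends a one-message chat completion (the Lemonade log below suggests the prompt was just "ping", and Gemma answered it with 730 tokens), capping `max_tokens` is a suggestion rather than something Moltis is known to do, and the 120 s default is an arbitrary illustrative value. `build_probe_request` and `probe_model` are hypothetical names. Note that the `urllib` timeout applies to each blocking socket operation, not the whole request; with a non-streaming request the first wait for data covers the model load plus generation.

```python
import json
import urllib.error
import urllib.request


def build_probe_request(base_url: str, model: str, max_tokens: int = 1) -> urllib.request.Request:
    """Build a tiny chat-completions request for an OpenAI-compatible server."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        # Cap the reply: the probe only needs to know the model answers at all.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe_model(base_url: str, model: str, timeout_s: float = 120.0) -> bool:
    """Return True if the server answers with HTTP 200 before a socket wait exceeds timeout_s."""
    req = build_probe_request(base_url, model)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Usage: `probe_model("http://IP:8000/api/v1", "Gemma-3-4b-it-GGUF")`, using the model id as the server sees it (without Moltis's `custom-IP::` prefix).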

2026-02-21T01:46:36.971461Z  INFO moltis_gateway::server: host package provisioning complete installed=0 skipped=65 sudo=true
2026-02-21T01:46:47.218493Z  INFO moltis_gateway::provider_setup: provider setup operation started operation="providers.validate_key" provider=custom-IP
2026-02-21T01:46:47.229086Z  INFO moltis_agents::providers: registered custom OpenAI-compatible provider provider=custom-IP
2026-02-21T01:46:47.246103Z  INFO moltis_agents::providers::local_gguf: local-llm system info total_ram_gb=31 available_ram_gb=28 has_metal=false has_cuda=false tier=medium (16GB)
2026-02-21T01:46:47.246163Z  INFO moltis_agents::providers::local_gguf: local-llm model cache directory cache_dir=/home/moltis/.moltis/models
2026-02-21T01:46:47.246200Z  INFO moltis_agents::providers::local_gguf: suggested local model for your system model="codestral-22b-q4_k_m" display_name="Codestral 22B (Q4_K_M)" min_ram_gb=16 backend=GGUF
2026-02-21T01:46:47.246251Z  INFO moltis_agents::providers::local_gguf: cached local models in model cache directory cached_models=[] cached_count=0
2026-02-21T01:46:47.246276Z  INFO moltis_agents::providers: local-llm enabled but no models configured. Add [providers.local] models = ["..."] to config.
2026-02-21T01:46:47.246297Z  INFO moltis_gateway::provider_setup: provider validation discovered candidate models for probing provider=custom-IP model_count=16
2026-02-21T01:46:47.246363Z  INFO moltis_gateway::provider_setup: provider validation model probe started provider=custom-IP model=custom-IP::DeepSeek-Qwen3-8B-GGUF attempt=1 total_models=16
2026-02-21T01:46:57.247615Z  WARN moltis_gateway::provider_setup: provider validation model probe timed out provider=custom-IP model=custom-IP::DeepSeek-Qwen3-8B-GGUF timeout_count=1 max_timeouts=2 elapsed_ms=10001
2026-02-21T01:46:57.247741Z  INFO moltis_gateway::provider_setup: provider validation model probe started provider=custom-IP model=custom-IP::Gemma-3-4b-it-GGUF attempt=2 total_models=16
2026-02-21T01:47:07.248975Z  WARN moltis_gateway::provider_setup: provider validation model probe timed out provider=custom-IP model=custom-IP::Gemma-3-4b-it-GGUF timeout_count=2 max_timeouts=2 elapsed_ms=10001
2026-02-21T01:47:07.249076Z  INFO moltis_gateway::provider_setup: provider setup operation finished operation="providers.validate_key" provider=custom-IP elapsed_ms=20030
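The timings in this log are consistent with a fixed per-model budget: each probe is cut off at about 10 s (`elapsed_ms=10001`), and validation stops after the second timeout (`max_timeouts=2`), so the whole operation ends at about 20 s (`elapsed_ms=20030`). The toy simulation below shows why a server that must first load a large model never passes under that budget. It assumes the timeout count is cumulative rather than reset after a success (the log alone doesn't say), and the per-model times are made-up numbers for illustration, not measurements.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    ok: list = field(default_factory=list)
    timed_out: list = field(default_factory=list)
    elapsed_s: float = 0.0
    aborted: bool = False


def simulate_validation(seconds_needed: dict, timeout_s: float = 10.0, max_timeouts: int = 2) -> ValidationResult:
    """Walk candidate models in order, mimicking a capped probe loop.

    seconds_needed maps model name -> time the server needs to answer
    (model load + prompt + generation). Inputs are hypothetical.
    """
    result = ValidationResult()
    for model, needed in seconds_needed.items():
        if needed > timeout_s:
            # The client stops waiting at the timeout, whatever the server is doing.
            result.elapsed_s += timeout_s
            result.timed_out.append(model)
            if len(result.timed_out) >= max_timeouts:
                result.aborted = True
                break
        else:
            result.elapsed_s += needed
            result.ok.append(model)
    return result


# Hypothetical numbers: the first two models need a cold load longer than 10 s.
needed = {"DeepSeek-Qwen3-8B-GGUF": 25.0, "Gemma-3-4b-it-GGUF": 12.0, "some-third-model": 3.0}
print(simulate_validation(needed))
print(simulate_validation(needed, timeout_s=60.0))
```

With the default 10 s budget both models time out and the run aborts after 20 s with nothing validated; with 60 s all three pass in 40 s. This matches the Lemonade log below, which appears to show Gemma loading and answering, just too late for the probe.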

So it is reaching the Lemonade server. I have tool-calling models such as Qwen Coder and Nemotron, and, as I mentioned, I did get it to load the models once.

You can see in the image that it has moved on to querying the second model, but then it times out.


Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
How are you?<end_of_turn>
<start_of_turn>model
'
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://127.0.0.1:8001
main: starting the main loop...
srv  update_slots: all slots are idle
llama-server is ready!
[LlamaCpp] Model loaded on port 8001
[Router] Backend started successfully
[Router] Model loaded successfully. Total loaded: 1
[Server] Model loaded successfully: Gemma-3-4b-it-GGUF
[Server] POST /api/v1/chat/completions - srv  params_from_: Chat format: Content-only
slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  3 | task 0 | processing task, is_child = 0
slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 98304, n_keep = 16, task.n_tokens = 10
slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 10, batch.n_tokens = 10, progress = 1.000000
slot update_slots: id  3 | task 0 | prompt done, n_tokens = 10, batch.n_tokens = 10
slot init_sampler: id  3 | task 0 | init sampler, took 0.00 ms, tokens: text = 10, total = 10
slot print_timing: id  3 | task 0 |
prompt eval time =      13.49 ms /    10 tokens (    1.35 ms per token,   741.40 tokens per second)
       eval time =    2586.59 ms /   730 tokens (    3.54 ms per token,   282.22 tokens per second)
      total time =    2600.08 ms /   740 tokens
slot      release: id  3 | task 0 | stop processing: n_tokens = 739, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
200 OK
[Server DEBUG] Response message does NOT contain tool_calls
[Server DEBUG] Message content: Okay, let's talk about "ping"! Here's a breakdown of what it is, how it works, and why you might use it:

**What is Ping?**

"Ping" is a basic network diagnostic tool. It's a command-line utility (oft

=== Telemetry ===

Answered by penso


penso (Maintainer) · 2026-04-12T11:27:14Z

Agreed, fixed in #673
