-
|
Hi I use the LemonadeSDK server on Windows. I am using it in a variety of local and remote apps as a OpenAi compat provider, and the server is on the same wired network. For some reason, Moltis queries the server fine but times out in the middle of the process. I can't get it to the next stage to save it. I was able to do it once, but I wanted to change the local IP for better access. To be honest, it shouldn't unload models from the server to query them; this really increases I/O like crazy. To test 8 models on my Lemonade server, it probably needs to load 100GB during tests since most of them are large models. This seems unnecessary. Why not just query the server directly instead of forcing the working model to be unloaded? I use http://IP:8000/api/v1 for the OpenAI-compatible server LLM. I am guessing that Moltis's timeout range is quitting earlier than expected. I see it is querying the Lemonade server. I have tool-calling models like Qwen Coder, Nemotron, etc. Like I mentioned, I was able to get it to load the models once. You can see in the image that it is now at the querying the second model, but then times out.
|
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
|
Agreed, fixed in #673 |
Beta Was this translation helpful? Give feedback.

Agreed, fixed in #673