Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
I noticed that, even with thinking disabled, tool calls (for example in Roo Code) often fail and get rendered as XML.
While searching for why this could be happening, I found this: "Prevent special token leakage in KimiK2ToolParser streaming mode" vllm-project/vllm#28543 - it suggests the issue may need to be handled at the backend level.
I also found a bug report at llama.cpp: "Kimi-K2-Thinking reasoning and tool calling support" ggml-org/llama.cpp#17155 - but I am not sure if it refers to the same issue, because ggml-org/llama.cpp#16932 is mentioned there as a possible solution, but it does not list Kimi K2 Thinking as a supported model, only Kimi K2, which was already working fine for me.
I would appreciate it if someone more knowledgeable would look into this. Does ik_llama.cpp need an update for tool calling handling? Or is it an issue for agents like Roo Code to solve?
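To help narrow down whether the problem is on the backend or the agent side, here is a rough sketch of a check that could be run on the raw text returned by the server. The marker strings are my assumption of what Kimi K2's tool call special tokens look like and should be verified against the model's tokenizer config; `find_leaked_markers` is a hypothetical helper, not part of ik_llama.cpp or Roo Code.

```python
# Assumed tool call marker strings for Kimi K2; verify against the
# model's tokenizer config before relying on this list.
ASSUMED_MARKERS = (
    "<|tool_calls_section_begin|>",
    "<|tool_call_begin|>",
    "<|tool_call_argument_begin|>",
    "<|tool_call_end|>",
    "<|tool_calls_section_end|>",
)


def find_leaked_markers(text: str) -> list[str]:
    """Return the assumed markers that occur in text, in the order listed above."""
    return [marker for marker in ASSUMED_MARKERS if marker in text]


if __name__ == "__main__":
    # Hypothetical response content, only for illustration.
    sample = "Reading the file now.<|tool_call_begin|>functions.read_file:0"
    print(find_leaked_markers(sample))
```

If markers like these show up in the plain message content, that would hint that the backend did not parse the tool call. If the content is clean and the agent still fails, the agent side seems the more likely place to look.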
Motivation
Kimi K2 Thinking relies a lot on tool calling (according to the official model card):
Deep Thinking & Tool Orchestration: End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
The issue described here happens even with thinking disabled though, and prevents Roo Code and other similar agents from working.
If it turns out that the issues I pointed out are not in ik_llama.cpp but in the agent's tool calling handling, please feel free to close this. But seeing how some possibly related issues are being considered for fixes at the backend level in vllm and llama.cpp, I thought it was worth reporting here and sharing the relevant information I found.
Possible Implementation
No response