Prompt Caching, Retries & Backoff For Anthropic Augmented LLM #46
@MattMorgis would love to add prompt and tool caching. Perhaps that should be something that's configurable. Backoff and retry also seem useful to have, though I'd recommend submitting them in a separate PR. Excellent ideas on both counts, thank you for suggesting this and offering to implement it. I'll assign this issue to you, but please let me know if you'd like any help!
I found a simple way to enable the prompt cache. The reason the cache never hits is that the system prompt is added to messages on every turn, even when it is already in the message history, which breaks the prompt cache hit. See mcp-agent/src/mcp_agent/workflows/llm/augmented_llm_openai.py, lines 114 to 121 at bffbeb3.

It can be fixed with this:

```python
if params.use_history:
    messages.extend(self.history.get())

system_prompt = self.instruction or params.systemPrompt
if system_prompt and len(messages) == 0:
    messages.append(
        ChatCompletionSystemMessageParam(role="system", content=system_prompt)
    )
```
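If I'm reading the flow right, this works because the system message is appended only on the very first turn; on later turns it is already at the front of the stored history, so the request prefix stays identical across turns and the provider's prompt cache can match it.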
@recallfuture thanks for this find! Would you create a PR into the repo? Seems like a possible bug that can be fixed (though it doesn't address all the other points in this Issue, so we can leave this issue open as well).
I was getting hit with rate-limiting errors even when reduced to just one agent running. I have a branch where I added retry & backoff logic, which helped it run to completion.
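Not the branch's actual code, but a minimal sketch of the kind of retry/backoff wrapper described here, assuming the official Anthropic Python SDK; the helper name, retry count, and delays are illustrative:

```python
import random
import time

import anthropic


def call_with_backoff(client: anthropic.Anthropic, max_retries: int = 5, **create_kwargs):
    """Illustrative helper: retry messages.create with exponential backoff plus jitter."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.messages.create(**create_kwargs)
        except anthropic.RateLimitError:
            # Give up after the last attempt; otherwise sleep and double the delay.
            if attempt == max_retries - 1:
                raise
            time.sleep(delay + random.uniform(0, delay))
            delay *= 2
```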
However, I noticed in the logs that `cache_creation_input_tokens` and `cache_read_input_tokens` were always 0, and that the tools, system prompt, and previous iterations of the conversation were duplicated across requests.

Today I started a new branch and began implementing prompt caching. By caching the tools, system prompt, and each iteration of the conversation, the cost per run of my workflow went down drastically, and it runs significantly faster now. I also didn't even need the backoffs & retries once I added prompt caching (though they're still good to have!).
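For reference, a minimal sketch of what marking the tools, system prompt, and latest turn as cacheable looks like with the Anthropic Messages API's `cache_control` blocks; the model name, tool definition, and prompt text below are illustrative placeholders, not the workflow's actual values:

```python
import anthropic

client = anthropic.Anthropic()

# "cache_control": {"type": "ephemeral"} marks a prefix boundary; everything up to
# (and including) the marked block can be written to and later read from the cache.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model choice
    max_tokens=1024,
    tools=[
        {
            "name": "fetch_url",  # hypothetical tool, for illustration only
            "description": "Fetch a URL and return its contents.",
            "input_schema": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
            # Marking the last tool caches the entire tools block.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    system=[
        {
            "type": "text",
            "text": "You are a helpful research agent.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarize the repository README.",
                    # Marking the newest turn lets the next request reuse everything before it.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
)

# usage.cache_creation_input_tokens / usage.cache_read_input_tokens show whether the cache was used.
print(response.usage)
```

On the first request `cache_creation_input_tokens` should be non-zero; on later requests that share the same prefix, `cache_read_input_tokens` should be non-zero instead.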
Thoughts? I could submit PRs for one or both.