Prompt Caching, Retries & Backoff For Anthropic Augmented LLM #46

Open
MattMorgis opened this issue Mar 9, 2025 · 3 comments

Comments

@MattMorgis (Contributor)

I was getting hit with rate-limiting errors even when reduced to running just one agent.

I have a branch where I added retry & backoff logic, which helped it run to completion.
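For context, the retry wrapper is along these lines; a minimal sketch assuming the anthropic Python SDK and tenacity (the helper name is just for illustration, not the exact code on my branch):

    import anthropic
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

    client = anthropic.Anthropic()

    @retry(
        retry=retry_if_exception_type(anthropic.RateLimitError),
        wait=wait_exponential(multiplier=1, min=2, max=60),  # exponential backoff between attempts
        stop=stop_after_attempt(5),
    )
    def create_message_with_backoff(**kwargs):
        # Re-issue the request whenever the API responds with a 429 rate-limit error.
        return client.messages.create(**kwargs)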

However, I noticed in the logs that cache_creation_input_tokens and cache_read_input_tokens were always 0, and that the tools, system prompt, and previous iterations of the conversation were duplicated across requests.

Today I started a new branch and began experimenting with prompt caching.

By caching the tools, system prompt, and each iteration of the conversation, the cost per run of my workflow went down drastically, and it now runs significantly faster. I also didn't even need the backoffs & retries once I added prompt caching (though they're still good to have!).
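Roughly, that means attaching cache_control breakpoints to the tool definitions, the system prompt, and the latest conversation turn. A rough sketch with the anthropic SDK (the model name and the get_weather tool here are placeholders, not what my branch actually uses):

    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model
        max_tokens=1024,
        # Cache the (static) tool definitions.
        tools=[
            {
                "name": "get_weather",
                "description": "Get the current weather for a location.",
                "input_schema": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                },
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Cache the system prompt by passing it as a content block with cache_control.
        system=[
            {
                "type": "text",
                "text": "You are a helpful agent...",
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Mark the latest turn so earlier turns can be read from the cache on the next request.
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What's the weather in Philadelphia?",
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            }
        ],
    )
    # These were always 0 before; with caching they report cache writes and reads.
    print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)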

Thoughts? I could submit PRs for one or both.

@saqadri (Collaborator) commented Mar 10, 2025

@MattMorgis would love to add prompt and tool caching. Perhaps that should be something that's configurable in RequestParams. I imagine every provider will handle caching differently, but having a single type to control it would be awesome (even if it's only implemented for Anthropic to start).
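For illustration only, a hypothetical shape this could take (the field and class names here are invented, not the actual mcp-agent API):

    from typing import Optional

    from pydantic import BaseModel

    class CacheSettings(BaseModel):
        # Provider-agnostic knobs; each provider applies whatever subset it supports.
        enabled: bool = False
        cache_tools: bool = True
        cache_system_prompt: bool = True
        cache_conversation: bool = True

    class RequestParams(BaseModel):
        # ...existing fields...
        cache: Optional[CacheSettings] = None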

Backoff and retry also seems useful to have, though I'd recommend submitting it in a separate PR.

Excellent ideas on both counts, thank you for suggesting this and offering to implement it. I'll assign this issue to you but please let me know if you'd like any help!

saqadri added the enhancement (New feature or request) label on Mar 10, 2025
@recallfuture

I found a simple way to enable the prompt cache. The cache cannot get a hit because the system prompt is added to the messages on every turn, even when it's already in the message history, which breaks the prompt cache hit.

See mcp-agent/src/mcp_agent/workflows/llm/augmented_llm_openai.py, lines 114 to 121 in bffbeb3:

    system_prompt = self.instruction or params.systemPrompt
    if system_prompt:
        messages.append(
            ChatCompletionSystemMessageParam(role="system", content=system_prompt)
        )

    if params.use_history:
        messages.extend(self.history.get())

and can be fixed with this:

    if params.use_history:
        messages.extend(self.history.get())

    system_prompt = self.instruction or params.systemPrompt
    if system_prompt and len(messages) == 0:
        messages.append(
            ChatCompletionSystemMessageParam(role="system", content=system_prompt)
        )

@saqadri (Collaborator) commented Mar 28, 2025

@recallfuture thanks for this find! Would you create a PR into the repo? It seems like a possible bug that can be fixed (though it doesn't address all the other points in this issue, so we can leave this one open as well).

Labels: enhancement (New feature or request)
Projects: None yet
Development: No branches or pull requests

3 participants