Prompt Caching, Retries & Backoff For Anthropic Augmented LLM #46
@MattMorgis would love to add prompt and tool caching. Perhaps that should be something that's configurable. Backoff and retry also seem useful to have, though I'd recommend submitting them in a separate PR. Excellent ideas on both counts, thank you for suggesting this and offering to implement it. I'll assign this issue to you, but please let me know if you'd like any help!
I found a simple way to enable the prompt cache. The reason the cache never hits is that the system prompt is added to messages on every turn, even when it is already in the message history, which breaks the prompt cache hit. See mcp-agent/src/mcp_agent/workflows/llm/augmented_llm_openai.py, lines 114 to 121 at bffbeb3.

It can be fixed with this:

```python
if params.use_history:
    messages.extend(self.history.get())

system_prompt = self.instruction or params.systemPrompt
if system_prompt and len(messages) == 0:
    messages.append(
        ChatCompletionSystemMessageParam(role="system", content=system_prompt)
    )
```
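If I'm reading the flow right, this works because the system message is appended only on the very first turn; on later turns it is already at the front of the stored history, so the request prefix stays identical across turns and the provider's prompt cache can match it.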
@recallfuture thanks for this find! Would you create a PR into the repo? Seems like a possible bug that can be fixed (though it doesn't address all the other points in this Issue, so we can leave this issue open as well).
I was getting hit with rate-limiting errors even when reduced to just one agent running. I have a branch where I added retry & backoff logic, which helped it run to completion.
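Not the branch's actual code, but a minimal sketch of the kind of retry/backoff wrapper described here, assuming the official Anthropic Python SDK; the helper name, retry count, and delays are illustrative:

```python
import random
import time

import anthropic


def call_with_backoff(client: anthropic.Anthropic, max_retries: int = 5, **create_kwargs):
    """Illustrative helper: retry messages.create with exponential backoff plus jitter."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.messages.create(**create_kwargs)
        except anthropic.RateLimitError:
            # Give up after the last attempt; otherwise sleep and double the delay.
            if attempt == max_retries - 1:
                raise
            time.sleep(delay + random.uniform(0, delay))
            delay *= 2
```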
However, I noticed in the logs that `cache_creation_input_tokens` and `cache_read_input_tokens` were always 0, and that the tools, system prompt, and previous iterations of the conversation were duplicated across requests.

Today I started a new branch and began implementing prompt caching. By caching the tools, system prompt, and each iteration of the conversation, the cost per run of my workflow went down drastically, and it runs significantly faster now. I also didn't even need the backoffs & retries once I added prompt caching (though they're still good to have!).
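For reference, a minimal sketch of what marking the tools, system prompt, and latest turn as cacheable looks like with the Anthropic Messages API's `cache_control` blocks; the model name, tool definition, and prompt text below are illustrative placeholders, not the workflow's actual values:

```python
import anthropic

client = anthropic.Anthropic()

# "cache_control": {"type": "ephemeral"} marks a prefix boundary; everything up to
# (and including) the marked block can be written to and later read from the cache.
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model choice
    max_tokens=1024,
    tools=[
        {
            "name": "fetch_url",  # hypothetical tool, for illustration only
            "description": "Fetch a URL and return its contents.",
            "input_schema": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
            # Marking the last tool caches the entire tools block.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    system=[
        {
            "type": "text",
            "text": "You are a helpful research agent.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Summarize the repository README.",
                    # Marking the newest turn lets the next request reuse everything before it.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
)

# usage.cache_creation_input_tokens / usage.cache_read_input_tokens show whether the cache was used.
print(response.usage)
```

On the first request `cache_creation_input_tokens` should be non-zero; on later requests that share the same prefix, `cache_read_input_tokens` should be non-zero instead.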
Thoughts? I could submit PRs for one or both.