How to implement prompt caching for sub-agents in Google ADK async workflow? #3368
Replies: 1 comment
Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: Google ADK offers a built-in "Context Caching" feature that is likely what you're looking for. It's configured at the App level.

Hello! Great question on optimizing your multi-agent workflow. Here's a breakdown of how you can implement caching in ADK.

Built-in Context Caching

Google ADK provides a built-in mechanism called Context Caching. This feature is designed to reduce latency and costs by caching the LLM prompt and response context, which is especially useful for agents with large instructions or in multi-turn scenarios. [1, 2]

You can enable this by wrapping your root agent in an App. Here is how you can apply it to your setup:

from google.adk.agents.context_cache_config import ContextCacheConfig
from google.adk.apps.app import App
# Your existing agent setup
# SupervisorAgent = LlmAgent(...)
# 1. Define the cache configuration
cache_config = ContextCacheConfig(
    min_tokens=4096,    # Only cache if the context is larger than this
    ttl_seconds=600,    # Cache entries expire after 10 minutes
    cache_intervals=3,  # Force a refresh every 3 invocations
)
# 2. Wrap your root agent in an App with the cache config
supervisor_app = App(
    name="supervisor_app",
    root_agent=SupervisorAgent,
    context_cache_config=cache_config,
)
# 3. Run the app (instead of the agent directly)
# Note: You'll need an AppRunner to run the App
# await runner.run_async(input_data)  # this call will now go through the App's runner

You can see a full example of this implementation in the ADK samples. [1]

Custom Caching with Plugins

If the built-in context caching doesn't fit your needs and you specifically want to integrate an external cache like Redis or Memorystore, the recommended ADK-native approach is to use Plugins. Plugins allow you to introduce custom, cross-cutting logic (like caching, logging, or authentication) into the agent's execution lifecycle without modifying the core agent logic. [3] You could create a plugin that intercepts each model request, checks the external cache for a stored response, and either returns the hit (skipping the LLM call) or stores the fresh response after the call; see the sketch below.
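Here is a minimal sketch of such a plugin. It assumes the BasePlugin before_model_callback / after_model_callback hooks; the class name, the cache-key logic, and the in-memory dict (a stand-in for a Redis or Memorystore client) are all illustrative, so verify the callback signatures against your installed ADK version.

from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.adk.plugins.base_plugin import BasePlugin


class PromptCachePlugin(BasePlugin):
    """Illustrative caching plugin: skips the LLM call on a cache hit."""

    def __init__(self) -> None:
        super().__init__(name="prompt_cache")
        self._cache: dict[str, LlmResponse] = {}  # stand-in for Redis/Memorystore

    async def before_model_callback(
        self, *, callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        # To cache only certain sub-agents, you could gate here on the calling
        # agent, e.g. callback_context.agent_name in ("sub_agent2", "sub_agent3").
        # Simplistic key; a real key should also cover the model name, tools,
        # and system instruction.
        key = str(llm_request.contents)
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # returning a response here skips the real LLM call
        callback_context.state["temp:prompt_cache_key"] = key
        return None  # fall through to the real LLM call

    async def after_model_callback(
        self, *, callback_context: CallbackContext, llm_response: LlmResponse
    ) -> Optional[LlmResponse]:
        # Store the fresh response under the key recorded above.
        key = callback_context.state.get("temp:prompt_cache_key")
        if key is not None:
            self._cache[key] = llm_response
        return None  # keep the original response unchanged

You would then register the plugin wherever you construct your runner, for example:

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

runner = Runner(
    agent=SupervisorAgent,
    app_name="supervisor_app",
    session_service=InMemorySessionService(),
    plugins=[PromptCachePlugin()],
)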
This approach gives you full control over the caching strategy and is the standard way to extend ADK with custom behaviors.

[1] ADK Cache Analysis Sample Agent
Original question:
I’m working on a multi-agent workflow using Google ADK where a root_agent coordinates three sub-agents.
Here’s a simplified version of how I’ve set up the agents:
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

SupervisorAgent = LlmAgent(
    name="root_agent",
    model=LiteLlm(model=llm_model),
    description="Coordinates and monitors tasks across sub-agents.",
    instruction=instruction,
    sub_agents=[sub_agent1, sub_agent2, sub_agent3],
)
The workflow runs asynchronously using:
await runner.run_async(input_data)
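(For reference, run_async in recent ADK versions is an async generator that takes keyword arguments, so the fuller call looks roughly like this; the user and session identifiers below are illustrative and assume a session created earlier:)

from google.genai import types

async for event in runner.run_async(
    user_id="user-1",          # illustrative
    session_id=session.id,     # assumes an existing session object
    new_message=types.Content(role="user", parts=[types.Part(text=input_data)]),
):
    if event.is_final_response() and event.content:
        print(event.content.parts[0].text)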
Now, I want to implement prompt caching specifically for sub_agent2 and sub_agent3 to reduce redundant LLM calls and improve performance on repeated or similar tasks.
I have a few questions on how best to integrate caching in this setup:
Does Google ADK currently provide a built-in caching mechanism for LLM prompts/responses? If so, how do I implement it in this workflow?
If not, what’s the recommended approach to integrate custom caching (e.g., using Redis or Google Cloud Memorystore) within this async agent workflow?
I'm using an up-to-date ADK version.