Conversation

dirkbrnd
Contributor

Summary

This adds opt-in caching of LLM responses. Very useful during development.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Improvement
  • Model update
  • Other:

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
  • Tested in clean environment
  • Tests added/updated (if applicable)

Additional Notes

Add any important context (deployment instructions, screenshots, security considerations, etc.)

@dirkbrnd requested a review from a team as a code owner on August 28, 2025 09:32
Contributor

@manuhortet left a comment


Great one!

Comment on lines +8304 to +8306
except Exception as e:
    log_warning(f"Error checking model cache: {e}")
Contributor


Suggested change
- except Exception as e:
-     log_warning(f"Error checking model cache: {e}")
+ except Exception as e:
+     log_warning(f"Error checking model response cache: {e}")


log_debug(f"Streaming responses cached ({len(streaming_responses)} chunks) to: {cache_file}")
except Exception as e:
log_error(f"Error writing streaming cache: {e}")
Contributor


In the case of streaming, I would just cache the complete response, and also just deliver that: there is no need to deliver chunks if we already have them all.

Contributor


+1
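
A minimal sketch of that idea, not the actual implementation in this PR (stream_with_cache and its parameters are hypothetical names): on a cache hit the full cached text is delivered in one piece, and on a miss the chunks are streamed through and the assembled response is cached.

from typing import Callable, Dict, Iterator, List

def stream_with_cache(
    stream_model: Callable[[], Iterator[str]],
    cache: Dict[str, str],
    cache_key: str,
) -> Iterator[str]:
    # Cache hit: deliver the complete cached response in one piece instead of replaying chunks.
    if cache_key in cache:
        yield cache[cache_key]
        return
    # Cache miss: stream the chunks through and cache the assembled full response.
    chunks: List[str] = []
    for chunk in stream_model():
        chunks.append(chunk)
        yield chunk
    cache[cache_key] = "".join(chunks)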

Comment on lines +350 to +351
# Time-to-live for cached model responses in seconds
cache_model_ttl: int = 3600
Contributor


I would default to never expiring the cache (cache_model_ttl: int = 0).

The con is potential memory growth. Your call.
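
A hedged sketch of the 0-means-never-expire convention (function and parameter names are hypothetical, assuming each cache entry records the time it was written):

import time

def _is_cache_entry_fresh(entry_written_at: float, cache_model_ttl: int = 0) -> bool:
    # A TTL of 0 (or below) means the cached entry never expires.
    if cache_model_ttl <= 0:
        return True
    return (time.time() - entry_written_at) < cache_model_ttl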

from agno.agent import Agent
from agno.models.openai.chat import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="o3-mini"), cache_model_response=True, debug_mode=True
)
Contributor


Should cache_model_response not be a param of the model class instead?
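
Purely to illustrate that alternative, a hypothetical sketch (not the agno API) of what it could look like if the caching flag, TTL, and storage lived on the model class instead of the Agent:

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CachingModel:
    # Hypothetical model base: cache configuration and state hang off the model.
    id: str
    cache_response: bool = False
    cache_model_ttl: int = 0  # 0 = never expire
    _response_cache: Dict[str, str] = field(default_factory=dict)

    def response(self, prompt: str) -> str:
        if self.cache_response and prompt in self._response_cache:
            return self._response_cache[prompt]
        result = self._invoke(prompt)
        if self.cache_response:
            self._response_cache[prompt] = result
        return result

    def _invoke(self, prompt: str) -> str:
        # The provider-specific API call would go here.
        raise NotImplementedError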

functions=self._functions_for_model,
tool_choice=self.tool_choice,
tool_call_limit=self.tool_call_limit,
model_response = self._get_model_response_with_cache(
Contributor


Here, the default behaviour should continue to be self.model.response, but if caching is enabled, we should call this function.

Either way, the function name here is misleading.
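
A hedged sketch of that dispatch, written as a standalone function with hypothetical names (in the real code, call_model would wrap self.model.response): the default path is the plain model call, and the cache is only consulted when caching is enabled.

from typing import Any, Callable, Dict

def get_model_response(
    call_model: Callable[[], Any],
    cache_enabled: bool,
    cache: Dict[str, Any],
    cache_key: str,
) -> Any:
    # Default behaviour: caching disabled, call the model directly.
    if not cache_enabled:
        return call_model()
    # Caching enabled: return a cached response when present, otherwise call and store.
    if cache_key in cache:
        return cache[cache_key]
    response = call_model()
    cache[cache_key] = response
    return response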
