feat: Add caching responses for agents #4363
base: main
Conversation
Great one!
```python
except Exception as e:
    log_warning(f"Error checking model cache: {e}")
```
Suggested change:
```diff
 except Exception as e:
-    log_warning(f"Error checking model cache: {e}")
+    log_warning(f"Error checking model response cache: {e}")
```
```python
    log_debug(f"Streaming responses cached ({len(streaming_responses)} chunks) to: {cache_file}")
except Exception as e:
    log_error(f"Error writing streaming cache: {e}")
```
In the case of streaming I would just cache the complete response, and also just deliver that: there is no need to deliver chunks if we already have them all.
+1
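A self-contained toy sketch of that suggestion (the `Chunk` type and the `stream_with_response_cache` helper are illustrative, not agno's types): accumulate the stream, cache only the assembled response, and on a hit yield it once instead of replaying chunks.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Chunk:
    content: str = ""


def stream_with_response_cache(stream: Iterator[Chunk], cache: Dict[str, str], key: str) -> Iterator[Chunk]:
    """Yield streamed chunks, but cache (and on a hit deliver) only the complete response."""
    if key in cache:
        # Cache hit: deliver the complete response once instead of replaying chunks.
        yield Chunk(content=cache[key])
        return

    parts: List[str] = []
    for chunk in stream:
        parts.append(chunk.content)
        yield chunk

    # Cache the assembled response text rather than the individual chunks.
    cache[key] = "".join(parts)
```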
```python
# Time-to-live for cached model responses in seconds
cache_model_ttl: int = 3600
```
I would default to never expiring the cache (`cache_model_ttl: int = 0`). The con is potential memory growth. Your call.
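For illustration, the expiry check a `cache_model_ttl: int = 0` default would imply (a sketch, not the PR's code): zero means entries never expire, any positive value is compared against the entry's age.

```python
import time


def cache_entry_is_fresh(entry_timestamp: float, cache_model_ttl: int = 0) -> bool:
    """Return True if a cached entry may still be served; a TTL of 0 disables expiry."""
    if cache_model_ttl == 0:
        return True  # never expire (the trade-off mentioned above: the cache only grows)
    return (time.time() - entry_timestamp) < cache_model_ttl
```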
```python
from agno.models.openai.chat import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="o3-mini"), cache_model_response=True, debug_mode=True
)
```
Should `cache_model_response` not be a param of the model class instead?
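For comparison, a sketch of what that alternative could look like; `cache_response` is a hypothetical model-level parameter used only to illustrate the question and does not exist in this PR:

```python
from agno.agent import Agent
from agno.models.openai.chat import OpenAIChat

# Hypothetical: the caching flag lives on the model instead of on the Agent.
agent = Agent(
    model=OpenAIChat(id="o3-mini", cache_response=True),  # cache_response is illustrative only
    debug_mode=True,
)
```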
```python
functions=self._functions_for_model,
tool_choice=self.tool_choice,
tool_call_limit=self.tool_call_limit,
# ...
model_response = self._get_model_response_with_cache(
```
Here, the default behaviour should continue to be `self.model.response`, but if caching is enabled, we should call this function. Either way, the function name here is misleading.
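A sketch of the dispatch this comment describes, as a method that could live on the Agent; `_get_or_cache_model_response` is an illustrative name, not the PR's:

```python
def _resolve_model_response(self, messages):
    """Keep self.model.response as the default path; only go through the cache when enabled."""
    if not self.cache_model_response:
        return self.model.response(messages=messages)
    # Cache-aware path; a name like this also avoids the misleading one flagged above.
    return self._get_or_cache_model_response(messages=messages)
```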
Summary
This adds caching of LLM responses as an opt-in for users. It is very useful during development.
Type of change
Checklist
./scripts/format.sh and ./scripts/validate.sh
Additional Notes
Add any important context (deployment instructions, screenshots, security considerations, etc.)