feat(tools): add view_image tool for LLM visual analysis #1526
xieyxclack merged 2 commits into agentscope-ai:main from …
Summary of Changes
This pull request significantly enhances the multimodal capabilities of the system by introducing a new …
Code Review
This pull request introduces a view_image tool, allowing the LLM to analyze local image files. The implementation is well-structured, including a new display_to_user configuration to control tool output visibility and a model_validator to seamlessly update existing user configurations with the new tool. The changes are thoughtful and improve the multimodal capabilities of the agent.
I've found a potential issue related to mutable default objects in the configuration, which could lead to unexpected side effects. I've left two comments with suggestions to make the configuration handling more robust by giving each configuration its own copy of the default tool configurations. Other than that, the changes look great.
src/copaw/config/config.py
```python
            description="Get llm token usage",
        ),
    },
    default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),
```
The default_factory creates a shallow copy of _DEFAULT_BUILTIN_TOOLS. The dict is new, but the BuiltinToolConfig objects inside it are shared between the global default and every new ToolsConfig instance. If one of these objects is modified in place, the global default is mutated too, causing unexpected side effects for any other part of the application that relies on these defaults. To prevent this, copy each tool configuration with model_copy().
```diff
-default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),
+default_factory=lambda: {k: v.model_copy() for k, v in _DEFAULT_BUILTIN_TOOLS.items()},
```
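The shallow-copy problem described in this comment can be reproduced without pydantic. The following is a minimal sketch under three assumptions:

- A plain dataclass stands in for `BuiltinToolConfig`.
- `copy.copy` stands in for `model_copy()`.
- `make_defaults`, `shallow_factory`, `per_item_copy_factory`, and `mutates_default` are hypothetical helpers, not part of the codebase.

```python
import copy
from dataclasses import dataclass


@dataclass
class BuiltinToolConfig:
    """Hypothetical stand-in for the pydantic BuiltinToolConfig model."""

    name: str
    enabled: bool = True
    display_to_user: bool = True


def make_defaults() -> dict:
    """Build a fresh set of 'global' defaults for each experiment."""
    return {"view_image": BuiltinToolConfig("view_image", display_to_user=False)}


def shallow_factory(defaults: dict) -> dict:
    # Mirrors dict(_DEFAULT_BUILTIN_TOOLS): a new dict holding the SAME objects.
    return dict(defaults)


def per_item_copy_factory(defaults: dict) -> dict:
    # Mirrors {k: v.model_copy() for k, v in ...}: a new object per entry.
    return {k: copy.copy(v) for k, v in defaults.items()}


def mutates_default(factory) -> bool:
    """Return True if editing a config built by `factory` changes the defaults."""
    defaults = make_defaults()
    user_cfg = factory(defaults)
    user_cfg["view_image"].enabled = False  # e.g. a settings UI disabling the tool
    return defaults["view_image"].enabled is False


print(mutates_default(shallow_factory))  # True: the shared default was mutated
print(mutates_default(per_item_copy_factory))  # False: the default is untouched
```

Pydantic's `model_copy()` is itself shallow by default. That is sufficient as long as `BuiltinToolConfig` holds only immutable values such as strings and booleans. If it ever gains mutable fields such as lists or dicts, `model_copy(deep=True)` would be the safer choice.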
src/copaw/config/config.py
```python
"""Ensure new code-defined tools are present in saved configs."""
for name, default_tc in _DEFAULT_BUILTIN_TOOLS.items():
    if name not in self.builtin_tools:
        self.builtin_tools[name] = default_tc
```
Similar to the issue with default_factory, assigning default_tc directly shares the BuiltinToolConfig object from the global _DEFAULT_BUILTIN_TOOLS. If this object is later modified (e.g., through a settings UI), it will alter the global default state. You should assign a copy of the object using model_copy() to ensure that each configuration is independent.
```diff
-self.builtin_tools[name] = default_tc
+self.builtin_tools[name] = default_tc.model_copy()
```
Force-pushed from da11440 to d2d5f0f
Description
Add a `view_image` tool that loads local image files into the LLM context for visual analysis. This lets multimodal models actually "see" local images, including images produced by `desktop_screenshot`, `browser_use`, or any other tool that returns file paths.

Four key changes:

1. `view_image` tool. A new built-in tool that returns an `ImageBlock` with a local path. It validates that the file exists and has an image MIME type.
2. `display_to_user` per-tool control. A new `BuiltinToolConfig.display_to_user` field plus an `internal_tools` mechanism in the renderer. This prevents `view_image` media from leaking to user channels when `show_tool_details=False`, while still letting `send_file_to_user` media through.
3. A `model_validator` that auto-syncs new tools into existing user configs.
4. `promote_tool_result_images`. Enabled for OpenAI-compatible formatters, so tool result images are promoted to user messages and base64-encoded at format time. This matches channel image behavior and is memory-efficient.

Related Issue: Fixes #1097, #918, #820
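The following minimal sketch covers the two ends of the `view_image` pipeline described above:

- A `view_image`-style tool validates the path and returns an image block holding a plain local path.
- A format-time step reads the file and inlines it as a base64 data URL, using the OpenAI Chat Completions `image_url` content-part shape.

This sketch rests on three assumptions:

- The block layout `{"type": "image", "source": {"type": "url", ...}}` is assumed to resemble AgentScope's `ImageBlock`.
- The MIME check uses `mimetypes`, which looks only at the file extension. This may differ from the PR's actual check.
- `promote_to_openai_part` is a hypothetical helper.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def view_image(image_path: str) -> dict:
    """Hypothetical sketch: validate a local image and return an image block."""
    resolved = Path(image_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    # Extension-based MIME guess (an assumption, not necessarily the PR's check).
    mime, _ = mimetypes.guess_type(resolved.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not an image file: {resolved} (guessed type: {mime})")
    # Plain local path (str(resolved)) rather than a file:// URI.
    return {"type": "image", "source": {"type": "url", "url": str(resolved)}}


def promote_to_openai_part(block: dict) -> dict:
    """Read the file only now, at format time, and inline it as a data URL."""
    path = Path(block["source"]["url"])
    mime, _ = mimetypes.guess_type(path.name)
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}


with tempfile.TemporaryDirectory() as tmp:
    png = Path(tmp) / "shot.png"
    png.write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG signature only, enough for a demo
    block = view_image(str(png))
    part = promote_to_openai_part(block)

    txt = Path(tmp) / "notes.txt"
    txt.write_text("hello")
    try:
        view_image(str(txt))
        rejected_non_image = False
    except ValueError:
        rejected_non_image = True

print(part["image_url"]["url"][:22])  # data:image/png;base64,
print(rejected_non_image)  # True
```

Deferring the read and the base64 encoding to format time means the stored tool result carries only a short path string. The encoded bytes exist only while a request is being built.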
Type of Change
Component(s) Affected
Checklist
- … `pre-commit run --all-files` locally and it passes
- … (`pytest` or as relevant) and they pass

Testing
Unit tests pass.
To test
- `view_image_tool`
- `desktop_screenshot` → `view_image` → correctly describes the screen content

Additional Notes
- `view_image` uses a plain local path (`str(resolved)`) instead of a `file://` URI. This is required because `promote_tool_result_images` creates new `ImageBlock`s after the `file://` monkey patch runs.
- `view_image` is auto-added to the config via `model_validator` with `display_to_user: false`.
- … `display_to_user` to `True` (default: `False`). A UI for configuring built-in tools may be beneficial, but is not currently supported.
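The `display_to_user` / `internal_tools` rule can be sketched as a filter over tool-result media. This sketch makes the following assumptions:

- Media from tools with `display_to_user=False` is dropped only when `show_tool_details` is off.
- Image, audio, and video blocks count as media.
- `compute_internal_tools`, `media_for_channel`, and `MEDIA_TYPES` are hypothetical names, not the renderer's actual API.

```python
from dataclasses import dataclass


@dataclass
class BuiltinToolConfig:
    """Hypothetical stand-in for the pydantic BuiltinToolConfig model."""

    name: str
    enabled: bool = True
    display_to_user: bool = True


MEDIA_TYPES = {"image", "audio", "video"}  # assumed set of media block types


def compute_internal_tools(configs: dict) -> set:
    # Tools whose output is meant for the model only, not for user channels.
    return {name for name, tc in configs.items() if not tc.display_to_user}


def media_for_channel(tool_name, blocks, *, show_tool_details, hidden):
    """Return the media blocks from one tool result that may reach the user."""
    media = [b for b in blocks if b.get("type") in MEDIA_TYPES]
    if tool_name in hidden and not show_tool_details:
        return []  # e.g. view_image output stays inside the LLM context
    return media


configs = {
    "view_image": BuiltinToolConfig("view_image", display_to_user=False),
    "send_file_to_user": BuiltinToolConfig("send_file_to_user"),
}
hidden = compute_internal_tools(configs)
img = {"type": "image", "source": {"type": "url", "url": "/tmp/shot.png"}}
print(len(media_for_channel("view_image", [img], show_tool_details=False, hidden=hidden)))  # 0
print(len(media_for_channel("send_file_to_user", [img], show_tool_details=False, hidden=hidden)))  # 1
```

The hidden set is derived from the per-tool flag. As a result, setting `display_to_user` to `True` in the config is enough to let a tool's media through, with no change to the filtering code.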