feat(tools): add view_image tool for LLM visual analysis #1526

Merged
xieyxclack merged 2 commits into agentscope-ai:main from Leirunlin:feat/view-image-tool
Mar 16, 2026

@Leirunlin
Collaborator

Description

Add a view_image tool that loads local image files into the LLM context for visual analysis. This enables multimodal models to actually "see" local images, including images produced by desktop_screenshot, browser_use, or any tool that returns file paths.

Four key changes:

  1. view_image tool. A new built-in tool that returns an ImageBlock with a local path. It validates file existence and the image MIME type.
  2. display_to_user per-tool control. A new BuiltinToolConfig.display_to_user field plus an internal_tools mechanism in the renderer. This prevents view_image media from leaking to user channels when show_tool_details=False, while still letting send_file_to_user media through.
  3. Config auto-sync. Includes a model_validator that auto-syncs new tools into existing user configs.
  4. promote_tool_result_images. Enabled for OpenAI-compatible formatters so that tool result images are promoted to user messages and base64-encoded at format time (matching channel image behavior, memory-efficient).
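
For readers who want a concrete picture of the validation in item 1, here is a minimal sketch using only the standard library. It is not the code from this PR: the function name check_image_path and the dict shapes (standing in for ImageBlock and TextBlock) are assumptions made for illustration.

```python
import mimetypes
from pathlib import Path


def check_image_path(path: str) -> dict:
    """Return an image-like block for a valid image path, else a text error.

    The dict shapes are hypothetical stand-ins for ImageBlock / TextBlock.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        return {"type": "text", "text": f"Error: {path} is not a file"}

    # Guess the MIME type from the file extension only (no content sniffing).
    mime, _ = mimetypes.guess_type(resolved.name)
    if mime is None or not mime.startswith("image/"):
        return {"type": "text", "text": f"Error: {path} is not an image"}

    # Plain local path rather than a file:// URI, as the PR notes describe.
    return {"type": "image", "source": {"type": "url", "url": str(resolved)}}
```

Because the check relies on the file extension, a renamed non-image file would pass it; whether the real tool inspects file contents is not stated in this PR.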

Related Issue: Fixes #1097, #918, #820

Type of Change

  • New feature

Component(s) Affected

  • Core / Backend (app, agents, config, providers, utils, local_models)
  • Console (frontend web UI)

Checklist

  • I ran pre-commit run --all-files locally and it passes
  • If pre-commit auto-fixed files, I committed those changes and reran checks
  • I ran tests locally (pytest or as relevant) and they pass
  • Documentation updated (if needed)
  • Ready for review

Testing

Unit tests pass.
To test the view_image tool manually:

  1. Start CoPaw, ask "take a screenshot and describe what you see"
  2. The LLM should call desktop_screenshot → view_image → correctly describe the screen content
  3. Send an image via Telegram/DingTalk and confirm the LLM still reads it correctly

Additional Notes

  • view_image uses a plain local path (str(resolved)) instead of a file:// URI. This is required because promote_tool_result_images creates new ImageBlocks after the file:// monkey patch runs.
  • Existing users get view_image auto-added to their config via model_validator, with display_to_user: false
  • The tool does not return Base64 data, because that would add too many tokens to the context.
  • To show the image path and images, users have to set display_to_user to True (the default is False). A UI for configuring built-in tools may be beneficial, but is not currently supported.
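
The notes above imply that images travel through the agent as paths and are only base64-encoded when a request is formatted. A rough sketch of that idea follows. The helper names to_data_url and format_image_part, and the "path" key on the block, are hypothetical; only the OpenAI-style image_url part shape is a known format.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(local_path: str) -> str:
    """Encode a local image file as a data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(local_path)
    data = Path(local_path).read_bytes()
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def format_image_part(block: dict) -> dict:
    """Build an OpenAI-style image part, encoding only at format time."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_url(block["path"])},
    }
```

Keeping only the path in stored messages means the encoded bytes exist just for the duration of one request, which is one way to read the "memory-efficient" remark in the description.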

@Leirunlin Leirunlin temporarily deployed to maintainer-approved March 15, 2026 12:21 — with GitHub Actions Inactive
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the multimodal capabilities of the system by introducing a new view_image tool, allowing Large Language Models to process and analyze local image files. It also refines the user experience by providing granular control over which tool outputs are displayed to the user, ensuring that internal processing images are not unnecessarily exposed while maintaining the ability to share relevant media. Furthermore, it includes mechanisms for seamless integration of new tools into existing configurations and optimizes image handling for specific LLM formats.

Highlights

  • New view_image tool: A new built-in tool has been added that loads local image files into the LLM context for visual analysis, returning an ImageBlock with the local path. It includes validation for file existence and image MIME type.
  • display_to_user per-tool control: A new display_to_user field in BuiltinToolConfig and an internal_tools mechanism in the renderer prevent media from tools like view_image from being displayed to user channels when show_tool_details=False, while still allowing other media (e.g., from send_file_to_user) to pass through.
  • Auto-sync new tools: A model_validator has been included to automatically synchronize newly defined tools into existing user configurations, ensuring view_image is added with display_to_user: false by default.
  • promote_tool_result_images for OpenAI-compatible formatters: This feature is now enabled for OpenAI-compatible formatters, promoting tool result images to user messages and base64-encoding them at format time, which matches channel image behavior and is memory-efficient.
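
To illustrate the display_to_user highlight, the sketch below shows one way a renderer could derive a set of internal tools and drop their media parts. It is an assumption-laden simplification: RenderStyleSketch, MEDIA_TYPES, internal_tool_names and visible_parts are hypothetical names, and the interaction with show_tool_details is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderStyleSketch:
    """Hypothetical stand-in for RenderStyle, reduced to one field."""
    internal_tools: frozenset = frozenset()


# Part types treated as media in this sketch (an assumption).
MEDIA_TYPES = {"image", "audio", "video", "file"}


def internal_tool_names(display_flags: dict) -> frozenset:
    """Tools whose display_to_user flag is False are treated as internal."""
    return frozenset(n for n, shown in display_flags.items() if not shown)


def visible_parts(tool_name: str, parts: list, style: RenderStyleSketch) -> list:
    """Drop media parts produced by internal tools; keep everything else."""
    if tool_name not in style.internal_tools:
        return parts
    return [p for p in parts if p.get("type") not in MEDIA_TYPES]
```

With flags such as {"view_image": False, "send_file_to_user": True}, only view_image ends up in internal_tools, so its image parts are filtered while send_file_to_user output passes through unchanged.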

Changelog
  • console/src/pages/Settings/Security/components/RuleModal.tsx
    • Added "view_image" to the BUILTIN_TOOLS array.
  • console/src/pages/Settings/Security/index.tsx
    • Added "view_image" to the BUILTIN_TOOLS array.
  • src/copaw/agents/model_factory.py
    • Modified _create_formatter_instance to conditionally set promote_tool_result_images=True for OpenAIChatFormatter instances.
  • src/copaw/agents/react_agent.py
    • Imported the new view_image tool.
    • Added view_image to the agent's toolkit dictionary.
  • src/copaw/agents/tools/__init__.py
    • Imported view_image from its new module.
    • Added view_image to the module's __all__ list.
  • src/copaw/agents/tools/view_image.py
    • Added a new file implementing the view_image asynchronous function.
    • The view_image function validates image paths and MIME types, returning an ImageBlock for valid images or an error TextBlock.
  • src/copaw/app/channels/base.py
    • Imported load_config to access global configuration.
    • Modified the channel's __init__ method to load tool configurations and identify internal tools (those with display_to_user=False).
    • Passed the identified internal_tools to the RenderStyle constructor.
  • src/copaw/app/channels/renderer.py
    • Added an internal_tools frozenset field to the RenderStyle class.
    • Updated _parts_for_tool_output to filter out media parts from tools marked as internal.
    • Updated _parts_for_tool_output to filter out data blocks from tools marked as internal.
  • src/copaw/config/config.py
    • Added a display_to_user boolean field to BuiltinToolConfig.
    • Refactored the default built-in tools into a dedicated _DEFAULT_BUILTIN_TOOLS dictionary.
    • Included view_image in _DEFAULT_BUILTIN_TOOLS with display_to_user set to False by default.
    • Implemented a model_validator in ToolsConfig to automatically merge new code-defined tools into existing user configurations.
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a view_image tool, allowing the LLM to analyze local image files. The implementation is well-structured, including a new display_to_user configuration to control tool output visibility and a model_validator to seamlessly update existing user configurations with the new tool. The changes are thoughtful and improve the multimodal capabilities of the agent.

I've found a potential issue related to mutable default objects in the configuration, which could lead to unexpected side effects. I've left two comments with suggestions to improve the robustness of the configuration handling by ensuring deep copies are used for default tool configurations. Other than that, the changes look great.

description="Get llm token usage",
),
},
default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),

high

The default_factory creates a shallow copy of _DEFAULT_BUILTIN_TOOLS. This means the BuiltinToolConfig objects are shared between the global default and new ToolsConfig instances. If one of these objects is modified in-place, it will mutate the global default, leading to unexpected side effects for other parts of the application that rely on these defaults. To prevent this, you should create a deep copy of the tool configurations using model_copy().

Suggested change
- default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),
+ default_factory=lambda: {k: v.model_copy() for k, v in _DEFAULT_BUILTIN_TOOLS.items()},
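
The aliasing described in this comment can be reproduced with the standard library alone. The sketch below uses a dataclass, ToolConfigSketch, as a hypothetical stand-in for BuiltinToolConfig, and copy.copy in place of Pydantic's model_copy(). Note that model_copy() is itself a shallow copy of the model unless deep=True is passed, which is sufficient as long as the fields are plain scalars.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToolConfigSketch:
    """Hypothetical stand-in for BuiltinToolConfig."""
    enabled: bool = True
    display_to_user: bool = True


DEFAULTS = {"view_image": ToolConfigSketch(display_to_user=False)}

# dict(...) copies the mapping but not the values it points to.
shared = dict(DEFAULTS)
# Copying each value gives every config its own objects.
independent = {name: copy.copy(cfg) for name, cfg in DEFAULTS.items()}

independent["view_image"].enabled = False  # does not touch DEFAULTS
aliased = shared["view_image"] is DEFAULTS["view_image"]
```

Here aliased is True, so an in-place change made through shared would also change DEFAULTS, whereas the change made through independent leaves the defaults intact.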

"""Ensure new code-defined tools are present in saved configs."""
for name, default_tc in _DEFAULT_BUILTIN_TOOLS.items():
    if name not in self.builtin_tools:
        self.builtin_tools[name] = default_tc

high

Similar to the issue with default_factory, assigning default_tc directly shares the BuiltinToolConfig object from the global _DEFAULT_BUILTIN_TOOLS. If this object is later modified (e.g., through a settings UI), it will alter the global default state. You should assign a copy of the object using model_copy() to ensure that each configuration is independent.

Suggested change
- self.builtin_tools[name] = default_tc
+ self.builtin_tools[name] = default_tc.model_copy()
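
The merge step under discussion can be sketched independently of Pydantic. The function below is a hypothetical illustration, not the validator from this PR: it works on plain dicts and uses copy.deepcopy so that added entries never alias the defaults.

```python
import copy


def merge_missing_defaults(saved: dict, defaults: dict) -> dict:
    """Add tools that exist in defaults but not in a saved config.

    Existing entries are left untouched, so user choices are preserved;
    each added entry is a copy, so it never aliases the defaults.
    """
    for name, default_cfg in defaults.items():
        if name not in saved:
            saved[name] = copy.deepcopy(default_cfg)
    return saved
```

In a Pydantic model the same loop would presumably live in a model_validator, with model_copy() taking the place of deepcopy as the review suggests.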

Member

@xieyxclack xieyxclack left a comment

LGTM

@Leirunlin Leirunlin force-pushed the feat/view-image-tool branch from da11440 to d2d5f0f Compare March 16, 2026 08:23
@Leirunlin Leirunlin temporarily deployed to maintainer-approved March 16, 2026 08:23 — with GitHub Actions Inactive
@xieyxclack xieyxclack merged commit 6b16a80 into agentscope-ai:main Mar 16, 2026
24 checks passed
hh0592821 pushed a commit to hh0592821/CoPaw that referenced this pull request Mar 19, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Images cannot be recognized

2 participants