feat(tools): add view_image tool for LLM visual analysis by Leirunlin · Pull Request #1526 · agentscope-ai/CoPaw

Leirunlin · 2026-03-15T12:21:26Z

Description

Add view_image tool that loads local image files into LLM context for visual analysis. Enables multimodal models to actually "see" local images or images produced by desktop_screenshot, browser_use, or any tool returning file paths.

Four key changes:

view_image tool. A new built-in tool that returns an ImageBlock with a local path. It validates file existence and image MIME type.
display_to_user per-tool control. A new BuiltinToolConfig.display_to_user field plus an internal_tools mechanism in the renderer. This prevents view_image media from leaking to user channels when show_tool_details=False, while still allowing send_file_to_user media through.
Config auto-sync. A model_validator auto-syncs new tools into existing user configs.
promote_tool_result_images. Enabled for OpenAI-compatible formatters so that tool result images are promoted to user messages and base64-encoded at format time (matching channel image behavior, memory-efficient).
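
The validation named in the first item can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the function name `validate_image_path` is hypothetical, and plain dicts stand in for the real ImageBlock and TextBlock types.

```python
import mimetypes
from pathlib import Path


def validate_image_path(image_path: str) -> dict:
    """Return an image-like block for a valid local image, else an error block.

    Hypothetical stand-in: plain dicts replace the real ImageBlock/TextBlock.
    """
    resolved = Path(image_path).expanduser().resolve()

    # Reject paths that do not point at an existing regular file.
    if not resolved.is_file():
        return {"type": "text", "text": f"Error: file not found: {resolved}"}

    # Guess the MIME type from the file extension only (no content sniffing).
    mime, _ = mimetypes.guess_type(resolved.name)
    if mime is None or not mime.startswith("image/"):
        return {"type": "text", "text": f"Error: not an image file: {resolved}"}

    # Hand back a plain local path, not a file:// URI or a base64 payload.
    return {"type": "image", "path": str(resolved), "mime": mime}
```

Returning an error block instead of raising keeps the failure visible to the model as ordinary tool output, which matches the "ImageBlock or error TextBlock" behavior described in this PR.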

Related Issue: Fixes #1097, #918, #820

Type of Change

New feature

Component(s) Affected

Core / Backend (app, agents, config, providers, utils, local_models)
Console (frontend web UI)

Checklist

I ran pre-commit run --all-files locally and it passes
If pre-commit auto-fixed files, I committed those changes and reran checks
I ran tests locally (pytest or as relevant) and they pass
Documentation updated (if needed)
Ready for review

Testing

Unit tests pass.
To test the view_image tool:

Start CoPaw and ask "take a screenshot and describe what you see".
The LLM should call desktop_screenshot → view_image and then correctly describe the screen content.
Send an image via Telegram/DingTalk and confirm the LLM still reads it correctly.

Additional Notes

view_image uses a plain local path (str(resolved)) instead of a file:// URI. This is required because promote_tool_result_images creates new ImageBlocks after the file:// monkey patch runs.
Existing users get view_image auto-added to their config via model_validator, with display_to_user: false.
Base64 is forbidden, as it introduces too many tokens into the context.
To show the image path and images, users have to set display_to_user to True (False by default). A UI design for built-in tools may be beneficial, but is not currently supported.
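
To illustrate the path-versus-base64 trade-off in these notes: the stored block carries only a local path, and base64 is produced only when a request payload is built. The sketch below assumes a hypothetical helper `to_openai_image_part`; the `image_url` content-part shape follows the OpenAI chat message format, but how CoPaw's formatter performs this step internally is not shown on this page.

```python
import base64
import mimetypes
from pathlib import Path


def to_openai_image_part(local_path: str) -> dict:
    """Encode a local image as a data URL only when building the request.

    Hypothetical helper, not part of CoPaw or the OpenAI SDK.
    """
    path = Path(local_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not an image: {local_path}")

    # Read and encode lazily; nothing base64-sized is kept in stored context.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }
```

Base64 output is roughly 4/3 the size of the raw bytes, so deferring the encoding to format time keeps that overhead out of anything stored between turns.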

gemini-code-assist · 2026-03-15T12:21:55Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the multimodal capabilities of the system by introducing a new view_image tool, allowing Large Language Models to process and analyze local image files. It also refines the user experience by providing granular control over which tool outputs are displayed to the user, ensuring that internal processing images are not unnecessarily exposed while maintaining the ability to share relevant media. Furthermore, it includes mechanisms for seamless integration of new tools into existing configurations and optimizes image handling for specific LLM formats.

Highlights

New view_image tool: A new built-in tool has been added that loads local image files into the LLM context for visual analysis, returning an ImageBlock with the local path. It includes validation for file existence and image MIME type.
display_to_user per-tool control: A new display_to_user field in BuiltinToolConfig and an internal_tools mechanism in the renderer prevent media from tools like view_image from being displayed to user channels when show_tool_details=False, while still allowing other media (e.g., from send_file_to_user) to pass through.
Auto-sync new tools: A model_validator has been included to automatically synchronize newly defined tools into existing user configurations, ensuring view_image is added with display_to_user: false by default.
promote_tool_result_images for OpenAI-compatible formatters: This feature is now enabled for OpenAI-compatible formatters, promoting tool result images to user messages and base64-encoding them at format time, which matches channel image behavior and is memory-efficient.
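
The per-tool visibility rule can be pictured as a filter over tool output parts. This is a sketch under assumptions: `ToolConfig`, `collect_internal_tools`, `filter_tool_parts`, and the part dicts are hypothetical, and the interaction with show_tool_details is left out. Only the names display_to_user and internal_tools come from the PR.

```python
from dataclasses import dataclass


@dataclass
class ToolConfig:
    """Hypothetical stand-in for BuiltinToolConfig."""
    enabled: bool = True
    display_to_user: bool = True


def collect_internal_tools(tools: dict[str, ToolConfig]) -> frozenset[str]:
    """Names of tools whose media should stay out of user channels."""
    return frozenset(n for n, tc in tools.items() if not tc.display_to_user)


def filter_tool_parts(
    tool_name: str,
    parts: list[dict],
    internal_tools: frozenset[str],
) -> list[dict]:
    """Drop media parts produced by internal tools; keep everything else."""
    if tool_name not in internal_tools:
        return parts
    media_types = {"image", "audio", "video", "file"}
    return [p for p in parts if p.get("type") not in media_types]
```

With this shape, view_image output is trimmed while send_file_to_user output passes through unchanged, because only the former is in the internal set.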


Changelog

console/src/pages/Settings/Security/components/RuleModal.tsx
- Added "view_image" to the BUILTIN_TOOLS array.
console/src/pages/Settings/Security/index.tsx
- Added "view_image" to the BUILTIN_TOOLS array.
src/copaw/agents/model_factory.py
- Modified _create_formatter_instance to conditionally set promote_tool_result_images=True for OpenAIChatFormatter instances.
src/copaw/agents/react_agent.py
- Imported the new view_image tool.
- Added view_image to the agent's toolkit dictionary.
src/copaw/agents/tools/__init__.py
- Imported view_image from its new module.
- Added view_image to the module's __all__ list.
src/copaw/agents/tools/view_image.py
- Added a new file implementing the view_image asynchronous function.
- The view_image function validates image paths and MIME types, returning an ImageBlock for valid images or an error TextBlock.
src/copaw/app/channels/base.py
- Imported load_config to access global configuration.
- Modified the channel's __init__ method to load tool configurations and identify internal tools (those with display_to_user=False).
- Passed the identified internal_tools to the RenderStyle constructor.
src/copaw/app/channels/renderer.py
- Added an internal_tools frozenset field to the RenderStyle class.
- Updated _parts_for_tool_output to filter out media parts from tools marked as internal.
- Updated _parts_for_tool_output to filter out data blocks from tools marked as internal.
src/copaw/config/config.py
- Added a display_to_user boolean field to BuiltinToolConfig.
- Refactored the default built-in tools into a dedicated _DEFAULT_BUILTIN_TOOLS dictionary.
- Included view_image in _DEFAULT_BUILTIN_TOOLS with display_to_user set to False by default.
- Implemented a model_validator in ToolsConfig to automatically merge new code-defined tools into existing user configurations.
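
The merge performed by the validator in the last entry can be sketched without Pydantic. In this illustration plain dataclasses stand in for the real models, and `DEFAULT_TOOLS` and `merge_defaults` are hypothetical names; the real code runs inside a model_validator on ToolsConfig.

```python
from dataclasses import dataclass, replace


@dataclass
class ToolConfig:
    """Hypothetical stand-in for BuiltinToolConfig."""
    enabled: bool = True
    display_to_user: bool = True


# Code-defined defaults; view_image is hidden from user channels by default.
DEFAULT_TOOLS = {
    "desktop_screenshot": ToolConfig(),
    "view_image": ToolConfig(display_to_user=False),
}


def merge_defaults(saved: dict[str, ToolConfig]) -> dict[str, ToolConfig]:
    """Add tools missing from a saved config without touching user settings."""
    merged = dict(saved)
    for name, default_tc in DEFAULT_TOOLS.items():
        if name not in merged:
            # Copy so later edits to the user's config cannot alter defaults.
            merged[name] = replace(default_tc)
    return merged
```

A config saved before view_image existed keeps its own entries as they are and gains a view_image entry with display_to_user set to False.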

Activity

No human activity (comments, reviews, etc.) has been recorded on this pull request yet.

¹ Review the Generative AI Prohibited Use Policy, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a view_image tool, allowing the LLM to analyze local image files. The implementation is well-structured, including a new display_to_user configuration to control tool output visibility and a model_validator to seamlessly update existing user configurations with the new tool. The changes are thoughtful and improve the multimodal capabilities of the agent.

I've found a potential issue related to mutable default objects in the configuration, which could lead to unexpected side effects. I've left two comments with suggestions to improve the robustness of the configuration handling by ensuring deep copies are used for default tool configurations. Other than that, the changes look great.

gemini-code-assist · 2026-03-15T12:24:15Z

src/copaw/config/config.py

-                description="Get llm token usage",
-            ),
-        },
+        default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),


The default_factory creates a shallow copy of _DEFAULT_BUILTIN_TOOLS. This means the BuiltinToolConfig objects are shared between the global default and new ToolsConfig instances. If one of these objects is modified in-place, it will mutate the global default, leading to unexpected side effects for other parts of the application that rely on these defaults. To prevent this, you should create a deep copy of the tool configurations using model_copy().

Suggested change

-        default_factory=lambda: dict(_DEFAULT_BUILTIN_TOOLS),
+        default_factory=lambda: {k: v.model_copy() for k, v in _DEFAULT_BUILTIN_TOOLS.items()},
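
The aliasing this comment describes is easy to reproduce outside Pydantic. In the sketch below a plain dataclass stands in for BuiltinToolConfig and `dataclasses.replace` stands in for `model_copy()`; all names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class ToolConfig:
    """Hypothetical stand-in for BuiltinToolConfig."""
    enabled: bool = True


DEFAULTS = {"view_image": ToolConfig()}

# Shallow copy: a new dict, but its values are the same objects.
shallow = dict(DEFAULTS)
shallow["view_image"].enabled = False
leaked = DEFAULTS["view_image"].enabled  # False: the global default changed

# Reset, then copy each value so the new dict owns its own objects.
DEFAULTS["view_image"].enabled = True
independent = {k: replace(v) for k, v in DEFAULTS.items()}
independent["view_image"].enabled = False
preserved = DEFAULTS["view_image"].enabled  # True: the default is untouched
```

Note that a per-value copy like this is enough only while the config objects hold simple fields; nested mutable fields would need a deeper copy.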

gemini-code-assist · 2026-03-15T12:24:15Z

src/copaw/config/config.py

+        """Ensure new code-defined tools are present in saved configs."""
+        for name, default_tc in _DEFAULT_BUILTIN_TOOLS.items():
+            if name not in self.builtin_tools:
+                self.builtin_tools[name] = default_tc


Similar to the issue with default_factory, assigning default_tc directly shares the BuiltinToolConfig object from the global _DEFAULT_BUILTIN_TOOLS. If this object is later modified (e.g., through a settings UI), it will alter the global default state. You should assign a copy of the object using model_copy() to ensure that each configuration is independent.

Suggested change

-                self.builtin_tools[name] = default_tc
+                self.builtin_tools[name] = default_tc.model_copy()

xieyxclack

LGTM

Leirunlin temporarily deployed to maintainer-approved March 15, 2026 12:21 — with GitHub Actions Inactive

gemini-code-assist bot reviewed Mar 15, 2026

Leirunlin temporarily deployed to maintainer-approved March 15, 2026 12:47 — with GitHub Actions Inactive

Leirunlin mentioned this pull request Mar 16, 2026

feat(tools): add read_media tool for image/video/audio processing #1228

Open

xieyxclack approved these changes Mar 16, 2026

Leirunlin added 2 commits March 16, 2026 16:21

feat(tools): add view_image tool for LLM visual analysis

986dd57

adopt gemini's advice

d2d5f0f

Leirunlin force-pushed the feat/view-image-tool branch from da11440 to d2d5f0f on March 16, 2026 08:23

Leirunlin temporarily deployed to maintainer-approved March 16, 2026 08:23 — with GitHub Actions Inactive

xieyxclack merged commit 6b16a80 into agentscope-ai:main Mar 16, 2026
24 checks passed

hh0592821 pushed a commit to hh0592821/CoPaw that referenced this pull request Mar 19, 2026

feat(tools): add view_image tool for LLM visual analysis (agentscope-…

a22d29d

…ai#1526)
