feat(tools): add read_media tool for image/video/audio processing #1228

Open
YvanJiang wants to merge 1 commit into agentscope-ai:main from YvanJiang:feature/media-reading-tool

@YvanJiang

Summary

Add a new `read_media` tool for reading and processing image, video, and audio files.

Features

  • Support local paths, file:// URLs, and http(s):// URLs
  • Automatic compression for images (Pillow) and videos (FFmpeg)
  • File format validation using magic numbers
  • Returns appropriate Block types (ImageBlock, VideoBlock, AudioBlock)
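
As an illustration of the magic-number validation mentioned above, here is a minimal sketch for the listed image formats. It is not taken from the PR's code: the helper name `sniff_image_format` and the table `_IMAGE_SIGNATURES` are hypothetical, and only the byte signatures themselves are standard.

```python
from typing import Optional

# Leading-byte signatures for four of the image formats listed in this PR.
# The table name and layout are illustrative, not the PR's implementation.
_IMAGE_SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return a format name based on the first bytes, or None if unknown."""
    # WEBP is a RIFF container: "RIFF", a 4-byte size field, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    for name, prefixes in _IMAGE_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if header.startswith(prefixes):
            return name
    return None
```

Checking the leading bytes rather than the file extension means a renamed file (for example a text file saved as `image.png`) is rejected before any decoding is attempted. Reading the first 12 bytes is enough for the formats in this sketch.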

Supported Formats

  • Images: PNG, JPG, GIF, WEBP, BMP
  • Videos: MP4, AVI, MOV, MKV, WEBM, FLV, WMV
  • Audio: MP3, WAV, AAC, OGG, M4A, FLAC, WMA

Usage

```python
import asyncio
from copaw.agents.tools import read_media
result = asyncio.run(read_media("/path/to/image.png"))  # read_media is async
```

Related

Split from PR #1063

🤖 Generated with Claude Code

Add a new async tool that can read and process media files from:
- Local file paths
- file:// URLs
- http(s):// URLs

Features:
- Image support (PNG, JPG, GIF, WEBP, BMP) with compression
- Video support (MP4, AVI, MOV, etc.) with frame extraction
- Audio support (MP3, WAV, AAC, etc.)
- File format validation via magic numbers
- Maximum file size: 20MB before compression
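
The three source types and the size cap listed above could be handled roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `classify_source`, `within_size_limit`, and `MAX_BYTES` are hypothetical names, and "20MB" is read here as 20 * 1024 * 1024 bytes.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse

# Assumption: the 20MB limit is interpreted as 20 MiB.
MAX_BYTES = 20 * 1024 * 1024


def classify_source(source: str) -> Tuple[str, str]:
    """Return ("remote", url) or ("local", filesystem_path) for a source."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        return "remote", source
    if parsed.scheme == "file":
        # Decode percent-escapes such as %20. Windows drive-letter URLs
        # would need extra handling that this sketch leaves out.
        return "local", unquote(parsed.path)
    # Anything else is treated as a plain local path.
    return "local", source


def within_size_limit(size_bytes: int) -> bool:
    """Check a file size against the pre-compression cap."""
    return 0 <= size_bytes <= MAX_BYTES
```

For example, `classify_source("file:///tmp/my%20image.png")` yields a local path with the space decoded, while an `https://` URL is passed through unchanged for downloading.
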
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Leirunlin
Collaborator

Hi @YvanJiang,

This is a useful feature, but base64 encoding consumes too much context. Additionally, this PR introduces dependencies on Pillow and FFmpeg, adding extra complexity for users.
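
For a sense of scale on the base64 point: standard base64 emits 4 output characters for every 3 input bytes, so the encoded payload is about one third larger than the file. The snippet below is only an illustrative calculation; `base64_length` is a hypothetical helper, and the 20 MiB figure assumes the PR's "20MB" limit means 20 * 1024 * 1024 bytes.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of standard padded base64 output for num_bytes of input."""
    return 4 * ((num_bytes + 2) // 3)


# Cross-check the formula against the standard library on a small sample.
sample = b"\x00" * 1000
assert len(base64.b64encode(sample)) == base64_length(1000)

# A file at the assumed 20 MiB cap becomes roughly 28 million characters.
encoded_chars_at_cap = base64_length(20 * 1024 * 1024)
```

How many tokens that costs depends on the model's tokenizer, so the character count is only a lower-level proxy for context usage.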

I’ve submitted an update in #1526 with a more lightweight approach to image reading for multi-modal models. Feel free to share any suggestions there. For audio and video support, new PRs are welcome.

BTW, more discussion is welcome in #1230 if you'd like to follow up there.

Labels

first-time-contributor PR created by a first time contributor
