Motivation
Currently, only the multi-modal agent has the toolkit to read multi-modal information. Since many LLMs now support vision, it would be better for worker agents to natively support receiving image paths from decomposed subtasks.
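To make the request more concrete, here is a minimal sketch of one way image paths could be pulled out of a decomposed subtask and carried alongside it, so a vision-capable worker could load them. Everything here is an assumption for illustration: `SubTask`, `attach_image_paths`, and `IMAGE_PATH_PATTERN` are hypothetical names, not part of the existing codebase, and the regex-based detection is just one possible approach.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical pattern: matches tokens ending in a common image extension.
IMAGE_PATH_PATTERN = re.compile(
    r"[^\s'\"()<>]+\.(?:png|jpe?g|gif|webp|bmp)\b", re.IGNORECASE
)


@dataclass
class SubTask:
    """Hypothetical container for a decomposed subtask."""
    content: str
    image_paths: List[str] = field(default_factory=list)


def attach_image_paths(content: str) -> SubTask:
    """Build a SubTask whose image_paths lists the image files named in content."""
    matches = IMAGE_PATH_PATTERN.findall(content)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique_paths = list(dict.fromkeys(matches))
    return SubTask(content=content, image_paths=unique_paths)


if __name__ == "__main__":
    task = attach_image_paths(
        "Describe the chart in ./data/chart.png and compare it with /tmp/photo.JPG."
    )
    print(task.image_paths)
```

A worker agent could then check `image_paths` and, when it is non-empty and the underlying model supports vision, pass the images to the model together with the subtask text.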
Solution
No response
Alternatives
No response
Additional context
No response