I think CactusAgent should support multimodal input (#128)

Use case:
Input an image with the prompt "Please analyze this image." and have the agent return a structured result through a tool call.
Tool:
```js
// Tool definition for the image-analysis call.
// `categoriesName` is an array of allowed category strings defined elsewhere
// (e.g. ["characters", "cartoons"]).
{
  "type": "function",
  "function": {
    "name": "analyze_image_output",
    "description": "analyze_image_output",
    "parameters": {
      "type": "object",
      "properties": {
        "desc": {
          "type": "string",
          "description": "A specific description of the image content, including the people, objects, scenes, colors, actions, and other details that you see in the image"
        },
        "category": {
          "type": "string",
          "description": "For example: characters, cartoons",
          "enum": categoriesName
        }
      },
      "required": ["desc", "category"]
    }
  }
}
```
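Once a multimodal model calls this tool, the app still has to check that the returned arguments match the schema above. Below is a minimal TypeScript sketch of that step, assuming the tool-call arguments arrive as a JSON string (as in OpenAI-style tool calling). It does not use any CactusAgent API, and the names `AnalyzeImageOutput` and `parseAnalyzeImageOutput` are hypothetical.

```typescript
// Hypothetical shape of the arguments the model returns when it calls
// analyze_image_output (mirrors the "parameters" schema of the tool).
interface AnalyzeImageOutput {
  desc: string;
  category: string;
}

// Parse and validate the raw JSON arguments string from a tool call.
// Throws if a required field is missing or has the wrong type, or if
// `category` is not one of the allowed enum values.
function parseAnalyzeImageOutput(
  rawArgs: string,
  categoriesName: readonly string[],
): AnalyzeImageOutput {
  const parsed: unknown = JSON.parse(rawArgs);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("tool arguments must be a JSON object");
  }
  const { desc, category } = parsed as Record<string, unknown>;
  if (typeof desc !== "string") {
    throw new Error("missing or invalid 'desc'");
  }
  if (typeof category !== "string" || !categoriesName.includes(category)) {
    throw new Error(`'category' must be one of: ${categoriesName.join(", ")}`);
  }
  return { desc, category };
}

// Example usage with made-up categories and a made-up model response.
const categories = ["characters", "cartoons"] as const;
const result = parseAnalyzeImageOutput(
  '{"desc":"A cartoon cat waving","category":"cartoons"}',
  categories,
);
console.log(result.category); // prints "cartoons"
```

Validating against the same `categoriesName` array used for the `enum` keeps the tool schema and the app-side check from drifting apart.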
