Add benchmarking audio model #39

To add the new audio modality, complete the following steps.
- [ ] Construct a test dataset for audio question answering. Data that can be used is listed [here](https://docs.google.com/document/d/1mrTAt4DjwlglDWzMxWfEsdjn9OJTbTo6tA0PrrbyYDM/edit?usp=sharing).
- [ ] Modify the API calling interface for Speech-LLMs. The suggested API data format is as follows:
```
messages = [
    {
        "role": "user",
        "content": [
            # Text part: the question asked about the audio clip
            {"type": "text", "text": "What's in this audio?"},
            # Audio part: the clip is referenced by URL
            {
                "type": "audio_url",
                "audio_url": {
                    "url": "https://st.com/audio.wav",
                },
            },
        ],
    }
]
```
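or
```
messages = [
    {
        "role": "user",
        "content": [
            # Text part: the question asked about the audio clip
            {"type": "text", "text": "What's in this audio?"},
            # Audio part: the clip is sent inline as a Base64 string
            {
                "type": "audio_b64_json",
                "audio_b64_json": {
                    "b64_json": "Base64EncodedAudio",
                },
            },
        ],
    }
]
```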
 
- [ ] Review the inference server code for the speech models and edit it if needed.
- [ ] Run 1-2 experiments with the created datasets and models.
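
As a starting point for the API interface step, here is a minimal sketch of a helper that builds the `messages` payload in either suggested form. The function name `build_audio_message` and its parameters are hypothetical, not part of the existing codebase. It assumes the part types are `audio_url` and `audio_b64_json` and that the inference server accepts standard Base64 text for inline audio.

```python
import base64


def build_audio_message(question, audio_url=None, audio_bytes=None):
    """Build a single-turn `messages` list for a Speech-LLM request.

    Hypothetical helper: exactly one of `audio_url` or `audio_bytes`
    must be given. Key names follow the format suggested in this issue.
    """
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("Provide exactly one of audio_url or audio_bytes")

    if audio_url is not None:
        # Variant 1: reference the audio clip by URL
        audio_part = {"type": "audio_url", "audio_url": {"url": audio_url}}
    else:
        # Variant 2: embed the raw audio bytes as a Base64 string
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        audio_part = {
            "type": "audio_b64_json",
            "audio_b64_json": {"b64_json": encoded},
        }

    return [
        {
            "role": "user",
            "content": [{"type": "text", "text": question}, audio_part],
        }
    ]


# Example usage with the placeholder URL from this issue
messages = build_audio_message(
    "What's in this audio?", audio_url="https://st.com/audio.wav"
)
```

Keeping both variants behind one helper means each test-dataset entry can store either a URL or a local file, and the benchmark runner does not need to know which one it is.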
