Add benchmarking audio model #39

@sangttruong

Description

To add the new audio modality, we take the following steps.

  • Construct a test dataset for audio question answering. Data that can be used is available here
  • Modify the API calling interface for Speech-LLMs. The suggested API data format is as follows:
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this audio?"},
            {
                "type": "audio_url",
                "audio_url": {
                    "url": "https://st.com/audio.wav",
                },
            },
        ],
    }
],

or

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this audio?"},
            {
                "type": "audio_b64_json",
                "audio_b64_json": {
                    "b64_json": "Base64EncodedAudio",
                },
            },
        ],
    }
],
  • Review the inference server code for the speech models and edit it if needed
  • Run 1-2 experiments with the created datasets and models
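
As a rough illustration of the suggested data format, the sketch below builds the `messages` payload for both variants (audio by URL and audio as base64). The helper name `build_audio_message` and its parameters are hypothetical and not part of any existing codebase. The `audio_url` and `audio_b64_json` type names follow the suggested format above; the exact key spelling a given inference server accepts is an assumption and should be checked against that server.

```python
import base64
import json


def build_audio_message(question, audio_url=None, audio_bytes=None):
    """Build a single-turn `messages` list in the suggested format.

    Hypothetical helper: exactly one of `audio_url` or `audio_bytes`
    must be provided.
    """
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of audio_url or audio_bytes")

    if audio_url is not None:
        # Variant 1: the server fetches the audio from a URL.
        audio_part = {"type": "audio_url", "audio_url": {"url": audio_url}}
    else:
        # Variant 2: the raw audio bytes are sent inline as base64 text.
        # The key names here are assumed from the suggested format.
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        audio_part = {
            "type": "audio_b64_json",
            "audio_b64_json": {"b64_json": encoded},
        }

    return [
        {
            "role": "user",
            "content": [{"type": "text", "text": question}, audio_part],
        }
    ]


if __name__ == "__main__":
    by_url = build_audio_message(
        "What's in this audio?", audio_url="https://st.com/audio.wav"
    )
    # Placeholder bytes only; a real call would read an actual audio file.
    by_b64 = build_audio_message(
        "What's in this audio?", audio_bytes=b"placeholder-audio-bytes"
    )
    print(json.dumps(by_url, indent=2))
    print(json.dumps(by_b64, indent=2))
```

Keeping both variants behind one helper means the test dataset can store either URLs or local files without changing the calling code.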
