Open
Description
To add the new audio modality, we take the following steps.
- Construct a test dataset for Audio Question-Answering. Data that can be used is available here
- Modify the API calling interface for the Speech-LLM. The suggested API data format is as follows:
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this audio?"},
            {
                "type": "audio_url",
                "audio_url": {
                    "url": "https://st.com/audio.wav",
                }
            },
        ],
    }
],
or
messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this audio?"},
            {
                "type": "audio_b64_json",
                "audio_b64_json": {
                    "b64_json": "Base64EncodedAudio",
                }
            },
        ],
    }
],
- Review the inference server code for the speech models and edit it if needed
- Run 1-2 experiments with the created datasets and models
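As a starting point for the API change, here is a minimal Python sketch of how a client could build either payload and how the inference server could read the base64 variant back. It assumes the key names `audio_url`, `audio_b64_json` and `b64_json` from the suggested format (written without trailing periods). The helper names `build_audio_message` and `extract_audio_bytes` are hypothetical and do not exist in the current code base.

```python
import base64
from typing import Optional


def build_audio_message(question: str, *, url: Optional[str] = None,
                        audio_bytes: Optional[bytes] = None) -> list:
    """Build a `messages` list holding one text part and one audio part.

    Hypothetical helper: exactly one of `url` or `audio_bytes` must be given.
    """
    if (url is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of url or audio_bytes")
    if url is not None:
        # Variant 1: the server fetches the audio from a URL.
        audio_part = {"type": "audio_url", "audio_url": {"url": url}}
    else:
        # Variant 2: the audio is sent inline as base64 text.
        encoded = base64.b64encode(audio_bytes).decode("ascii")
        audio_part = {
            "type": "audio_b64_json",
            "audio_b64_json": {"b64_json": encoded},
        }
    return [
        {
            "role": "user",
            "content": [{"type": "text", "text": question}, audio_part],
        }
    ]


def extract_audio_bytes(messages: list) -> Optional[bytes]:
    """Server side: decode the first inline base64 audio part, if any."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Plain-text messages carry no audio parts.
            continue
        for part in content or []:
            if part.get("type") == "audio_b64_json":
                return base64.b64decode(part["audio_b64_json"]["b64_json"])
    return None
```

The URL variant keeps requests small but requires the server to reach the file. The base64 variant is self-contained, at the cost of a request body roughly one third larger than the raw audio.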