feat: Gemini TTS #4307
base: main
libs/agno/agno/media.py
Outdated
@@ -211,22 +211,41 @@ def from_artifact(cls, artifact: AudioArtifact) -> "Audio":

 class AudioResponse(BaseModel):
     id: Optional[str] = None
-    content: Optional[str] = None  # Base64 encoded
+    content: Optional[str] = None  # Base64 encoded (legacy)
Why is this legacy? In other cases we get base64 data
libs/agno/agno/utils/audio.py
Outdated
@@ -20,3 +21,27 @@ def write_audio_to_file(audio, filename: str):
     with open(filename, "wb") as f:
         f.write(wav_bytes)
     log_info(f"Audio file saved to {filename}")
+
+
+def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
-def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
+def write_wav_audio_to_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
resolved
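For reference, a minimal sketch of what a function with the suggested signature could look like. Only the signature comes from the suggestion above; the body is an assumption built on the standard-library `wave` module and is not taken from the PR.

```python
import wave


def write_wav_audio_to_file(
    filename: str,
    pcm_data: bytes,
    channels: int = 1,
    rate: int = 24000,
    sample_width: int = 2,
) -> None:
    """Wrap raw PCM bytes in a WAV container and write them to `filename`.

    Assumed body (not from the PR): the defaults describe mono,
    24 kHz, 16-bit audio (sample_width is in bytes).
    """
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)      # 1 = mono
        wf.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wf.setframerate(rate)          # samples per second
        wf.writeframes(pcm_data)       # header sizes are fixed up on close
```

The PCM bytes are written unchanged; the WAV header is what lets a player interpret the sample rate, channel count, and sample width.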
# Store raw binary data
model_response.audio = AudioResponse(
    id=str(uuid4()),
    raw_content=part.inline_data.data,  # Raw binary data
Let's just re-use the `content` param?
I think generally we should decode audio that we get in base64. Here it is raw (nice, store it as is), and then we can update OpenAI so that we decode the audio before storing it in `AudioResponse`, so it is always "raw" when coming from Agno.
But let's do the OpenAI part later. Regardless, let's just use `content` for both.
makes sense. TODO: OpenAI change
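A rough sketch of the "always raw" idea discussed in this thread. The helper name `to_raw_audio_bytes` is hypothetical and is not part of Agno; it only assumes that providers hand back either raw bytes or a base64-encoded string.

```python
import base64
from typing import Union


def to_raw_audio_bytes(data: Union[str, bytes]) -> bytes:
    """Hypothetical helper: normalize provider audio to raw bytes.

    Assumption: bytes are already raw (the Gemini case described above),
    while a str is base64-encoded audio that still needs decoding.
    """
    if isinstance(data, bytes):
        return data  # already raw, store as is
    return base64.b64decode(data)  # base64 text -> raw bytes
```

With something like this applied before the audio is stored, a single `content` field could hold raw bytes regardless of provider. That would also mean widening the field's type from `Optional[str]`, which is outside what this thread settles.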
    expires_at: Optional[int] = None
    transcript: Optional[str] = None

    mime_type: Optional[str] = None
    sample_rate: Optional[int] = 24000
    channels: Optional[int] = 1

    @property
    def binary_data(self) -> bytes:
This seems unnecessary?
Summary
Adds support for Gemini TTS.
TODO: OpenAI update