feat: Gemini TTS #4307

Mustafa-Esoofally · 2025-08-22T16:12:01Z

Summary

Adds support for Gemini TTS.

TODO: OpenAI update

Type of change

Checklist

Code complies with style guidelines
Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
Self-review completed
Documentation updated (comments, docstrings)
Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
Tested in clean environment
Tests added/updated (if applicable)

This reverts commit eb57d02.

This reverts commit 9d55352.

dirkbrnd · 2025-08-23T12:51:58Z

libs/agno/agno/media.py

@@ -211,22 +211,41 @@ def from_artifact(cls, artifact: AudioArtifact) -> "Audio":

 class AudioResponse(BaseModel):
    id: Optional[str] = None
-    content: Optional[str] = None  # Base64 encoded
+    content: Optional[str] = None  # Base64 encoded (legacy)


Why is this legacy? In other cases we still get base64 data.

dirkbrnd · 2025-08-23T12:53:31Z

libs/agno/agno/utils/audio.py

@@ -20,3 +21,27 @@ def write_audio_to_file(audio, filename: str):
    with open(filename, "wb") as f:
        f.write(wav_bytes)
    log_info(f"Audio file saved to {filename}")
+
+
+def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):


Suggested change

-def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
+def write_wav_audio_to_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
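For context, here is a minimal sketch of what a helper with this signature might look like. The function body is not shown in this thread, so everything below the `def` line is an assumption; it only uses Python's standard `wave` module to wrap raw PCM bytes in a WAV container.

```python
import wave


def write_wav_audio_to_file(
    filename: str,
    pcm_data: bytes,
    channels: int = 1,
    rate: int = 24000,
    sample_width: int = 2,
) -> None:
    # Assumed body: the PR diff in this thread only shows the signature.
    # wave.open in "wb" mode writes the WAV header when the file is closed.
    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(channels)      # 1 = mono
        wav_file.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit PCM
        wav_file.setframerate(rate)          # samples per second
        wav_file.writeframes(pcm_data)       # raw PCM payload
```

The defaults (mono, 24000 Hz, 2-byte samples) are taken from the signature above; whether they match a given model's output should be checked against that model's documentation.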

dirkbrnd · 2025-08-23T12:57:25Z

libs/agno/agno/models/google/gemini.py

+                        # Store raw binary data 
+                        model_response.audio = AudioResponse(
+                            id=str(uuid4()),
+                            raw_content=part.inline_data.data,  # Raw binary data


Let's just reuse the content param?
I think we should generally decode audio that we receive as base64. Here it is already raw (nice, store it as is), and we can then update OpenAI so that the audio is decoded before it is stored in AudioResponse. That way it is always "raw" when coming from Agno.

But let's do the OpenAI part later. Regardless, let's just use content for both.

makes sense. TODO: OpenAI change
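To illustrate the "always raw" idea discussed above, here is a rough sketch of a normalization step. `to_raw_audio_bytes` is a hypothetical helper, not part of Agno, and it assumes that string content is base64 while bytes content is already raw.

```python
import base64
from typing import Union


def to_raw_audio_bytes(content: Union[str, bytes]) -> bytes:
    """Hypothetical helper: return raw audio bytes for either input form."""
    if isinstance(content, bytes):
        # Already raw (the Gemini case described above), so store as is.
        return content
    # Assumed to be a base64 string (the OpenAI case described above).
    return base64.b64decode(content)
```

With a step like this applied before constructing AudioResponse, a single content field could carry raw bytes regardless of provider, which is the direction the comment suggests.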

Co-authored-by: Dirk Brand <[email protected]>

dirkbrnd · 2025-09-03T15:58:26Z

libs/agno/agno/media.py

    expires_at: Optional[int] = None
    transcript: Optional[str] = None

    mime_type: Optional[str] = None
    sample_rate: Optional[int] = 24000
    channels: Optional[int] = 1

+    @property
+    def binary_data(self) -> bytes:


This seems unnecessary?

Gemini TTS

eb57d02

Mustafa-Esoofally requested a review from a team as a code owner August 22, 2025 16:12

Mustafa-Esoofally added 3 commits August 22, 2025 21:49

Revert "Gemini TTS"

9d55352

This reverts commit eb57d02.

audio cookbook

236d2a9

Revert "Revert "Gemini TTS""

d41c444

This reverts commit 9d55352.

dirkbrnd reviewed Aug 23, 2025

dirkbrnd reviewed Aug 23, 2025

Mustafa-Esoofally and others added 5 commits September 2, 2025 12:32

Merge branch 'main' into feat/gemini-tts

da4a83b

Update libs/agno/agno/utils/audio.py

e2d0134

Co-authored-by: Dirk Brand <[email protected]>

update

8ba3711

rename to write_wav_audio_to_file

ee8797e

update comment

6563ee4

dirkbrnd reviewed Sep 3, 2025

Merge branch 'main' into feat/gemini-tts

64dbdd3
