Conversation

Mustafa-Esoofally
Contributor

@Mustafa-Esoofally Mustafa-Esoofally commented Aug 22, 2025

Summary

Adds support for Gemini TTS.

TODO: OpenAI update

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Improvement
  • Model update
  • Other:

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
  • Tested in clean environment
  • Tests added/updated (if applicable)

@Mustafa-Esoofally Mustafa-Esoofally requested a review from a team as a code owner August 22, 2025 16:12
@@ -211,22 +211,41 @@ def from_artifact(cls, artifact: AudioArtifact) -> "Audio":

 class AudioResponse(BaseModel):
     id: Optional[str] = None
-    content: Optional[str] = None  # Base64 encoded
+    content: Optional[str] = None  # Base64 encoded (legacy)
Contributor

Why is this legacy? In other cases we still get base64 data.

@@ -20,3 +21,27 @@ def write_audio_to_file(audio, filename: str):
     with open(filename, "wb") as f:
         f.write(wav_bytes)
     log_info(f"Audio file saved to {filename}")
+
+
+def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
Contributor

Suggested change
-def save_wave_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):
+def write_wav_audio_to_file(filename: str, pcm_data: bytes, channels: int = 1, rate: int = 24000, sample_width: int = 2):

Contributor Author

resolved
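
For reference, a helper with this signature can be sketched with Python's standard `wave` module. This is a minimal sketch, not the code from the PR: only the signature appears in the diff, so the body below is an assumption. The defaults (mono, 24 kHz, 2 bytes per sample) are taken from the signature above.

```python
import wave


def write_wav_audio_to_file(
    filename: str,
    pcm_data: bytes,
    channels: int = 1,
    rate: int = 24000,
    sample_width: int = 2,
) -> None:
    """Wrap raw PCM bytes in a WAV container and write them to `filename`.

    Assumed body: the PR diff shows only the signature of this helper.
    """
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)      # 1 == mono
        wf.setsampwidth(sample_width)  # bytes per sample; 2 == 16-bit PCM
        wf.setframerate(rate)          # samples per second
        wf.writeframes(pcm_data)       # header sizes are filled in on close
```

Raw PCM has no header, so the channel count, sample width, and rate must be supplied by the caller; a mismatch produces a file that plays at the wrong speed or as noise.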

# Store raw binary data
model_response.audio = AudioResponse(
    id=str(uuid4()),
    raw_content=part.inline_data.data,  # Raw binary data
Contributor

Let's just reuse the content param?
I think we should generally decode any audio we receive as base64. Here it is already raw (nice, store it as is), and we can later update the OpenAI path to decode the audio before storing it in AudioResponse, so it is always "raw" when coming from Agno.

But let's do the OpenAI part later. Regardless, let's just use content for both.

Contributor Author

Makes sense. TODO: OpenAI change.
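
The convention agreed in this thread (a single content field that always holds raw audio bytes) could look roughly like the sketch below. The helper name `to_raw_audio` is hypothetical and is not part of Agno. It only illustrates decoding base64 input once, at the provider boundary, while passing already-raw bytes through unchanged.

```python
import base64
from typing import Union


def to_raw_audio(data: Union[str, bytes]) -> bytes:
    """Return raw audio bytes (hypothetical helper, not part of Agno).

    Assumptions: `str` input is base64 text, the form the thread says other
    providers return; `bytes` input is already raw, as in the Gemini diff above.
    """
    if isinstance(data, bytes):
        return data  # already raw, store as is
    return base64.b64decode(data)  # decode once so consumers always see bytes
```

With a helper like this, callers never need to know which provider produced the audio, which is the point of reusing one field for both cases.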

    expires_at: Optional[int] = None
    transcript: Optional[str] = None

    mime_type: Optional[str] = None
    sample_rate: Optional[int] = 24000
    channels: Optional[int] = 1

    @property
    def binary_data(self) -> bytes:
Contributor

This seems unnecessary?

2 participants