Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dialogflow-cx): Example implementation of Streaming Detect Intent with continuous microphone input and audio output #13053

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

ffeldhaus
Copy link

Description

Fixes #13041

Checklist

README.rst is not updated as it seems to be auto generated.

@ffeldhaus ffeldhaus requested review from a team as code owners January 14, 2025 11:21
Copy link

@code-review-assist code-review-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi team,

I'm currently reviewing this pull request and will provide my feedback in a few minutes. In the meantime, here's a summary of the changes for you and other reviewers to quickly get up to speed:

This pull request implements a real-time bidirectional streaming audio interface with Google Cloud Dialogflow CX. The primary goal is to create an example that captures audio from a user's microphone, streams it to Dialogflow CX for intent detection, and plays back the synthesized audio responses.

Here's a breakdown of the changes:

  • Intent: Provide a complete example implementation of Streaming Detect Intent with continuous microphone input and audio output. This addresses issue Streaming Detect Intent with continuous microphone input #13041.
  • Changes:
    • Added a new Python file: dialogflow-cx/streaming_detect_intent_infinite.py. This file contains the core logic for the bidirectional audio streaming with Dialogflow CX. It's a substantial addition (lines 1-727).
    • Updated dialogflow-cx/requirements.txt (lines 1-7) to include necessary dependencies: termcolor==2.5.0 and pyaudio==0.2.14. The google-cloud-dialogflow-cx library was also updated to version 1.38.0.
  • Activity: The author, ffeldhaus, has indicated that tests and lint checks have passed.

The example is well-documented and includes instructions on setting up the environment and running the script. It also handles potential errors and provides informative logging.

I'll be checking for things like error handling, code style, and overall clarity of the example. I'll post my detailed review shortly.


And now, a little haiku to lighten the mood:

Code flows like a stream,
Dialog's voice, a clear response,
Smooth talk, all in sync.

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Jan 14, 2025
Copy link

@code-review-assist code-review-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request introduces a continuous streaming example for Dialogflow CX, addressing issue #13041. The implementation is comprehensive and well-structured, demonstrating bidirectional streaming, audio input/output, and graceful exit handling. The code is generally clean and follows best practices. However, there are a few minor suggestions for improvement.

dialogflow-cx/streaming_detect_intent_infinite.py Outdated Show resolved Hide resolved
dialogflow-cx/streaming_detect_intent_infinite.py Outdated Show resolved Hide resolved
dialogflow-cx/streaming_detect_intent_infinite.py Outdated Show resolved Hide resolved
ffeldhaus and others added 3 commits January 14, 2025 12:29
Update example invocation

Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
Add validation of agent name

Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
@ffeldhaus ffeldhaus changed the title Example implementation of Streaming Detect Intent with continuous microphone input and audio output feat(dialogflow-cx): Example implementation of Streaming Detect Intent with continuous microphone input and audio output Jan 14, 2025
@glasnt
Copy link
Contributor

glasnt commented Jan 16, 2025

Internal b/389902797

@glasnt glasnt added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jan 16, 2025
Copy link

snippet-bot bot commented Jan 17, 2025

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

Copy link
Contributor

@davidcavazos davidcavazos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also needs tests. Since it's an "infinite" streaming CLI app, there's no need to test it end-to-end, but unit tests around the most important functions would be nice.

It's a little tricky since most functions are methods depending on the internal state of an object, so each test might need to create its own object.

I'm okay with more minimal testing around AudioIO since that's what streams from the microphone and can get tricky. However the Dialogflow code should be tested, it seems it could simply use a bytes generator with some fixed data, or maybe it could be read from a file?

import struct
import sys
import time
from typing import Any, AsyncGenerator, List, Optional, Tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per PEP-585 and PEP-604, the recommended type hints are:

List[A] -> list[A] (no need to import)
Optional[A] -> A | None (no need to import)
Tuple[A, B, C] -> tuple[A, B, C] (no need to import)
typing.AsyncGenerator -> collections.abc.AsyncGenerator

These are supported on Python 3.9 and 3.10 respectively. 3.9 is already the oldest supported version. If needed, you can add a from __future__ import annotations.

audio_encoding: str = "LINEAR16",
output_audio_encoding: str = "LINEAR16",
) -> None:
self._rate: int = rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of a code sample is to provide a simple example on how to use the APIs or show a common pattern, not to provide a fully customizable tool. The amount of options here makes the sample look a lot more intimidating than what it actually is.

Can we inline as many of these options into where they're used? That makes the code sample easier to read, since you can directly see all the values rather than having to scroll up to see which configurations are important and which can be left to None or some other empty value.

Let's just leave here the variables needed to keep state on the object. Any configuration should be inlined.

dialogflow_timeout,
debug,
sample_rate,
"LINEAR16",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these LINEAR16 are hardcoded and not configurable, can we remove them from DialogCXStreaming altogether and inline them directly where they're used?

language_code: str,
model: str,
dialogflow_timeout: float,
debug: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can we only keep the required arguments? Please inline any "constant" configuration to reduce complexity.

self.client: dialogflowcx_v3.SessionsAsyncClient = (
dialogflowcx_v3.SessionsAsyncClient(client_options=client_options)
)
self.agent_name: str = agent_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, please only keep whatever is necessary to keep state. Configurations should be inlined.

@glasnt glasnt assigned davidcavazos and unassigned glasnt Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge Indicates a pull request not ready for merge, due to either quality or timing. samples Issues that are directly related to samples.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streaming Detect Intent with continuous microphone input
3 participants