-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dialogflow-cx): Example implementation of Streaming Detect Intent with continuous microphone input and audio output #13053
base: main
Are you sure you want to change the base?
Conversation
…rophone input and speaker output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi team,
I'm currently reviewing this pull request and will provide my feedback in a few minutes. In the meantime, here's a summary of the changes for you and other reviewers to quickly get up to speed:
This pull request implements a real-time bidirectional streaming audio interface with Google Cloud Dialogflow CX. The primary goal is to create an example that captures audio from a user's microphone, streams it to Dialogflow CX for intent detection, and plays back the synthesized audio responses.
Here's a breakdown of the changes:
- Intent: Provide a complete example implementation of Streaming Detect Intent with continuous microphone input and audio output. This addresses issue Streaming Detect Intent with continuous microphone input #13041.
- Changes:
- Added a new Python file:
dialogflow-cx/streaming_detect_intent_infinite.py
. This file contains the core logic for the bidirectional audio streaming with Dialogflow CX. It's a substantial addition (lines 1-727). - Updated
dialogflow-cx/requirements.txt
(lines 1-7) to include necessary dependencies:termcolor==2.5.0
andpyaudio==0.2.14
. Thegoogle-cloud-dialogflow-cx
library was also updated to version 1.38.0.
- Added a new Python file:
- Activity: The author, ffeldhaus, has indicated that tests and lint checks have passed.
The example is well-documented and includes instructions on setting up the environment and running the script. It also handles potential errors and provides informative logging.
I'll be checking for things like error handling, code style, and overall clarity of the example. I'll post my detailed review shortly.
And now, a little haiku to lighten the mood:
Code flows like a stream,
Dialog's voice, a clear response,
Smooth talk, all in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request introduces a continuous streaming example for Dialogflow CX, addressing issue #13041. The implementation is comprehensive and well-structured, demonstrating bidirectional streaming, audio input/output, and graceful exit handling. The code is generally clean and follows best practices. However, there are a few minor suggestions for improvement.
Update example invocation Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
Add validation of agent name Co-authored-by: code-review-assist[bot] <182814678+code-review-assist[bot]@users.noreply.github.com>
Internal b/389902797 |
Here is the summary of changes. You are about to add 1 region tag.
This comment is generated by snippet-bot.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR also needs tests. Since it's an "infinite" streaming CLI app, there's no need to test it end-to-end, but unit tests around the most important functions would be nice.
It's a little tricky since most functions are methods depending on the internal state of an object, so each test might need to create its own object.
I'm okay with more minimal testing around AudioIO
since that's what streams from the microphone and can get tricky. However the Dialogflow code should be tested, it seems it could simply use a bytes generator with some fixed data, or maybe it could be read from a file?
import struct | ||
import sys | ||
import time | ||
from typing import Any, AsyncGenerator, List, Optional, Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per PEP-585 and PEP-604, the recommended type hints are:
List[A]
-> list[A]
(no need to import)
Optional[A]
-> A | None
(no need to import)
Tuple[A, B, C] -> tuple[A, B, C]
(no need to import)
typing.AsyncGenerator -> collections.abc.AsyncGenerator
These are supported on Python 3.9 and 3.10 respectively. 3.9 is already the oldest supported version. If needed, you can add a from __future__ import annotations
.
audio_encoding: str = "LINEAR16", | ||
output_audio_encoding: str = "LINEAR16", | ||
) -> None: | ||
self._rate: int = rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of a code sample is to provide a simple example on how to use the APIs or show a common pattern, not to provide a fully customizable tool. The amount of options here makes the sample look a lot more intimidating than what it actually is.
Can we inline as many of these options into where they're used? That makes the code sample easier to read, since you can directly see all the values rather than having to scroll up to see which configurations are important and which can be left to None or some other empty value.
Let's just leave here the variables needed to keep state on the object. Any configuration should be inlined.
dialogflow_timeout, | ||
debug, | ||
sample_rate, | ||
"LINEAR16", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these LINEAR16 are hardcoded and not configurable, can we remove them from DialogCXStreaming
altogether and inline them directly where they're used?
language_code: str, | ||
model: str, | ||
dialogflow_timeout: float, | ||
debug: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, can we only keep the required arguments? Please inline any "constant" configuration to reduce complexity.
self.client: dialogflowcx_v3.SessionsAsyncClient = ( | ||
dialogflowcx_v3.SessionsAsyncClient(client_options=client_options) | ||
) | ||
self.agent_name: str = agent_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please only keep whatever is necessary to keep state. Configurations should be inlined.
Description
Fixes #13041
Checklist
nox -s py-3.9
(see Test Environment Setup)nox -s lint
(see Test Environment Setup)README.rst is not updated as it seems to be auto generated.