feat(speechmatics): add max_speakers parameter for speaker diarization #3524
base: main
a3a7974 to c395ae5
    if not is_given(max_speakers) and hasattr(config, "speaker_diarization_config"):
        if (
            config.speaker_diarization_config
            and "max_speakers" in config.speaker_diarization_config
        ):
            max_speakers = config.speaker_diarization_config["max_speakers"]
Is `speaker_diarization_config` a dataclass? If it requires a specific version of `speechmatics`, we can specify it in pyproject.toml.
    - if not is_given(max_speakers) and hasattr(config, "speaker_diarization_config"):
    -     if (
    -         config.speaker_diarization_config
    -         and "max_speakers" in config.speaker_diarization_config
    -     ):
    -         max_speakers = config.speaker_diarization_config["max_speakers"]
    + if (
    +     not is_given(max_speakers)
    +     and (dz_cfg := config.speaker_diarization_config)
    +     and dz_cfg.max_speakers is not None
    + ):
    +     max_speakers = dz_cfg.max_speakers
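The attribute-based check in the suggestion can be exercised in isolation. The following is a minimal sketch, not the plugin's actual code: `NOT_GIVEN`, `is_given`, `SpeakerDiarizationConfig`, `TranscriptionConfig`, and `resolve_max_speakers` are hypothetical stand-ins defined here so the snippet runs on its own, and it assumes the diarization config is a dataclass-like object rather than a dict.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


class _NotGiven:
    """Hypothetical sentinel type meaning "the caller omitted this argument"."""


NOT_GIVEN = _NotGiven()


def is_given(value: object) -> bool:
    # Stand-in for the library helper of the same name.
    return not isinstance(value, _NotGiven)


@dataclass
class SpeakerDiarizationConfig:  # hypothetical stand-in
    max_speakers: Optional[int] = None


@dataclass
class TranscriptionConfig:  # hypothetical stand-in
    speaker_diarization_config: Optional[SpeakerDiarizationConfig] = None


def resolve_max_speakers(
    max_speakers: Union[int, _NotGiven],
    config: TranscriptionConfig,
) -> Union[int, _NotGiven]:
    # An explicit argument wins; otherwise fall back to the deprecated config,
    # but only when it actually carries a value.
    if (
        not is_given(max_speakers)
        and (dz_cfg := config.speaker_diarization_config)
        and dz_cfg.max_speakers is not None
    ):
        max_speakers = dz_cfg.max_speakers
    return max_speakers
```

With this shape, an explicit `max_speakers` always takes precedence, and the deprecated config is only consulted when the argument was omitted.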
    if self._stt_options.diarization_sensitivity is not None:
        dz_cfg["speaker_sensitivity"] = self._stt_options.diarization_sensitivity
    if self._stt_options.max_speakers is not None:
        dz_cfg["max_speakers"] = self._stt_options.max_speakers
We should also refactor `_process_config` to replace the dataclass value of `transcription_config` instead of assigning a dict to it.
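One way to read this suggestion is to build a new config object with `dataclasses.replace` rather than mutating a dict. The sketch below only illustrates that idea: the two dataclasses and the `with_diarization` helper are hypothetical stand-ins, and the real Speechmatics config classes may have different fields.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SpeakerDiarizationConfig:  # hypothetical stand-in
    speaker_sensitivity: Optional[float] = None
    max_speakers: Optional[int] = None


@dataclass(frozen=True)
class TranscriptionConfig:  # hypothetical stand-in
    language: str = "en"
    speaker_diarization_config: Optional[SpeakerDiarizationConfig] = None


def with_diarization(
    config: TranscriptionConfig,
    *,
    sensitivity: Optional[float] = None,
    max_speakers: Optional[int] = None,
) -> TranscriptionConfig:
    # Start from the existing diarization settings, or from defaults.
    dz_cfg = config.speaker_diarization_config or SpeakerDiarizationConfig()
    # Override only the fields that were explicitly provided.
    if sensitivity is not None:
        dz_cfg = replace(dz_cfg, speaker_sensitivity=sensitivity)
    if max_speakers is not None:
        dz_cfg = replace(dz_cfg, max_speakers=max_speakers)
    # Return a new config; the input object is left untouched.
    return replace(config, speaker_diarization_config=dz_cfg)
```

Keeping the value typed end to end means a misspelled field name fails at construction time instead of silently producing an extra dict key.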
c395ae5 to 400f516
- Added max_speakers parameter to STT __init__ method
- Updated STTOptions dataclass to include max_speakers field
- Modified _process_config to include max_speakers in speaker_diarization_config
- Added handling for extracting max_speakers from deprecated transcription_config
- Updated documentation to explain the new parameter
- Fixed compatibility with livekit-agents 1.2.6 (removed diarization from STTCapabilities)
- Updated minimum livekit-agents version to 1.2.6

This parameter allows limiting the number of unique speakers detected during diarization, which is useful for scenarios with a known number of participants (e.g., 2-person interviews, small group meetings with fixed participants).
400f516 to 8400a67
Summary

This PR adds support for the `max_speakers` parameter to the Speechmatics STT plugin, allowing developers to limit the number of unique speakers detected during diarization.

Problem

Currently, when using the Speechmatics STT plugin with diarization enabled, there's no way to specify the maximum number of speakers. The `transcription_config` parameter (which is deprecated) accepts a `speaker_diarization_config` with `max_speakers`, but this value is not preserved when the plugin processes the configuration.

Solution

- Added `max_speakers` as a direct parameter to the STT `__init__` method
- Updated the `STTOptions` dataclass to include the `max_speakers` field
- Modified `_process_config` to include `max_speakers` in the `speaker_diarization_config` when sending to the Speechmatics API
- Added handling for extracting `max_speakers` from the deprecated `transcription_config` parameter for backward compatibility

Use Case

This parameter is particularly useful for scenarios where the number of participants is known in advance, such as 2-person interviews or small group meetings with fixed participants.

Testing

- ... `transcription_config` parameter

Example Usage

Breaking Changes
None - this is a backward-compatible addition.