

@nsepehr commented Sep 29, 2025

Summary

This PR adds support for the max_speakers parameter to the Speechmatics STT plugin, allowing developers to limit the number of unique speakers detected during diarization.

Problem

Currently, when using the Speechmatics STT plugin with diarization enabled, there's no way to specify the maximum number of speakers. The transcription_config parameter (which is deprecated) accepts a speaker_diarization_config with max_speakers, but this value is not preserved when the plugin processes the configuration.

Solution

  • Added max_speakers as a direct parameter to the STT __init__ method
  • Updated the STTOptions dataclass to include the max_speakers field
  • Modified _process_config to include max_speakers in the speaker_diarization_config when sending to the Speechmatics API
  • Added a backward-compatible fallback: when max_speakers is not passed directly, it is read from speaker_diarization_config in the deprecated transcription_config parameter
  • Updated documentation to explain the new parameter
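The option flow in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the plugin's actual code: STTOptions here is a trimmed-down stand-in for the real dataclass, and build_speaker_diarization_config is a hypothetical helper that mirrors the conditional dz_cfg assignments this PR adds to _process_config.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class STTOptions:
    """Hypothetical, trimmed-down stand-in for the plugin's STTOptions."""

    enable_diarization: bool = False
    diarization_sensitivity: Optional[float] = None
    max_speakers: Optional[int] = None  # the field this PR adds


def build_speaker_diarization_config(opts: STTOptions) -> dict[str, Any]:
    """Collect the diarization settings destined for the Speechmatics API.

    Only values the caller actually set are included, so leaving
    max_speakers unset keeps the service-side default.
    """
    dz_cfg: dict[str, Any] = {}
    if opts.diarization_sensitivity is not None:
        dz_cfg["speaker_sensitivity"] = opts.diarization_sensitivity
    if opts.max_speakers is not None:
        dz_cfg["max_speakers"] = opts.max_speakers
    return dz_cfg


opts = STTOptions(enable_diarization=True, diarization_sensitivity=0.5, max_speakers=2)
print(build_speaker_diarization_config(opts))
```

Skipping unset values (rather than sending max_speakers=None) is a deliberate choice in this sketch so that an omitted argument never overrides whatever default the API applies.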

Use Case

This parameter is particularly useful for scenarios where the number of participants is known in advance, such as:

  • Two-person interviews or conversations
  • Small group discussions with a fixed number of participants
  • Customer service calls (agent and customer)
  • Educational settings with known speaker counts

Testing

  • Tested locally with a multi-speaker agent implementation
  • Verified that the parameter is correctly passed to the Speechmatics API configuration
  • Confirmed backward compatibility with the deprecated transcription_config parameter

Example Usage

from livekit.plugins import speechmatics

stt = speechmatics.STT(
    language="en",
    enable_diarization=True,
    max_speakers=2,  # Limit to 2 speakers
    diarization_sensitivity=0.5,
    speaker_active_format="@[{speaker_id}]: {text}",
)

Breaking Changes

None - this is a backward-compatible addition.


CLAassistant commented Sep 29, 2025

CLA assistant check
All committers have signed the CLA.

@nsepehr force-pushed the feat/speechmatics-max-speakers branch 2 times, most recently from a3a7974 to c395ae5, on September 29, 2025 06:18
Comment on lines +270 to +275
if not is_given(max_speakers) and hasattr(config, "speaker_diarization_config"):
    if (
        config.speaker_diarization_config
        and "max_speakers" in config.speaker_diarization_config
    ):
        max_speakers = config.speaker_diarization_config["max_speakers"]
Contributor


speaker_diarization_config is a dataclass? If it requires a specific version of speechmatics, we can specify it in pyproject.toml.

Suggested change
if (
    not is_given(max_speakers)
    and (dz_cfg := config.speaker_diarization_config)
    and dz_cfg.max_speakers is not None
):
    max_speakers = dz_cfg.max_speakers
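To see why the review favors attribute access, here is a small self-contained sketch. It assumes, as the reviewer's question implies, that speaker_diarization_config is a plain dataclass with an optional max_speakers field. DiarizationConfig, TranscriptionConfig, NOT_GIVEN, and is_given below are hypothetical stand-ins, not the Speechmatics SDK or LiveKit classes.

```python
from dataclasses import dataclass
from typing import Optional, Union


class _NotGiven:
    """Hypothetical stand-in for the NOT_GIVEN sentinel used with is_given()."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def is_given(value: object) -> bool:
    """Hypothetical stand-in: True unless the caller left the argument unset."""
    return not isinstance(value, _NotGiven)


@dataclass
class DiarizationConfig:
    """Assumed shape of a dataclass-style speaker_diarization_config."""

    max_speakers: Optional[int] = None


@dataclass
class TranscriptionConfig:
    """Assumed shape of the deprecated transcription_config."""

    speaker_diarization_config: Optional[DiarizationConfig] = None


def resolve_max_speakers(
    max_speakers: Union[int, _NotGiven], config: TranscriptionConfig
) -> Union[int, _NotGiven]:
    # Same logic as the suggested change: the walrus binds dz_cfg and
    # short-circuits when the nested config is None; the value is then
    # read as an attribute rather than with dict-style lookups.
    if (
        not is_given(max_speakers)
        and (dz_cfg := config.speaker_diarization_config)
        and dz_cfg.max_speakers is not None
    ):
        max_speakers = dz_cfg.max_speakers
    return max_speakers


cfg = TranscriptionConfig(DiarizationConfig(max_speakers=3))

# The original dict-style membership test raises on a dataclass instance.
try:
    "max_speakers" in cfg.speaker_diarization_config
except TypeError as exc:
    print("membership test fails:", exc)

print(resolve_max_speakers(NOT_GIVEN, cfg))  # 3: falls back to the config value
print(resolve_max_speakers(2, cfg))  # 2: an explicit argument takes precedence
```

An explicitly passed max_speakers always wins; the deprecated config is consulted only when the argument is left unset.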

if self._stt_options.diarization_sensitivity is not None:
    dz_cfg["speaker_sensitivity"] = self._stt_options.diarization_sensitivity
if self._stt_options.max_speakers is not None:
    dz_cfg["max_speakers"] = self._stt_options.max_speakers
Contributor


We should also refactor _process_config to replace the dataclass value of transcription_config instead of assigning a dict to it.
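One possible reading of this suggestion is to keep transcription_config a dataclass and build an updated copy with the standard library's dataclasses.replace, rather than overwriting it with a dict. The sketch below assumes that reading; DiarizationConfig, TranscriptionConfig, and apply_diarization_options are hypothetical stand-ins, not the plugin's or the SDK's actual types.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DiarizationConfig:
    """Hypothetical stand-in for the SDK's diarization config dataclass."""

    max_speakers: Optional[int] = None
    speaker_sensitivity: Optional[float] = None


@dataclass
class TranscriptionConfig:
    """Hypothetical stand-in for the transcription_config dataclass."""

    language: str = "en"
    speaker_diarization_config: Optional[DiarizationConfig] = None


def apply_diarization_options(
    config: TranscriptionConfig,
    max_speakers: Optional[int],
    sensitivity: Optional[float],
) -> TranscriptionConfig:
    """Return an updated copy of config that is still a TranscriptionConfig."""
    dz_cfg = config.speaker_diarization_config or DiarizationConfig()
    updates: dict[str, object] = {}
    # Only override fields the caller actually set.
    if max_speakers is not None:
        updates["max_speakers"] = max_speakers
    if sensitivity is not None:
        updates["speaker_sensitivity"] = sensitivity
    # dataclasses.replace builds new instances instead of mutating the originals.
    return replace(config, speaker_diarization_config=replace(dz_cfg, **updates))


base = TranscriptionConfig(language="en")
updated = apply_diarization_options(base, max_speakers=2, sensitivity=0.5)
print(updated.speaker_diarization_config)
print(base.speaker_diarization_config)  # None: the original is untouched
```

Because replace returns new instances, the caller's config object is never mutated, and downstream code can keep using attribute access instead of switching between dataclass and dict shapes.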

@nsepehr force-pushed the feat/speechmatics-max-speakers branch from c395ae5 to 400f516 on September 30, 2025 05:25
- Added max_speakers parameter to STT __init__ method
- Updated STTOptions dataclass to include max_speakers field
- Modified _process_config to include max_speakers in speaker_diarization_config
- Added handling for extracting max_speakers from deprecated transcription_config
- Updated documentation to explain the new parameter
- Fixed compatibility with livekit-agents 1.2.6 (removed diarization from STTCapabilities)
- Updated minimum livekit-agents version to 1.2.6

This parameter allows limiting the number of unique speakers detected during
diarization, which is useful for scenarios with a known number of participants
(e.g., 2-person interviews, small group meetings with fixed participants).
@nsepehr force-pushed the feat/speechmatics-max-speakers branch from 400f516 to 8400a67 on September 30, 2025 05:32