
Commit 963b6cf

feat(pyannoteAI): add wrapper around pyannoteAI SDK
1 parent: ce49576

File tree: 7 files changed, +263 −67 lines

CHANGELOG.md (+12 −3)
@@ -4,17 +4,25 @@

### TL;DR

- #### Quality of life improvements
+ #### Quality-of-Life improvements

Models can now be stored alongside their pipelines in the same repository, streamlining the gating mechanism (see the sketch right after this list):
- accept `pyannote/speaker-diarization-x.x` pipeline user agreement
- ~~accept `pyannote/segmentation-3.0` model user agreement~~
- ~~accept `pyannote/wespeaker-voxceleb-resnet34-LM` model user agreement~~
- load pipeline with `Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token=True)`
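
Editor's sketch (not part of the commit): with the models bundled in the pipeline repository, a single user agreement covers everything, and `token=True` is assumed to reuse a Hugging Face token saved locally (e.g. via `huggingface-cli login`).

```python
from pyannote.audio import Pipeline

# Only the pipeline repository is gated now; token=True is assumed to
# pick up the Hugging Face token cached by `huggingface-cli login`.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", token=True)
diarization = pipeline("audio.wav")
```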

- #### Improve speech separation quality
+ #### [pyannoteAI](https://www.pyannote.ai) premium speaker diarization

- Clipping and speaker/source alignment issues in speech separation pipeline have been fixed.
+ Change one line of code to use [pyannoteAI](https://docs.pyannote.ai) and enjoy **more accurate speaker diarization**.

```diff
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-     "pyannote/speaker-diarization-3.1", token="huggingface-access-token")
+     "pyannoteAI/speaker-diarization-precision", token="pyannoteAI-api-key")
diarization = pipeline("/path/to/conversation.wav")
```

### Breaking changes

@@ -31,6 +39,7 @@ Clipping and speaker/source alignment issues in speech separation pipeline have

### New features

+ - feat(pyannoteAI): add wrapper around pyannoteAI SDK
- improve(hub): add support for pipeline repos that also include underlying models
- feat(clustering): add support for `k-means` clustering
- feat(model): add `wav2vec_frozen` option to freeze/unfreeze `wav2vec` in `SSeRiouSS` architecture (sketched below)
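
Editor's sketch of the `wav2vec_frozen` option named above; the constructor-argument placement and the `wav2vec="WAVLM_BASE"` default are assumptions, not shown in this diff:

```python
from pyannote.audio.models.segmentation import SSeRiouSS

# Assumed usage: keep the self-supervised wav2vec/WavLM feature extractor
# frozen so that only the downstream segmentation layers are trained.
model = SSeRiouSS(wav2vec="WAVLM_BASE", wav2vec_frozen=True)
```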

README.md (+63 −34)
@@ -1,36 +1,46 @@
Using the `pyannote.audio` open-source toolkit in production?
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

- # `pyannote.audio` speaker diarization toolkit
+ # `pyannote` speaker diarization toolkit

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the [PyTorch](https://pytorch.org) machine learning framework, it comes with state-of-the-art [pretrained models and pipelines](https://hf.co/pyannote) that can be further fine-tuned to your own data for even better performance.

<p align="center">
 <a href="https://www.youtube.com/watch?v=37R_R82lfwA"><img src="https://img.youtube.com/vi/37R_R82lfwA/0.jpg"></a>
</p>

12-
## TL;DR
12+
13+
## Highlights
14+
15+
- :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))
16+
- :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on [:hugs: model hub](https://huggingface.co/pyannote)
17+
- :rocket: built-in support for [pyannoteAI](https://pyannote.ai) premium speaker diarization
18+
- :snake: Python-first API
19+
- :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)
20+
21+
## Open-source speaker diarization pipeline
1322

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
- 4. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
+ 4. Create a Hugging Face access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).

```diff
+ import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook

+ # Open-source pyannote speaker diarization pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
-     token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
+     token="HUGGINGFACE_ACCESS_TOKEN")

# send pipeline to GPU (when available)
- import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline (with optional progress hook)
with ProgressHook() as hook:
-     diarization = pipeline("audio.wav", hook=hook)
+     diarization = pipeline("audio.wav", hook=hook)  # runs locally

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s {speaker}")
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```

- ## Highlights
+ ## Premium pyannoteAI speaker diarization pipeline
+
+ 1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) with `pip install pyannote.audio`
+ 2. Create a pyannoteAI API key at [`dashboard.pyannote.ai`](https://dashboard.pyannote.ai)

```python
from pyannote.audio import Pipeline

# Premium pyannoteAI speaker diarization service
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-precision", token="PYANNOTEAI_API_KEY")

diarization = pipeline("audio.wav")  # runs on pyannoteAI servers

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s {speaker}")
# start=0.2s stop=1.6s SPEAKER_00
# start=1.8s stop=4.0s SPEAKER_01
# start=4.2s stop=5.6s SPEAKER_00
# ...
```

Visit [`docs.pyannote.ai`](https://docs.pyannote.ai) to learn about other pyannoteAI features (voiceprinting, confidence scores, ...).

## Benchmark

Out of the box, the `pyannote.audio` speaker diarization [pipeline v3.1](https://hf.co/pyannote/speaker-diarization-3.1) is expected to be much better (and faster) than v2.x. The [`pyannoteAI`](https://www.pyannote.ai) premium model goes one step further. The numbers below are diarization error rates (in %): the lower, the better.

| Benchmark (2025-03) | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /></a> |
| --- | --- | --- | --- |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 12.1 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.5 | 19.8 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 15.8 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.7 | 18.3 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 49.7 | 45.3 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 20.1 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.4 | 17.2 |
| [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 45.8 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.4 | 19.7 |
| [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 11.1 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.9 | 7.6 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.9 |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)

- - :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on [:hugs: model hub](https://huggingface.co/pyannote)
- - :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))
- - :snake: Python-first API
- - :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)

## Documentation

@@ -78,29 +130,6 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
- 2024-04-05 > [Offline speaker diarization (speaker-diarization-3.1)](tutorials/community/offline_usage_speaker_diarization.ipynb) by [Simon Ottenhaus](https://github.com/simonottenhauskenbun)
- 2024-09-24 > [Evaluating `pyannote` pretrained speech separation pipelines](tutorials/community/eval_separation_pipeline.ipynb) by [Clément Pagés](https://github.com/)

- ## Benchmark
-
- Out of the box, `pyannote.audio` speaker diarization [pipeline](https://hf.co/pyannote/speaker-diarization-3.1) v3.1 is expected to be much better (and faster) than v2.x.
- Those numbers are diarization error rates (in %):
-
- | Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
- | --- | --- | --- | --- |
- | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
- | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 22.5 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 16.6 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 20.9 |
- | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.8 |
- | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 22.2 |
- | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 17.2 |
- | [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.0 |
- | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 43.8 |
- | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 19.8 |
- | [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 18.4 |
- | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.6 |
- | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 9.4 |
-
- [Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
## Citations

If you use `pyannote.audio`, please use the following citations:

pyproject.toml (+1)
@@ -29,6 +29,7 @@ dependencies = [
"torchmetrics>=1.6.1",
"soundfile>=0.13.1",
"matplotlib>=3.10.0",
+ "pyannoteai.sdk>=0.1.0",
]

[project.scripts]
`__init__.py` (new file, +25; pyannoteAI wrapper package — full path not shown in this excerpt)
@@ -0,0 +1,25 @@
# The MIT License (MIT)
#
# Copyright (c) 2025- pyannoteAI
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from .speaker_diarization import PremiumSpeakerDiarization

__all__ = ["PremiumSpeakerDiarization"]
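
Editor's note (not part of the diff): per the README changes above, the exported `PremiumSpeakerDiarization` wrapper is reached through the regular `Pipeline` API, so switching providers only swaps the checkpoint name and token:

```python
from pyannote.audio import Pipeline

# Loading the pyannoteAI checkpoint routes inference through the SDK
# wrapper added by this commit; audio is processed on pyannoteAI servers.
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-precision", token="PYANNOTEAI_API_KEY")
diarization = pipeline("conversation.wav")
```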
