Neural speaker diarization with `pyannote.audio`

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.

TL;DR

```python
# 1. visit hf.co/pyannote/speaker-diarization and hf.co/pyannote/segmentation and accept user conditions (only if requested)
# 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)
# 3. instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")

# 4. apply pretrained pipeline
diarization = pipeline("audio.wav")

# 5. print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```
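Each item yielded by `itertracks(yield_label=True)` carries a segment (with `start` and `end` in seconds), a track name, and a speaker label. A common next step is to total the speaking time of each speaker. The sketch below works on plain `(start, end, speaker)` tuples so that it runs without pyannote.audio installed; the helper name `speaking_time` is hypothetical, and with a real pipeline output you would build the tuples as `(turn.start, turn.end, speaker)` inside the printing loop of the TL;DR.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def speaking_time(turns: Iterable[Tuple[float, float, str]]) -> Dict[str, float]:
    """Sum the duration (in seconds) of every (start, end, speaker) turn, per speaker.

    Hypothetical helper: overlapping turns of the same speaker are counted
    twice, a simplification kept for illustration.
    """
    totals: Dict[str, float] = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Hypothetical turns mirroring the example output of the TL;DR loop.
turns = [(0.2, 1.5, "0"), (1.8, 3.9, "1"), (4.2, 5.7, "0")]
for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"speaker_{speaker}: {seconds:.1f}s")
```

Working on plain tuples keeps the aggregation independent of the pipeline object, which makes it easy to test on hand-written turns.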

Highlights

🤗 pretrained pipelines (and models) on the 🤗 model hub
🤯 state-of-the-art performance (see Benchmark)
🐍 Python-first API
⚡ multi-GPU training with pytorch-lightning
🎛️ data augmentation with torch-audiomentations

Installation

Only Python 3.8+ is supported.

```bash
# install from develop branch
pip install -qq https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip
```
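Because only Python 3.8 and later are supported, checking the interpreter before installing can save a confusing failure later. This is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of pyannote.audio.

```python
import sys
from typing import Optional, Sequence, Tuple

# Minimum interpreter version stated in the installation notes.
MINIMUM_PYTHON: Tuple[int, int] = (3, 8)


def is_supported_python(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if (major, minor) meets the minimum; defaults to the running interpreter."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    if not is_supported_python():
        sys.exit(f"pyannote.audio needs Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}+, "
                 f"found {sys.version.split()[0]}")
    print("Python version OK")
```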

Documentation

Changelog
Frequently asked questions
Models
- Available tasks explained
- Applying a pretrained model
- Training, fine-tuning, and transfer learning
Pipelines
- Available pipelines explained
- Applying a pretrained pipeline
- Adapting a pretrained pipeline to your own data
- Training a pipeline
Contributing
- Adding a new model
- Adding a new task
- Adding a new pipeline
- Sharing pretrained models and pipelines
Blog
- 2022-12-02 > "How I reached 1st place at Ego4D 2022, 1st place at Albayzin 2022, and 6th place at VoxSRC 2022 speaker diarization challenges"
- 2022-10-23 > "One speaker segmentation model to rule them all"
- 2021-08-05 > "Streaming voice activity detection with pyannote.audio"
Miscellaneous
- Training with pyannote-audio-train command line tool
- Annotating your own data with Prodigy
- Speaker verification
- Visualization and debugging

Benchmark

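Out of the box, the default pyannote.audio speaker diarization pipeline is expected to be much better (and faster) in v2.x than in v1.1. The numbers below are diarization error rates (in %, lower is better). In the last column, values in parentheses are for the fine-tuned pipeline; a dash means no result is reported.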

| Dataset \ Version | v1.1 | v2.0 | v2.1.1 (finetuned) |
|---|---|---|---|
| AISHELL-4 | - | 14.6 | 14.1 (14.5) |
| AliMeeting (channel 1) | - | - | 27.4 (23.8) |
| AMI (IHM) | 29.7 | 18.2 | 18.9 (18.5) |
| AMI (SDM) | - | 29.0 | 27.1 (22.2) |
| CALLHOME (part2) | - | 30.2 | 32.4 (29.3) |
| DIHARD 3 (full) | 29.2 | 21.0 | 26.9 (21.9) |
| VoxConverse (v0.3) | 21.5 | 12.6 | 11.2 (10.7) |
| REPERE (phase2) | - | 12.6 | 8.2 (8.3) |
| This American Life | - | - | 20.8 (15.2) |
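Diarization error rate (DER) is the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The sketch below applies that formula to precomputed component durations; it only illustrates the arithmetic and is not the scoring code used to produce the table (the pyannote.metrics package provides a complete implementation that also aligns reference and hypothesis segments). The function and argument names are hypothetical.

```python
def diarization_error_rate(false_alarm: float, missed_detection: float,
                           confusion: float, total_reference: float) -> float:
    """Return DER as a percentage, given component durations in seconds."""
    if total_reference <= 0:
        raise ValueError("total reference speech duration must be positive")
    errors = false_alarm + missed_detection + confusion
    return 100.0 * errors / total_reference


# Hypothetical durations for 100 s of reference speech: 3 s of false alarm,
# 5 s of missed speech, and 4 s attributed to the wrong speaker.
print(f"DER = {diarization_error_rate(3.0, 5.0, 4.0, 100.0):.1f}%")  # DER = 12.0%
```

Because false alarms are not bounded by the amount of reference speech, DER can exceed 100% on pathological outputs.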

Citations

If you use pyannote.audio, please cite the following papers:

```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Year = {2020},
}

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Year = {2021},
}
```

Support

For commercial enquiries and scientific consulting, please contact me.

Development

The commands below install the packages and set up the pre-commit hooks needed for developing the pyannote.audio library.

```bash
pip install -e .[dev,testing]
pre-commit install
```

Test

```bash
pytest
```
