Clustering from Speaker Embeddings

ℹ️ Based on https://github.com/pyannote/pyannote-audio.

Usage

$ python cluster.py $HOME/Downloads/mclip /tmp/outdir

Description

This script extracts speaker embeddings (features) from a large set of audio files and decides which recordings belong to the same speaker. This task is known as clustering, and it is often applied as the last step of speaker diarization.
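To make the idea of grouping clips by embedding similarity concrete, here is a minimal sketch. It is not the method used by cluster.py or pyannote-audio: it is a simple greedy "leader" clustering over cosine similarity, the `cluster_embeddings` name and the 0.8 threshold are assumptions, and the 3-dimensional vectors are toy data standing in for real speaker embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_embeddings(embeddings, threshold=0.8):
    """Greedy leader clustering (hypothetical helper, illustration only).

    Each embedding joins the most similar existing cluster if the cosine
    similarity to its centroid reaches `threshold`; otherwise it starts a
    new cluster. Returns one integer label per embedding.
    """
    centroids = []  # running mean vector of each cluster
    counts = []     # number of members in each cluster
    labels = []
    for emb in embeddings:
        sims = [cosine_similarity(emb, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            n = counts[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], emb)]
            counts[best] = n + 1
        else:
            best = len(centroids)
            centroids.append(list(emb))
            counts.append(1)
        labels.append(best)
    return labels


# Toy embeddings (assumed values): two clips for each of two speakers.
toy = {
    "mclip-00000001.wav": [1.0, 0.1, 0.0],
    "mclip-00000002.wav": [0.9, 0.2, 0.0],
    "mclip-00000012.wav": [0.0, 0.1, 1.0],
    "mclip-00000013.wav": [0.1, 0.0, 0.9],
}
labels = cluster_embeddings(list(toy.values()))
print(dict(zip(toy, labels)))
```

With real embeddings the threshold would need tuning, and a hierarchical method would likely be more robust than this single-pass scheme.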

The script receives two directories as arguments. The input directory must contain at least one subdirectory, for either a male or a female speaker; these subdirectories are expected to come from inaSpeechSegmenter (inaSS). Each subdirectory must contain paired audio and transcription files (.wav and .txt extensions; the script sanity-checks this). The output directory is not wiped at each run, so remember to remove it before execution while debugging.
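The pairing requirement can be checked with a few lines of standard-library code. This is only a sketch of one way to do it, not the check implemented in cluster.py; the `find_unpaired` helper is a hypothetical name, and the demo builds a throwaway directory instead of reading real data.

```python
import tempfile
from pathlib import Path


def find_unpaired(subdir):
    """Return the file stems that lack either a .wav or a .txt counterpart."""
    subdir = Path(subdir)
    wavs = {p.stem for p in subdir.glob("*.wav")}
    txts = {p.stem for p in subdir.glob("*.txt")}
    # Symmetric difference: stems present in exactly one of the two sets.
    return sorted(wavs ^ txts)


# Demo on a temporary directory: the second clip has no transcription.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "andaiafm_M20201105"
    sub.mkdir()
    for name in ("mclip-00000001.wav", "mclip-00000001.txt",
                 "mclip-00000002.wav"):
        (sub / name).touch()
    unpaired = find_unpaired(sub)

print(unpaired)
```

An empty list means every audio file has a matching transcription and vice versa.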

The subdirectories within the input directory are expected to be named as follows: two IDs separated by an underscore, the second starting with a gender ID (M or F).

<BROADCASTER_ID>_<GENDER_ID><YMD_DATE_TAG>

e.g.:

andaiafm_M20201105
andaiafm_F20201105
$ tree $HOME/Downloads/mclip -C | head
$HOME/Downloads/mclip              $HOME/Downloads/mclip
├── andaiafm_F20201105             ├── andaiafm_M20201105
│   ├── mclip-00000003.txt         │   ├── mclip-00000001.txt
│   ├── mclip-00000003.wav         │   ├── mclip-00000001.wav
│   ├── mclip-00000004.txt         │   ├── mclip-00000002.txt
│   ├── mclip-00000004.wav         │   ├── mclip-00000002.wav
│   ├── mclip-00000005.txt         │   ├── mclip-00000012.txt
│   ├── mclip-00000005.wav         │   ├── mclip-00000012.wav
│   ├── mclip-00000006.txt         │   ├── mclip-00000013.txt
│   ├── mclip-00000006.wav         │   ├── mclip-00000013.wav
...                                ...
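The subdirectory naming convention shown above can be validated with a regular expression. The sketch below is an illustration rather than the parsing done by cluster.py: the `parse_subdir` helper and `SubdirInfo` type are hypothetical names, and it assumes the broadcaster ID contains no underscore and the date tag is exactly eight digits (YYYYMMDD), as in the examples.

```python
import re
from typing import NamedTuple

# <BROADCASTER_ID>_<GENDER_ID><YMD_DATE_TAG>
# Assumptions: no underscore in the broadcaster ID, 8-digit date tag.
SUBDIR_RE = re.compile(
    r"^(?P<broadcaster>[^_]+)_(?P<gender>[MF])(?P<date>\d{8})$"
)


class SubdirInfo(NamedTuple):
    broadcaster: str
    gender: str
    date: str


def parse_subdir(name):
    """Split an input subdirectory name into its three components."""
    match = SUBDIR_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected subdirectory name: {name!r}")
    return SubdirInfo(**match.groupdict())


info = parse_subdir("andaiafm_M20201105")
print(info)
```

Names that do not follow the convention, such as a gender ID other than M or F, raise `ValueError` instead of being silently accepted.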

Example output:

$ tree /tmp/outdir
/tmp/outdir                                     /tmp/outdir
└── andaiafm20201105                            └── andaiafm20201105
    ├── andaiafm20201105-F0001                      ├── andaiafm20201105-M0001
    │   ├── andaiafm20201105F0001_000000.txt        │   ├── andaiafm20201105M0001_000000.txt
    │   ├── andaiafm20201105F0001_000000.wav        │   ├── andaiafm20201105M0001_000000.wav
    │   ├── andaiafm20201105F0001_000001.txt        │   ├── andaiafm20201105M0001_000001.txt
    │   ├── andaiafm20201105F0001_000001.wav        │   ├── andaiafm20201105M0001_000001.wav
    │   ├── andaiafm20201105F0001_000002.txt        │   ├── andaiafm20201105M0001_000002.txt
    │   ├── andaiafm20201105F0001_000002.wav        │   ├── andaiafm20201105M0001_000002.wav
    │   ├── andaiafm20201105F0001_000003.txt        │   ├── andaiafm20201105M0001_000003.txt
    │   ├── andaiafm20201105F0001_000003.wav        │   ├── andaiafm20201105M0001_000003.wav
    │   ├── andaiafm20201105F0001_000004.txt        │   ├── andaiafm20201105M0001_000004.txt
    │   ├── andaiafm20201105F0001_000004.wav        │   ├── andaiafm20201105M0001_000004.wav
    │   ├── andaiafm20201105F0001_000005.txt        │   ├── andaiafm20201105M0001_000005.txt
    │   ├── andaiafm20201105F0001_000005.wav        │   ├── andaiafm20201105M0001_000005.wav
    │   ├── andaiafm20201105F0001_000006.txt        │   ├── andaiafm20201105M0001_000006.txt
    │   ├── andaiafm20201105F0001_000006.wav        │   ├── andaiafm20201105M0001_000006.wav
    │   ├── andaiafm20201105F0001_000007.txt        │   ├── andaiafm20201105M0001_000007.txt
    │   ├── andaiafm20201105F0001_000007.wav        │   ├── andaiafm20201105M0001_000007.wav
    │   ├── andaiafm20201105F0001_000008.txt        │   ├── andaiafm20201105M0001_000008.txt
    │   ├── andaiafm20201105F0001_000008.wav        │   ├── andaiafm20201105M0001_000008.wav
    │   ├── andaiafm20201105F0001_000009.txt        │   ├── andaiafm20201105M0001_000009.txt
    │   ├── andaiafm20201105F0001_000009.wav        │   ├── andaiafm20201105M0001_000009.wav
    │   ├── andaiafm20201105F0001_000010.txt        │   ├── andaiafm20201105M0001_000010.txt
    │   ├── andaiafm20201105F0001_000010.wav        │   ├── andaiafm20201105M0001_000010.wav
    │   ├── andaiafm20201105F0001_000011.txt        │   ├── andaiafm20201105M0001_000011.txt
    │   ├── andaiafm20201105F0001_000011.wav        │   ├── andaiafm20201105M0001_000011.wav
    │   ├── andaiafm20201105F0001_000012.txt        │   ├── andaiafm20201105M0001_000012.txt
    │   ├── andaiafm20201105F0001_000012.wav        │   ├── andaiafm20201105M0001_000012.wav
    │   ├── andaiafm20201105F0001_000013.txt        │   ├── andaiafm20201105M0001_000013.txt
    │   ├── andaiafm20201105F0001_000013.wav        │   ├── andaiafm20201105M0001_000013.wav
    │   ├── andaiafm20201105F0001_000014.txt        │   ├── andaiafm20201105M0001_000014.txt
    │   ├── andaiafm20201105F0001_000014.wav        │   ├── andaiafm20201105M0001_000014.wav
    │   ├── andaiafm20201105F0001_000015.txt        │   ├── andaiafm20201105M0001_000015.txt
    │   └── andaiafm20201105F0001_000015.wav        ...
    ├── andaiafm20201105-F0002                      ├── andaiafm20201105-M0002
    │   ├── andaiafm20201105F0002_000016.txt        │   ├── andaiafm20201105M0002_000113.txt
    │   ├── andaiafm20201105F0002_000016.wav        │   └── andaiafm20201105M0002_000113.wav
    │   ├── andaiafm20201105F0002_000017.txt        ├── andaiafm20201105-M0003
    │   ├── andaiafm20201105F0002_000017.wav        │   ├── andaiafm20201105M0003_000114.txt
    │   ├── andaiafm20201105F0002_000018.txt        │   └── andaiafm20201105M0003_000114.wav
    │   ├── andaiafm20201105F0002_000018.wav        ├── andaiafm20201105-M0004
    │   ├── andaiafm20201105F0002_000019.txt        │   ├── andaiafm20201105M0004_000115.txt
    │   ├── andaiafm20201105F0002_000019.wav        │   └── andaiafm20201105M0004_000115.wav
    │   ├── andaiafm20201105F0002_000020.txt        ├── andaiafm20201105-M0005
    │   ├── andaiafm20201105F0002_000020.wav        │   ├── andaiafm20201105M0005_000116.txt
    │   ├── andaiafm20201105F0002_000021.txt        │   └── andaiafm20201105M0005_000116.wav
    │   ├── andaiafm20201105F0002_000021.wav        ├── andaiafm20201105-M0006
    │   ├── andaiafm20201105F0002_000022.txt        │   ├── andaiafm20201105M0006_000117.txt
    │   └── andaiafm20201105F0002_000022.wav        │   └── andaiafm20201105M0006_000117.wav
    ├── andaiafm20201105-F0003                      ├── andaiafm20201105-M0007
    │   ├── andaiafm20201105F0003_000023.txt        │   ├── andaiafm20201105M0007_000118.txt
    │   └── andaiafm20201105F0003_000023.wav        │   └── andaiafm20201105M0007_000118.wav
    ├── andaiafm20201105-F0004                      ├── andaiafm20201105-M0008
    │   ├── andaiafm20201105F0004_000024.txt        │   ├── andaiafm20201105M0008_000119.txt
    │   └── andaiafm20201105F0004_000024.wav        │   └── andaiafm20201105M0008_000119.wav
    ├── andaiafm20201105-F0005                      ├── andaiafm20201105-M0009
    │   ├── andaiafm20201105F0005_000025.txt        │   ├── andaiafm20201105M0009_000120.txt
    │   ├── andaiafm20201105F0005_000025.wav        │   ├── andaiafm20201105M0009_000120.wav
    │   ├── andaiafm20201105F0005_000026.txt        │   ├── andaiafm20201105M0009_000121.txt
    │   ├── andaiafm20201105F0005_000026.wav        │   ├── andaiafm20201105M0009_000121.wav
    │   ├── andaiafm20201105F0005_000027.txt        │   ├── andaiafm20201105M0009_000122.txt
    │   ├── andaiafm20201105F0005_000027.wav        │   ├── andaiafm20201105M0009_000122.wav
    │   ├── andaiafm20201105F0005_000028.txt        │   ├── andaiafm20201105M0009_000123.txt
    │   ├── andaiafm20201105F0005_000028.wav        │   ├── andaiafm20201105M0009_000123.wav
    │   ├── andaiafm20201105F0005_000029.txt        │   ├── andaiafm20201105M0009_000124.txt
    │   ├── andaiafm20201105F0005_000029.wav        │   ├── andaiafm20201105M0009_000124.wav
    │   ├── andaiafm20201105F0005_000030.txt        │   ├── andaiafm20201105M0009_000125.txt
    │   ├── andaiafm20201105F0005_000030.wav        │   ├── andaiafm20201105M0009_000125.wav
    │   ├── andaiafm20201105F0005_000031.txt        │   ├── andaiafm20201105M0009_000126.txt
    │   ├── andaiafm20201105F0005_000031.wav        │   ├── andaiafm20201105M0009_000126.wav
    │   ├── andaiafm20201105F0005_000032.txt        │   ├── andaiafm20201105M0009_000127.txt
    │   ├── andaiafm20201105F0005_000032.wav        │   ├── andaiafm20201105M0009_000127.wav
    │   ├── andaiafm20201105F0005_000033.txt        │   ├── andaiafm20201105M0009_000128.txt
    │   ├── andaiafm20201105F0005_000033.wav        │   ├── andaiafm20201105M0009_000128.wav
    │   ├── andaiafm20201105F0005_000034.txt        │   ├── andaiafm20201105M0009_000129.txt
    │   ├── andaiafm20201105F0005_000034.wav        │   ├── andaiafm20201105M0009_000129.wav
    │   ├── andaiafm20201105F0005_000035.txt        │   ├── andaiafm20201105M0009_000130.txt
    │   └── andaiafm20201105F0005_000035.wav        │   ├── andaiafm20201105M0009_000130.wav
    ├── andaiafm20201105-F0006                      │   ├── andaiafm20201105M0009_000131.txt
    │   ├── andaiafm20201105F0006_000036.txt        │   ├── andaiafm20201105M0009_000131.wav
    │   └── andaiafm20201105F0006_000036.wav        │   ├── andaiafm20201105M0009_000132.txt
    ├── andaiafm20201105-F0007                      │   ├── andaiafm20201105M0009_000132.wav
    │   ├── andaiafm20201105F0007_000037.txt        │   ├── andaiafm20201105M0009_000133.txt
    │   └── andaiafm20201105F0007_000037.wav        │   ├── andaiafm20201105M0009_000133.wav
    ├── andaiafm20201105-F0008                      │   ├── andaiafm20201105M0009_000134.txt
    │   ├── andaiafm20201105F0008_000038.txt        │   ├── andaiafm20201105M0009_000134.wav
    │   ├── andaiafm20201105F0008_000038.wav        │   ├── andaiafm20201105M0009_000135.txt
    │   ├── andaiafm20201105F0008_000039.txt        │   ├── andaiafm20201105M0009_000135.wav
    │   ├── andaiafm20201105F0008_000039.wav        │   ├── andaiafm20201105M0009_000136.txt
    │   ├── andaiafm20201105F0008_000040.txt        │   ├── andaiafm20201105M0009_000136.wav
    │   ├── andaiafm20201105F0008_000040.wav        │   ├── andaiafm20201105M0009_000137.txt
    │   ├── andaiafm20201105F0008_000041.txt        │   ├── andaiafm20201105M0009_000137.wav
    │   ├── andaiafm20201105F0008_000041.wav        │   ├── andaiafm20201105M0009_000138.txt
    │   ├── andaiafm20201105F0008_000042.txt        │   ├── andaiafm20201105M0009_000138.wav
    │   ├── andaiafm20201105F0008_000042.wav        │   ├── andaiafm20201105M0009_000139.txt
    │   ├── andaiafm20201105F0008_000043.txt        │   ├── andaiafm20201105M0009_000139.wav
    │   └── andaiafm20201105F0008_000043.wav        ...
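The output naming scheme can be read off the listing above: a session directory made of the broadcaster ID and date, one subdirectory per speaker cluster, and files carrying a six-digit clip counter that keeps increasing across speakers of the same gender. The sketch below reproduces that pattern. It is inferred from the example only, not taken from cluster.py, and the `output_path` helper and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def output_path(broadcaster, date, gender, speaker_idx, clip_idx, ext):
    """Build an output path following the pattern seen in the example tree.

    Inferred layout (an assumption, not confirmed by the script):
    <broadcaster><date>/<broadcaster><date>-<G><NNNN>/
        <broadcaster><date><G><NNNN>_<NNNNNN><ext>
    """
    session = f"{broadcaster}{date}"
    speaker = f"{gender}{speaker_idx:04d}"
    filename = f"{session}{speaker}_{clip_idx:06d}{ext}"
    return PurePosixPath(session) / f"{session}-{speaker}" / filename


wav = output_path("andaiafm", "20201105", "F", 2, 16, ".wav")
txt = wav.with_suffix(".txt")
print(wav)
print(txt)
```

Using `with_suffix` keeps the audio file and its transcription under the same stem, which matches the .wav/.txt pairs in the listing.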

Issues

In theory, pyannote-audio solves the entire diarization problem, but it does not filter out music or noise, which is where inaSS remains useful. A second issue is that inaSS uses TensorFlow as its ML backend while pyannote uses PyTorch; it would be nice to unify them. Third and last, both libraries are far from perfect: misalignments, missegmentations and misclusterings are frequent.

Requirements

$ pip install -r requirements.txt
  • pyannote.audio (includes scipy, numpy, etc.)
  • torch