mexca-sd-experiment

A repository for comparing candidate speaker diarization tools for use in the MEXCA pipeline.

Structure

The repository contains subdirectories for different parts of the experiment:

  • speaker-diarization\: Contains all files for the speaker diarization part
    • embeddings\: Contains the encoded speaker embeddings as .pt files
    • results\: Contains the .rttm files with speaker annotations
    • clustering.py: Script for clustering the speaker embeddings and assigning the speaker labels to speaker segments
    • sd_*.py: Scripts for applying the respective speaker encoding models
    • compare_sd.ipynb: Notebook for comparing the speaker diarization approaches
    • speaker_diarization.py: Script that runs all speaker encoding scripts in sequence
    • speaker_representation.py: Helper functions for performing speaker diarization
  • voice-activity-detection\: Contains all files for the voice activity detection part
    • results\: Contains the .rttm files with speech segments
    • compare_vad.ipynb: Notebook for comparing the voice activity detection approaches
    • custom.conf: Configuration file for the openSMILE feature extractor
    • opensmile_helper_functions: Helper functions for extracting openSMILE voice activity features
    • vad_*.py: Scripts for applying the voice activity detection models
  • explore_ami_corpus.ipynb: Notebook for exploring the properties of the AMI corpus
  • rttm.py: Functions for creating, reading, modifying, and writing .rttm files and objects
  • rttm_test.py: Preliminary test suite for rttm.py
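
Both results\ directories store annotations as .rttm files, so it may help to see what reading and writing that format involves. The following is a minimal sketch and not the code in rttm.py: the names RttmSegment, parse_rttm, and format_rttm are hypothetical, and it assumes the common ten-field SPEAKER line layout (type, file ID, channel, onset, duration, followed by placeholder fields, with the speaker name in the eighth field).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RttmSegment:
    """One SPEAKER line of an .rttm file (hypothetical container)."""
    file_id: str
    channel: int
    onset: float      # segment start in seconds
    duration: float   # segment length in seconds
    speaker: str

    @property
    def end(self) -> float:
        """Segment end time in seconds."""
        return self.onset + self.duration


def parse_rttm(text: str) -> List[RttmSegment]:
    """Parse SPEAKER lines from RTTM text; other line types are skipped."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        if len(fields) < 8:
            raise ValueError(f"Malformed RTTM line: {line!r}")
        segments.append(
            RttmSegment(
                file_id=fields[1],
                channel=int(fields[2]),
                onset=float(fields[3]),
                duration=float(fields[4]),
                speaker=fields[7],
            )
        )
    return segments


def format_rttm(segments: List[RttmSegment]) -> str:
    """Serialize segments back to RTTM text, one SPEAKER line per segment."""
    lines = [
        f"SPEAKER {s.file_id} {s.channel} {s.onset:.3f} {s.duration:.3f} "
        f"<NA> <NA> {s.speaker} <NA> <NA>"
        for s in segments
    ]
    return ("\n".join(lines) + "\n") if lines else ""
```

Voice activity detection results can be held in the same structure by giving every speech segment a single placeholder label in the speaker field; whether the repository does this is an assumption.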