Audio-Transcript-Anonymizer

This pipeline accepts an audio or video file, transcribes its content with WhisperX, and applies speaker diarization via Pyannote. It is intended for interviews, therapy sessions, and other conversations involving multiple speakers.

Features

Audio/video (mp3/mp4) input

Automatic transcription via WhisperX

Speaker diarization via Pyannote.audio
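
To illustrate how transcription and diarization fit together, here is a minimal sketch that labels each transcript segment with the speaker whose turn overlaps it the longest. It is not taken from the project's script: the Segment and Turn classes, the assign_speakers helper, and the "UNKNOWN" fallback label are hypothetical names chosen for this example, and WhisperX and Pyannote provide their own data structures for the same idea.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of speech (hypothetical structure for this sketch)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (hypothetical)."""
    start: float
    end: float
    speaker: str


def overlap_seconds(seg: Segment, turn: Turn) -> float:
    """Return how many seconds the segment and the turn share (0.0 if none)."""
    return max(0.0, min(seg.end, turn.end) - max(seg.start, turn.start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each segment's text with the speaker of the most-overlapping turn."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap_seconds(seg, turn)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


if __name__ == "__main__":
    segments = [Segment(0.0, 2.0, "Hello"), Segment(2.5, 5.0, "Hi there")]
    turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 5.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Segments that overlap no turn keep the fallback label, which makes gaps in the diarization easy to spot in the output.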

Installation

To run the transcription pipeline, you'll need Python 3.10 and Anaconda.

Installing FFmpeg

Option 1: Via pip

pip install python-ffmpeg

Option 2: Via Scoop (Windows)

scoop install ffmpeg
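
Whichever option you choose, it can help to confirm that the ffmpeg executable is actually reachable before running the pipeline, since WhisperX calls it as an external program when loading audio. The following is a small sketch using only the Python standard library; the ffmpeg_version helper is a hypothetical name for this example.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg was not found on PATH; install it and reopen your terminal.")
    else:
        print(version)
```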

Installing WhisperX

Follow the instructions in the WhisperX repository (see 'Setup'):

https://github.com/m-bain/whisperX?tab=readme-ov-file

Installing Pyannote

Via pip:

pip install pyannote.audio
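
After installing both packages, a quick way to verify that they landed in the active environment is to look up their installed versions. This sketch uses the standard library's importlib.metadata; the installed_versions helper is a hypothetical name, and the distribution names "whisperx" and "pyannote.audio" are assumed to match what pip installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if pip reports other names.
    for name, version in installed_versions(["whisperx", "pyannote.audio"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```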

Running the script

Place the script in a folder that also contains the subfolders 'audios' (for mp3 files) and/or 'videos' (for mp4 files), and add your media to the matching subfolder.
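
The folder layout described above can be checked with a short sketch like the one below, which lists the mp3 files in 'audios' and the mp4 files in 'videos' next to the script. The find_media helper is a hypothetical name for this example and is not part of the project's script, which may discover its inputs differently.

```python
from pathlib import Path


def find_media(base_dir):
    """Return mp3 files from 'audios' and mp4 files from 'videos' under base_dir."""
    base = Path(base_dir)
    layout = {"audios": ".mp3", "videos": ".mp4"}
    found = []
    for folder, suffix in layout.items():
        directory = base / folder
        if not directory.is_dir():
            continue  # either subfolder is optional
        found.extend(
            sorted(
                p for p in directory.iterdir()
                if p.is_file() and p.suffix.lower() == suffix
            )
        )
    return found


if __name__ == "__main__":
    for media_file in find_media("."):
        print(media_file)
```

Running it from the folder that holds the script prints every file the layout would expose, which is a quick way to catch media placed in the wrong subfolder.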

Open the script, update every field marked with ***, and save your changes. Then run the script.
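
Before running, it may be worth checking that no marked field was overlooked. The sketch below lists every line of a file that still contains ***. It assumes the marker is part of the placeholder value and disappears once a field is filled in; if the script keeps *** in comments next to the fields, those lines will still be reported. The unfilled_placeholders helper is a hypothetical name, and the script path is passed in by the caller.

```python
from pathlib import Path


def unfilled_placeholders(script_path, marker="***"):
    """Return (line_number, stripped_line) for each line still containing the marker."""
    lines = Path(script_path).read_text(encoding="utf-8").splitlines()
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_placeholders.py <path to the pipeline script>
    for number, line in unfilled_placeholders(sys.argv[1]):
        print(f"line {number}: {line}")
```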

deryaerman/Audio-Transcript-Anonymizer-TUB-AP
