Web application built with Python 3.10 and Vosk for transcribing microphone audio on the fly and detecting questions (Russian language); useful for meetings.
Features:
- recording speakers' audio data and preparing a speaker pool
- online microphone stream processing with transcription and speaker detection
- export of finished audio sessions (metadata + detected questions)
Docker, x86_64/amd64 architecture, 16+ GB RAM (the Vosk Server models use at least 8 GB)
For a manual run: Python 3.10, an installed MongoDB, a Vosk Server, and optionally Node if you wish to rebuild the JS bundles.
- `VOSK_SERVER_WS_URL` - URL where the Vosk websocket server is started, default is `ws://localhost:2700`
- `MONGODB_URI` - URL to MongoDB, default is `mongodb://localhost:27017`
- `MONGO_VOSK_DB_NAME` - name of the Mongo database, default is `nir-zoom`
- `MONGO_SPEAKERS_COL_NAME` - name of the collection with speaker records, default is `speakers`
- `MONGO_SESSIONS_COL_NAME` - name of the collection with meeting records, default is `sessions`
- `GOOD_SPK_FRAMES_NUM` - threshold for analyzing the quality of recorded speaker features, default is `300`
- `MIN_SPK_VECTORS_NUM` - threshold for checking the quantity of recorded speaker features, default is `8`
- `SPK_GOOD_RATIO` - minimum allowed ratio of (num good speaker features / num all speaker features), default is `0.65`
- `MERGE_DIFF_SEC` - how close, in seconds, two phrases must be to be combined into one, default is `2.5`
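For reference, a `.env` file with every variable at its documented default looks like this:

```
VOSK_SERVER_WS_URL=ws://localhost:2700
MONGODB_URI=mongodb://localhost:27017
MONGO_VOSK_DB_NAME=nir-zoom
MONGO_SPEAKERS_COL_NAME=speakers
MONGO_SESSIONS_COL_NAME=sessions
GOOD_SPK_FRAMES_NUM=300
MIN_SPK_VECTORS_NUM=8
SPK_GOOD_RATIO=0.65
MERGE_DIFF_SEC=2.5
```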
- Prepare a `.env` file with the needed variables, except for `VOSK_SERVER_WS_URL` and `MONGODB_URI`; these are set automatically because dedicated images of Vosk Server and MongoDB are used
- Run `docker-compose up -d` (a sketch of such a stack is shown below)
- Navigate to http://localhost:3030
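For orientation only, a minimal sketch of what such a compose stack could look like; the service names, app build, and wiring are assumptions, not the project's actual docker-compose.yml (only the `supersolik/vosk-ru-spk:latest` image and the ports are taken from this README):

```yaml
# Hypothetical sketch, not the project's actual docker-compose.yml
services:
  vosk:
    image: supersolik/vosk-ru-spk:latest   # image mentioned in this README
  mongo:
    image: mongo
  app:
    build: .               # assumed: app image built from this repo
    env_file: .env
    environment:
      VOSK_SERVER_WS_URL: ws://vosk:2700        # "set automatically" per the step above
      MONGODB_URI: mongodb://mongo:27017
    ports:
      - "3030:3030"
    depends_on:
      - vosk
      - mongo
```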
- Install dependencies using `pip install -r requirements.txt`. If you plan to run the Vosk Server code manually, install the Vosk deps using `pip install -r vosk-requirements.txt`
- Check that Vosk Server and MongoDB are up and running; for Vosk Server you can use the `supersolik/vosk-ru-spk:latest` Docker image, or download the Vosk models (ru model and spk model) and run the server code manually
- Set the env vars described above; in this case `VOSK_SERVER_WS_URL` and `MONGODB_URI` are required, to point the app to your running Vosk Server and MongoDB (see the example session below)
- Run `uvicorn backend:app.app --host 127.0.0.1 --port 3030`
- Navigate to http://localhost:3030
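Putting the manual-run steps together, a session might look like this (the env var values here are the documented defaults; the uvicorn command is the one from the step above):

```sh
export VOSK_SERVER_WS_URL=ws://localhost:2700
export MONGODB_URI=mongodb://localhost:27017
uvicorn backend:app.app --host 127.0.0.1 --port 3030
```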
REST endpoint docs are available at the `GET /docs` endpoint; open it in a browser.
The app also exposes two websocket endpoints, `/spk/ws` and `/meeting/ws`, responsible for processing the audio chunks sent from the client over websockets for speaker and meeting sessions (see the client sketch below).
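For orientation, a minimal client sketch for the `/meeting/ws` endpoint. It assumes the server accepts raw binary audio chunks and replies with text results; the chunk size, audio format, and message schema are assumptions, not documented API:

```python
# Hypothetical client sketch; chunk size, audio format, and reply schema
# are assumptions, not part of the documented websocket API.
import asyncio
import websockets

async def stream_audio(path: str = "meeting.raw") -> None:
    uri = "ws://localhost:3030/meeting/ws"
    async with websockets.connect(uri) as ws:
        with open(path, "rb") as audio:
            while chunk := audio.read(4000):  # hypothetical chunk size
                await ws.send(chunk)          # send raw audio bytes
                print(await ws.recv())        # server's recognition reply

asyncio.run(stream_audio())
```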
- To record a speaker, click on the `Record speakers` link
- To record a meeting session, click on the `Record meeting` link
- To view and export recorded meetings, click on the `Export meeting stats` link
- Input the speaker name in the form and click the `Set speaker` button; the app will initialize a speaker recording session, and the recording button will become available. The name cannot be changed; if you made a typo, you will need to reload the page and create a new speaker
- Click the `Start recording` button and allow microphone usage in the following browser prompt
- Wait for 1 sec and start speaking, loud and clear. Try to speak in long sentences, with pauses of about 2 seconds between sentences. Total speaking time should be about 1-2 minutes
- When you're done, click the `Stop recording` button. The app will analyze the recorded data and display the total time, along with the quality and quantity of the recorded data. If the recording is not good enough, recommendations for improvement will be shown
- Recording is complete; if you wish to record another speaker, simply reload the page or click the `Record another speaker` button (it appears after recording is stopped)

If you want to record different data for the same speaker, input the same name in the next recording session; the data from the previous recording will be overwritten.
- Input the meeting name in the form and select speakers from the dropdown multiselect with checkboxes. Then click the `Set meeting data` button to initialize a meeting session
- Click `Start recording` and allow microphone usage in the following browser prompt; after about 1 sec the app will start processing the microphone audio on the fly and display detected phrases with speaker names in the textarea that appears below the buttons
- When you wish to stop processing and finish recording, click the `Stop recording` button. The app will analyze the recorded data and display the total time, along with how many of the speakers you chose at the start were actually detected
- The `Export` button will appear, if you wish to export the meeting stats right away. If you don't, you can always view and export stats on the export page
- Recording is done; you can start another audio session or close the page

The app adds a timestamp to each session name, so the same name can be used for different sessions.
The app uses the microphone currently used by the browser, so if you wish to change the mic, do it in the browser or OS settings.
The page supports pagination: 8 records are shown per page, and when the number of sessions exceeds 8, controls for navigating between pages appear around the `Page N` text.
To export any record, just click `Export`; the data is exported as a zip archive with 2 CSV files: `metadata.csv` contains the data shown in the table, and `questions.csv` contains the speakers' questions data with timestamps in seconds relative to the start of the recording (a purely hypothetical sample is shown below).
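The exact column layout isn't specified here; purely as a hypothetical illustration, `questions.csv` could look something like:

```
speaker,timestamp_sec,question
Alice,12.5,"как это работает?"
Bob,47.0,"почему выбран этот подход?"
```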
- Go to `./frontend_scripts`
- Run `npm install --include dev`
- To build the JS bundles, run `npm run build_speaker` and `npm run build_meeting`
- Speaker detection is based on the cosine distance between the detected speaker feature vector (obtained via Vosk) and the pre-recorded feature vectors: for each speaker's vector set the mean distance is calculated, and the speaker with the minimal mean distance is picked (see the first sketch below)
- Question detection is rule-based; the rules are gathered in the `QUESTION_RULES` array at the top of the `vosk_utils/__init__.py` file. Each rule is a boolean function that checks for keyword entries in the passed text
- Forbidden ("bad") words are stored as a pickled Python list in `bad_words.pkl` in the `vosk_utils` folder. If you wish to extend this list, unpickle the file, extend the Python list with the needed words, and pickle it again. Bad words are simply erased from recognized phrases
- Questions are assembled during export with the following algorithm: if the rule-based check determines that a phrase is a question, at most 5 subsequent phrases from the same speaker (without interruption from other speakers) are appended to the question text. The logic behind this is that the rule-based check looks at the start of the question, while the following phrases, even after pauses, are usually still part of the question, even if they don't contain explicit question keywords (see the second sketch below)
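A minimal sketch of the speaker-picking step described in the first note above; the function and variable names are hypothetical, not taken from the codebase, and per-speaker features are assumed to be stored as lists of numpy vectors:

```python
# Hypothetical sketch of mean-cosine-distance speaker picking.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pick_speaker(detected: np.ndarray, speakers: dict[str, list[np.ndarray]]) -> str:
    # For each speaker, average the distance between the detected vector
    # and every pre-recorded vector; pick the speaker with the minimal mean.
    means = {
        name: np.mean([cosine_distance(detected, v) for v in vectors])
        for name, vectors in speakers.items()
    }
    return min(means, key=means.get)
```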
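And a hypothetical illustration of a `QUESTION_RULES`-style check plus the export-time question assembly; the rule contents and phrase structure are assumptions based on the description above, not the actual code:

```python
# Hypothetical rules: each is a boolean function checking keyword entries.
QUESTION_RULES = [
    lambda text: text.strip().endswith("?"),
    lambda text: any(kw in text.lower() for kw in ("как", "почему", "зачем")),
]

def is_question(text: str) -> bool:
    return any(rule(text) for rule in QUESTION_RULES)

def assemble_questions(phrases: list[dict]) -> list[str]:
    # phrases: [{"speaker": str, "text": str}, ...] in chronological order.
    questions, i = [], 0
    while i < len(phrases):
        cur = phrases[i]
        if is_question(cur["text"]):
            parts, j = [cur["text"]], i + 1
            # Append at most 5 subsequent phrases from the same speaker,
            # stopping at the first interruption by another speaker.
            while j < len(phrases) and j <= i + 5 and phrases[j]["speaker"] == cur["speaker"]:
                parts.append(phrases[j]["text"])
                j += 1
            questions.append(" ".join(parts))
            i = j
        else:
            i += 1
    return questions
```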
Activate a virtual environment with Python 3.10, then install dependencies using:
`pip install -r requirements.txt`
`pip install -r vosk-requirements.txt`
Use `python3.10 main.py --help` to invoke the detailed description.
Activate a virtual environment with Python 3.10, then install dependencies using:
`pip install -r whisper-pyannote-requirements.txt`
Use `python3.10 main_whisper.py --help` to invoke the detailed description of how to run.
Link to a screencast with the first results (19.03.2022 - CLI app): https://drive.google.com/file/d/1MQdnaoQoiWK9L1MZP185QtTg8CifSfz9/view?usp=sharing