Library for automated forced alignment of given Swiss german audio and German transcript.
Practical part if IP619bb_I4DS02.
- Python version: 3.6
- And the following libraries:
- prettytable==0.7.2
- pydub==0.23.1
- numpy==1.18.1+mkl
- nltk==3.4.5
- Bio==0.1.0
- PyYAML==5.3
- google==2.0.3
- unittest_data_provider==1.0.1
- memory_profiler==0.55.0
See ./requirements.txt
for a list generated by pireqs.
- Clone this repository
- Install dependencies (see Requirements )
- Download the output of a given Google STT execution (must be JSON) and place it in the same folder as your audio and transcript files.
- Make sure all file groups follow this naming convention and all mentioned files are present:
For instance:
[source_file_name].txt # The transcript [source_file_name].wav # Mono-channel WAV file [source_file_name].flac # OPTIONAL: Mono-channel FLAC file, only needed for generating the google output. [source_file_name]_google_output.json # Output generated by google [source_file_name]_audacity_hand.txt # OPTIONAL: Hand alignment
gemeinde_stadthausen_123.txt gemeinde_stadthausen_123.wav gemeinde_stadthausen_123_google_output.json
- Copy and alter
./config.example.yml
to your needs (See Configuration ) - Generate alignments as needed using the CLI commands (See CLI commands )
aligner_type: [ basic | random | google_biopython | google_global_character | google_global_word | google_semiglobal_character | google_semiglobal_word | google_local_character | google_local_word ]
algorithm:
match_reward: int
mismatch_penalty: int
gap_penalty: int
optimize_params_formula: string
no_appearance:
type: [ character | time ]
interval_length: float
score_weights:
gaps_google: float
gaps_transcript: float
alignment_score: float
google_confidence: float
filtering:
threshold: float
method: [ mark | delete ]
The key optimize_params_formula
takes any valid Python statement to calculate a score that is minimized against. The following variables can be used:
iou
: The mean IOU valuedeviation
: The mean deviation in secondsf1
: F1-Score of the alignment (classification if sentences do appear)precision
: Precision of the alignment (classification if sentences do appear)recall
: Recall of the alignment (classification if sentences do appear)
The following CLI commands are available and should be executed as python ./bin/{scriptName}
from project root:
Create alignment
----------
Creates an alignment based on configuration. See README.md for setting up a correct configuration.
Usage:
python create_alignment.py --path=<path> --config=<path> [-v|-vv|-vvv]
Args:
--path: Path to read raw data from and write alignments to
--config: Path to configuration
-v|-vv|-vvv: Verbosity level of the output
-h: Prints this help
Compare alignments
----------
Compares two kinds of alignments
Usage:
python compare_alignment.py --path=<path> --type1=basic,hand,random,google --type2=basic,hand,random,google [-v|-vv|-vvv] [--with-list] [--get-low-means] [--training-only]
Args:
--path: Path to read alignment data
--type1: First type to compare, one of basic, hand, random or google
--type2: Second type to compare, one of basic, hand, random or google
-v|-vv|-vvv: Verbosity level of the output
--with-list: Include a list with all calculated IOUs for copy/paste (to use in an EXCEL sheet, for example)
--get-low-means: Includes a list of wav files with a mean IOU < 0.3, for debugging purposes
--training-only: Only ever compares sentences marked with [TRAINING] in the first type of the alignment
-h: Prints this help
Get Google recognition
----------
Gets the Speech Recognition result of Google Cloud API and stores it in a caching folder.
Usage:
python get_google_recognition_raw.py --path=<path> --authpath=<path> --bucket=<bucket name> --outpath=<path> [-v|-vv|-vvv]
Args:
--path: Path to read transcript files from (needed to filter which files to actually transcript)
--authpath: Path containing the authentication files necessary to connect to Google Cloud API services
--bucket: Name of the bucket containing all FLAC files
--outpath: Path to write the raw JSON output to
-v|-vv|-vvv: Verbosity level of the output
-h: Prints this help
Fix hand alignments
----------
Fix hand alignments: Reshuffle training data and/or assign `-` to nonexisting sentences.
Usage:
python fix_hand_alignments.py --path=<path> [-v|-vv|-vvv] [--fix-nonexisting] [--reshuffle-training]
Args:
--path: Path to read alignment data
-v|-vv|-vvv: Verbosity level of the output
--fix-nonexisting: If non-existing sentences should be marked with `-` for interval start and end points
--reshuffle-training: Select a new 70% of all sentences as training data
-h: Prints this help
Optimize alignments
----------
Tries to find the best alignment parameters based on Bayesian optimization.
Usage:
python optimize_parameters.py --path=<path> --config=<path> [-v|-vv|-vvv]
Args:
--path: Path to read alignment data from
--config: Path to configuration
--convergence-plot-file: Filename for the plot of the convergence
--acquisition-plot-file: Filename for the plot of the acquisition (if possible to create)
-v|-vv|-vvv: Verbosity level of the output
-h: Prints this help
Optimize score
----------
Tries to find the best parameters for overall score based on Bayesian optimization.
Usage:
python optimize_score.py --path=<path> --config=<path> [-v|-vv|-vvv]
Args:
--path: Path to read alignment data from
--config: Path to configuration
--convergence-plot-file: Filename for the plot of the convergence
--acquisition-plot-file: Filename for the plot of the acquisition (if possible to create)
-v|-vv|-vvv: Verbosity level of the output
-h: Prints this help
For detailed memory usage, the package memory_profiler is used. The code contains several annotations to measure memory usage.
To measure, execute an arbitrary CLI command with the memory_profiler module:
python -m memory_profiler ./bin/{command and args}
The annotation @profile
can be used to measure memory usage of several methods and functions at once. The application has several of them at various spots, commented out. If you want to profile a specific function, uncomment the annotation right before the method/function signature.
Is measured in the code itself and output with a necessary verbosity of 0.
To execute all unit tests, execute the following in project root:
python -m unittest discover .
To have a better understanding as of how IOU works, the file iou_visualizer.html
contains a HTML/CSS/JS implementation of the IOU with two range sliders and visualization of the actual calculation.
MIT, see LICENSE.md