transcriber is a Python tool that produces text transcriptions from MP3 audio files containing English speech. The actual speech-to-text operation is handled by the AWS Transcribe service. All this tool does is push your speech files to an S3 bucket in the cloud from your local computer, trigger a transcription job, and then pull down the text transcription once the job is complete.
As seen here: Transform Speech to Text with Python and AWS.
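The push, transcribe, and pull flow described above can be sketched with boto3 roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function name `transcribe_file`, the job-naming scheme, the polling interval, and the `en-US` language code are all assumptions. The boto3 clients are passed in as arguments, so the sketch does not depend on how your credentials are configured.

```python
import time
from pathlib import Path


def transcribe_file(s3, transcribe, bucket, audio_path, out_dir,
                    audio_prefix="audio", transcripts_prefix="transcripts",
                    poll_seconds=10):
    """Upload one MP3, run a transcription job, and download the JSON result.

    `s3` and `transcribe` are boto3 clients, e.g. boto3.client("s3") and
    boto3.client("transcribe"). All names in this sketch are illustrative.
    """
    audio_path = Path(audio_path)
    job_name = audio_path.stem  # assumed naming scheme; job names must be unique
    audio_key = f"{audio_prefix}/{audio_path.name}"
    transcript_key = f"{transcripts_prefix}/{job_name}.json"

    # 1. Push the audio file to the S3 bucket.
    s3.upload_file(str(audio_path), bucket, audio_key)

    # 2. Trigger the transcription job, writing its output back to the bucket.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{audio_key}"},
        MediaFormat="mp3",
        LanguageCode="en-US",
        OutputBucketName=bucket,
        OutputKey=transcript_key,
    )

    # 3. Poll until the job either completes or fails.
    while True:
        job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status == "COMPLETED":
            break
        if status == "FAILED":
            raise RuntimeError(f"transcription job {job_name} failed")
        time.sleep(poll_seconds)

    # 4. Pull the transcript down to the local output directory.
    out_path = Path(out_dir) / f"{job_name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    s3.download_file(bucket, transcript_key, str(out_path))
    return out_path
```

In the tool itself, uploading and downloading are separate steps (the push and pull modes), so you do not have to wait for the job to finish in a single call as this sketch does.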
Configure a Python virtual environment at the root of the project:
python3 -m venv .venv
Enable the virtual environment:
source .venv/bin/activate
Install all development dependencies:
pip install -r src/requirements.txt
Build the project into a Python wheel and install:
python -m build -o dist src
pip install dist/*.whl
Alternatively, install the Python package in development mode:
pip install -e src
Verify the installation by printing the package version:
transcribe.py --version
First, ensure that the AWS boto3 package is configured properly; see the boto3 quickstart guide.
For Debian/Ubuntu:
sudo apt install awscli
aws configure
Next, create an S3 bucket if you don't have one already. For the following configuration steps we'll assume that the bucket name is my-transcription-bucket. Replace this string with your bucket name of choice.
Last, assuming that your CLI account doesn't already have administrator access, grant said account access to the S3 bucket by adding an IAM access policy similar to the following:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:DeleteObject",
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": [
"arn:aws:s3:::my-transcription-bucket",
"arn:aws:s3:::my-transcription-bucket/*"
]
}
]
}
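If your bucket has a different name, the policy above can be generated rather than edited by hand. The following sketch builds the same policy document for an arbitrary bucket name; the helper `bucket_policy` is hypothetical and not part of the tool.

```python
import json


def bucket_policy(bucket_name):
    """Return the IAM policy shown above, as a dict, for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                ],
                # The first ARN covers the bucket itself (needed for ListBucket),
                # the second covers every object stored in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("my-transcription-bucket"), indent=4))
```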
The tool requires a configuration file. This file defines the AWS S3 bucket details and the paths for your audio and transcript files, both locally and in the S3 bucket. Here is the complete default configuration file for the tool:
[bucket]
bucket_name =
audio_prefix = audio
transcripts_prefix = transcripts
[local]
audio_dir = audio
transcripts_dir = transcripts
Let's break down the different sections and options in the configuration file:
The [bucket] section specifies your AWS S3 bucket and where to store your files within the bucket.
- bucket_name: Mandatory. The actual name of your AWS S3 bucket (e.g. my-transcription-bucket).
- audio_prefix: Optional. The key prefix under which the audio files are stored in the bucket. The default is audio, meaning that by default, your audio files will be stored in a 'folder' named audio in your bucket.
- transcripts_prefix: Optional. The key prefix under which the output transcript files are stored in the bucket. The default is transcripts, meaning that by default, your transcriptions will be stored in a 'folder' named transcripts in your bucket.
The [local] section specifies the locations of your audio and transcript files on your local machine.
- audio_dir: Optional. The relative or absolute path to the directory containing the speech files on your local machine. The default is a directory named audio in the current working directory.
- transcripts_dir: Optional. The relative or absolute path to the directory where the output transcript files are saved on your local machine. The default is a directory named transcripts in the current working directory.
This configuration file is pretty flexible and allows you to organize your files in a way that best suits your workflow. For all the examples in the subsequent sections, we'll assume that the configuration file is named transcriber.conf and is located in the current working directory.
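To illustrate how the defaults and the mandatory bucket_name option interact, here is a minimal sketch that reads such a file with Python's standard configparser module. It is not the tool's own loading code; the function name `load_config` and the error message are assumptions.

```python
import configparser

# Default values, mirroring the default configuration file shown above.
DEFAULTS = {
    "bucket": {
        "bucket_name": "",
        "audio_prefix": "audio",
        "transcripts_prefix": "transcripts",
    },
    "local": {
        "audio_dir": "audio",
        "transcripts_dir": "transcripts",
    },
}


def load_config(path):
    """Read the configuration file, falling back to defaults for optional keys."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)          # start from the defaults
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)            # values in the file override them
    if not parser["bucket"]["bucket_name"].strip():
        raise ValueError("bucket_name is mandatory in the [bucket] section")
    return parser


if __name__ == "__main__":
    config = load_config("transcriber.conf")
    print(config["bucket"]["bucket_name"], config["local"]["audio_dir"])
```

With this approach, a file containing only a [bucket] section with bucket_name set is a complete configuration, and every other option keeps its default.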
The tool operates from the command line and primarily has two modes of operation: 'push' and 'pull'. Here is a quick guide to understanding and using these commands effectively. To print the script usage string at any time, call:
transcribe.py -h
Push mode is used to upload your audio files from your local directory to the specified S3 bucket for transcription. To use this mode, type:
transcribe.py -c transcriber.conf push
If you want to specify a particular audio file to push, you can use the -n or --name option followed by the file name:
transcribe.py -c transcriber.conf push -n input.mp3
Pull mode is used to download the transcribed text files from the S3 bucket back to your local directory after the transcription is completed. To use this mode, type:
transcribe.py -c transcriber.conf pull
If you want to specify a particular transcription file to pull, you can use the -n or --name option followed by the file name (with a .json extension):
transcribe.py -c transcriber.conf pull -n output.json
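The pulled files are JSON documents produced by AWS Transcribe, which stores the full text under results.transcripts in the output. If you only need the plain text, a small sketch like the following could extract it. The function name `transcript_text` and the example path are illustrative and not part of the tool.

```python
import json


def transcript_text(path):
    """Return the plain transcript text from an AWS Transcribe output file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # "transcripts" is a list; join its entries in case there is more than one.
    return " ".join(item["transcript"] for item in data["results"]["transcripts"])


if __name__ == "__main__":
    # Assumes the default transcripts_dir and a previously pulled output.json.
    print(transcript_text("transcripts/output.json"))
```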
The -v or --verbose option can be used to enable verbose logging:
transcribe.py -v
To print the version of the tool and exit, use the --version option:
transcribe.py --version
This project is licensed under the terms of the MIT License. Please see the LICENSE file for full details.