humatic/transcriber

AWS Speech Transcriber

transcriber is a Python tool that produces text transcriptions from MP3 audio files containing English speech. The actual speech-to-text conversion is handled by the AWS Transcribe service; this tool only uploads your speech files from your local computer to an S3 bucket, triggers a transcription job, and downloads the text transcription once the job is complete.
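The "trigger a transcription job" step comes down to a single Amazon Transcribe API call. The sketch below shows one way to issue it with boto3's `start_transcription_job`. The helper names (`build_job_request`, `start_job`), the job-naming scheme (the audio file's stem) and the output key layout are illustrative assumptions, not taken from the transcriber source.

```python
from pathlib import PurePosixPath


def build_job_request(bucket: str, audio_key: str, transcripts_prefix: str) -> dict:
    """Build keyword arguments for TranscribeService.Client.start_transcription_job.

    Hypothetical layout: the job is named after the audio file's stem and the
    JSON output is written back to the same bucket under transcripts_prefix.
    """
    stem = PurePosixPath(audio_key).stem
    return {
        "TranscriptionJobName": stem,
        "LanguageCode": "en-US",  # the tool targets English speech
        "MediaFormat": "mp3",
        "Media": {"MediaFileUri": f"s3://{bucket}/{audio_key}"},
        "OutputBucketName": bucket,
        "OutputKey": f"{transcripts_prefix}/{stem}.json",
    }


def start_job(transcribe_client, bucket: str, audio_key: str, transcripts_prefix: str) -> str:
    """Start a job using a boto3 'transcribe' client and return the job name."""
    request = build_job_request(bucket, audio_key, transcripts_prefix)
    transcribe_client.start_transcription_job(**request)
    return request["TranscriptionJobName"]
```

With configured credentials, `start_job(boto3.client("transcribe"), "my-transcription-bucket", "audio/input.mp3", "transcripts")` would start a job named `input`. Job names must be unique within your AWS account, so a real tool needs some strategy for re-running the same file.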

This tool is featured in the article "Transform Speech to Text with Python and AWS".

Install

Create a Python virtual environment at the root of the project:

python3 -m venv .venv

Activate the virtual environment:

source .venv/bin/activate

Install all development dependencies:

pip install -r src/requirements.txt

Build the project into a Python wheel and install it:

python -m build -o dist src
pip install dist/*.whl

Alternatively, install the Python package in development mode:

pip install -e src

Verify the installation by printing the package version:

transcribe.py --version

AWS Setup

First, ensure that the AWS boto3 package is configured properly with your credentials and default region (see the boto3 quickstart).

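On Debian/Ubuntu, install the AWS CLI and run its interactive setup, which saves the credentials and default region that boto3 reads: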

sudo apt install awscli
aws configure

Next, create an S3 bucket if you don't have one already. For the following configuration steps we'll assume that the bucket name is my-transcription-bucket. Replace this string with your bucket name of choice.
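If you prefer to create the bucket from Python instead of the console or CLI, here is a minimal boto3 sketch. The helper name `create_bucket_if_missing` is hypothetical. It relies on two documented S3 behaviors: `list_buckets` only returns buckets owned by your account, and outside `us-east-1` the region must be passed as a `LocationConstraint` (an explicit `us-east-1` constraint is rejected). Because bucket names are global, creation can still fail if another account already owns the name.

```python
def create_bucket_if_missing(s3_client, bucket_name: str, region: str) -> bool:
    """Create bucket_name in region unless this account already owns it.

    Returns True if a new bucket was created, False if it already existed.
    """
    owned = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if bucket_name in owned:
        return False
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**kwargs)
    return True
```

For example, `create_bucket_if_missing(boto3.client("s3"), "my-transcription-bucket", "eu-west-1")`.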

Finally, if your CLI account doesn't already have administrator access, grant it access to the S3 bucket by attaching an IAM policy similar to the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::my-transcription-bucket",
                "arn:aws:s3:::my-transcription-bucket/*"
            ]
        }
    ]
}
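If you manage several buckets, generating this policy can be less error-prone than editing the JSON by hand. The sketch below reproduces the policy above for any bucket name; the function name `bucket_access_policy` is hypothetical, and the optional `Sid` field is omitted.

```python
import json


def bucket_access_policy(bucket_name: str) -> dict:
    """Return an IAM policy document granting the S3 actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [
                    # s3:ListBucket applies to the bucket ARN itself...
                    f"arn:aws:s3:::{bucket_name}",
                    # ...while the object actions apply to keys inside it.
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_access_policy("my-transcription-bucket"), indent=4))
```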

Configuration

The tool requires a configuration file that defines the AWS S3 bucket details and the paths for your audio and transcript files, both locally and in the S3 bucket. Here is the complete default configuration file:

[bucket]

bucket_name =
audio_prefix = audio
transcripts_prefix = transcripts

[local]

audio_dir = audio
transcripts_dir = transcripts

Let's break down the different sections and options in the configuration file:

[bucket]

This section is for specifying details about your AWS S3 bucket and where to store your files within the bucket.

  • bucket_name: Required. The name of your AWS S3 bucket (e.g. my-transcription-bucket).

  • audio_prefix: Optional. The key prefix under which audio files are stored in the bucket. The default is audio, so your audio files are stored in a 'folder' named audio in your bucket.

  • transcripts_prefix: Optional. The key prefix under which output transcript files are stored in the bucket. The default is transcripts, so your transcriptions are stored in a 'folder' named transcripts in your bucket.

[local]

This section is for specifying the locations of your audio and transcript files on your local machine.

  • audio_dir: Optional. The relative or absolute path of the directory on your local machine that contains your speech files. The default is a directory named audio in the current working directory.

  • transcripts_dir: Optional. The relative or absolute path of the directory on your local machine where output transcript files are saved. The default is a directory named transcripts in the current working directory.

This configuration file is flexible, so you can organize your files in whatever way best suits your workflow. All examples in the following sections assume that the configuration file is named transcriber.conf and is located in the current working directory.
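To make the defaults concrete, here is a sketch of how such a file could be read with Python's standard `configparser` module, applying the documented defaults and rejecting an empty `bucket_name`. The `TranscriberConfig` class and `load_config` function are hypothetical; the tool's actual loader may behave differently.

```python
import configparser
from dataclasses import dataclass


@dataclass
class TranscriberConfig:
    bucket_name: str
    audio_prefix: str
    transcripts_prefix: str
    audio_dir: str
    transcripts_dir: str


def load_config(path: str) -> TranscriberConfig:
    """Read a transcriber-style config file, filling in the documented defaults."""
    parser = configparser.ConfigParser()
    # Seed the defaults first; values read from the file override them.
    parser.read_dict({
        "bucket": {"bucket_name": "", "audio_prefix": "audio",
                   "transcripts_prefix": "transcripts"},
        "local": {"audio_dir": "audio", "transcripts_dir": "transcripts"},
    })
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    bucket, local = parser["bucket"], parser["local"]
    if not bucket["bucket_name"].strip():
        raise ValueError("[bucket] bucket_name is required")
    return TranscriberConfig(
        bucket_name=bucket["bucket_name"].strip(),
        audio_prefix=bucket["audio_prefix"],
        transcripts_prefix=bucket["transcripts_prefix"],
        audio_dir=local["audio_dir"],
        transcripts_dir=local["transcripts_dir"],
    )
```

With a transcriber.conf that sets only `bucket_name`, `load_config("transcriber.conf").audio_dir` would be `audio`.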

Script Usage

The tool runs from the command line and has two main modes of operation: 'push' and 'pull'. To print the usage string at any time, run:

transcribe.py -h

Push Mode

Push mode uploads the audio files in your local audio directory to the configured S3 bucket for transcription. To use this mode, run:

transcribe.py -c transcriber.conf push

To push a single audio file, pass its file name with the -n or --name option:

transcribe.py -c transcriber.conf push -n input.mp3
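Conceptually, push mode maps each local MP3 file to an S3 key under `audio_prefix` and uploads it. Here is a minimal sketch of that mapping and upload, assuming a boto3 S3 client and the hypothetical helpers `audio_uploads` and `push`. Starting the transcription job after the upload is not shown. Note that the `*.mp3` glob is case-sensitive on most Linux file systems, so `.MP3` files would be skipped.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def audio_uploads(audio_dir: str, audio_prefix: str,
                  name: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (local_path, s3_key) pairs: one named file, or every *.mp3 in audio_dir."""
    directory = Path(audio_dir)
    files = [directory / name] if name is not None else sorted(directory.glob("*.mp3"))
    return [(str(f), f"{audio_prefix}/{f.name}") for f in files]


def push(s3_client, bucket: str, uploads: List[Tuple[str, str]]) -> None:
    """Upload each local file to its key using S3.Client.upload_file(Filename, Bucket, Key)."""
    for local_path, key in uploads:
        s3_client.upload_file(local_path, bucket, key)
```

For example, `push(boto3.client("s3"), "my-transcription-bucket", audio_uploads("audio", "audio", "input.mp3"))` would upload `audio/input.mp3` to the key `audio/input.mp3`.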

Pull Mode

Pull mode downloads the transcript files from the S3 bucket to your local transcripts directory once the transcription jobs have completed. To use this mode, run:

transcribe.py -c transcriber.conf pull
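Since pulling only makes sense after a job has finished, a pull step can check the job status first. The sketch below uses boto3's `get_transcription_job`, whose `TranscriptionJobStatus` is one of `QUEUED`, `IN_PROGRESS`, `FAILED` or `COMPLETED`, and `download_file` for the transfer. The helper names and the `transcripts_prefix/name` key layout are assumptions that mirror the configuration options above.

```python
from pathlib import Path


def ready_to_pull(transcribe_client, job_name: str) -> bool:
    """Return True once the job has COMPLETED; raise if it FAILED."""
    job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status == "FAILED":
        reason = job["TranscriptionJob"].get("FailureReason", "unknown reason")
        raise RuntimeError(f"transcription job {job_name!r} failed: {reason}")
    # QUEUED and IN_PROGRESS both mean "not ready yet".
    return status == "COMPLETED"


def pull(s3_client, bucket: str, transcripts_prefix: str,
         transcripts_dir: str, name: str) -> Path:
    """Download transcripts_prefix/name into transcripts_dir and return the local path."""
    destination = Path(transcripts_dir) / name
    destination.parent.mkdir(parents=True, exist_ok=True)
    # S3.Client.download_file(Bucket, Key, Filename)
    s3_client.download_file(bucket, f"{transcripts_prefix}/{name}", str(destination))
    return destination
```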

To pull a single transcript file, pass its file name (with a .json extension) with the -n or --name option:

transcribe.py -c transcriber.conf pull -n output.json
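The pulled .json files are Amazon Transcribe output documents, and the plain text sits under `results.transcripts`. Here is a small sketch for extracting it; the helper name `transcript_text` is hypothetical and not part of the tool.

```python
import json


def transcript_text(json_path: str) -> str:
    """Return the plain-text transcript from an Amazon Transcribe output file."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # results.transcripts is a list of {"transcript": "..."} objects.
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])
```

For example, `print(transcript_text("transcripts/output.json"))`.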

Other Options

Use the -v or --verbose option to enable verbose logging:

transcribe.py -v

To print the version of the tool and exit, use the --version option:

transcribe.py --version
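For reference, the command-line interface described above could be modeled with `argparse` roughly as follows. This is a hypothetical reconstruction from the documented usage, not the tool's actual parser: the destination names (`config`, `mode`) are assumptions and the version string is a placeholder.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("-c", dest="config", help="path to the configuration file")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose logging")
    # Placeholder: the real tool reports its installed package version.
    parser.add_argument("--version", action="version", version="%(prog)s <version>")
    modes = parser.add_subparsers(dest="mode")
    for mode in ("push", "pull"):
        sub = modes.add_parser(mode, help=f"{mode} files to or from the S3 bucket")
        sub.add_argument("-n", "--name", help="process only this file")
    return parser


def configure_logging(verbose: bool) -> None:
    # -v switches the root logger from WARNING to DEBUG output.
    logging.basicConfig(level=logging.DEBUG if verbose else logging.WARNING)
```

With this sketch, `build_parser().parse_args(["-c", "transcriber.conf", "push", "-n", "input.mp3"])` yields `config="transcriber.conf"`, `mode="push"` and `name="input.mp3"`.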

License

This project is licensed under the terms of the MIT License. Please see the LICENSE file for full details.
