Whisper: Speech Recognition with Whisper ASR

This repository contains an implementation for training the Whisper Automatic Speech Recognition (ASR) model on the LibriSpeech dataset. Whisper is a speech recognition system: it transcribes spoken audio into text.

Requirements

Install the required libraries using the following command:

pip install torch torchaudio git+https://github.com/snakers4/whisper

Dataset

The code uses the LibriSpeech dataset for training. It automatically downloads the specified split (e.g., "test-clean") and preprocesses the audio data.
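The download and preprocessing step could look roughly like the sketch below. It assumes the split is fetched with torchaudio's LIBRISPEECH dataset class and that each clip is padded or trimmed to a fixed length; the README does not say how the script does either. The helper names, the ./data directory, and the 30-second target length are assumptions chosen for illustration.

```python
import os

import torch

SAMPLE_RATE = 16_000          # LibriSpeech audio is sampled at 16 kHz
N_SAMPLES = 30 * SAMPLE_RATE  # assumed fixed clip length of 30 s (hypothetical choice)


def pad_or_trim(waveform: torch.Tensor, length: int = N_SAMPLES) -> torch.Tensor:
    """Zero-pad or truncate the last (time) axis to exactly `length` samples."""
    n = waveform.shape[-1]
    if n >= length:
        return waveform[..., :length]
    return torch.nn.functional.pad(waveform, (0, length - n))


def load_split(root: str = "./data", split: str = "test-clean"):
    """Download one LibriSpeech split if it is missing, then open it."""
    # Imported lazily so pad_or_trim can be used without torchaudio installed.
    import torchaudio

    os.makedirs(root, exist_ok=True)
    return torchaudio.datasets.LIBRISPEECH(root=root, url=split, download=True)
```

Each item of the torchaudio dataset is a tuple whose first three entries are the waveform, the sample rate, and the transcript, so a call such as pad_or_trim(dataset[0][0]) would yield a fixed-length clip.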

Usage

Clone the repository:

git clone https://github.com/Aktharnvdv/whisper.git
cd whisper

Install the required libraries as mentioned above.

Run the training script:

python train_whisper.py

Configuration

You can customize the training configuration, such as the batch size, number of workers, learning rate, and model dimensions, by modifying the corresponding variables at the beginning of the script.
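As an illustration of the settings involved, the sketch below groups them in a small dataclass. The script itself keeps them as plain variables; the class name, field names, and default values here are hypothetical placeholders, not the values used in train_whisper.py.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical grouping of the tunable training settings."""

    batch_size: int = 8
    num_workers: int = 2
    learning_rate: float = 1e-4
    num_epochs: int = 1


default_config = TrainConfig()
# Override a single setting without touching the others.
small_batch_config = replace(default_config, batch_size=2)
```

Keeping the settings in one immutable object makes it easy to record exactly which values produced a given checkpoint.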

Model

The Whisper model is initialized with the dimensions specified in the ModelDimensions class. It is then trained on the LibriSpeech dataset for a specified number of epochs.
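A sketch of what a dimension specification might contain, assuming the installed whisper package exposes the same ModelDimensions dataclass and Whisper class as the openai-whisper package (whisper.model). The field names follow that dataclass; the values are illustrative, small-model sized numbers and are not taken from this repository. The check_dims and build_model helpers are hypothetical.

```python
# Illustrative values only; field names follow openai-whisper's ModelDimensions.
DIMS = dict(
    n_mels=80,          # mel-frequency bins per audio frame
    n_audio_ctx=1500,   # encoder positions
    n_audio_state=384,  # encoder hidden width
    n_audio_head=6,
    n_audio_layer=4,
    n_vocab=51865,      # tokenizer vocabulary size
    n_text_ctx=448,     # decoder positions
    n_text_state=384,   # decoder hidden width
    n_text_head=6,
    n_text_layer=4,
)


def check_dims(dims: dict) -> None:
    """Multi-head attention splits the hidden width evenly across heads."""
    for part in ("audio", "text"):
        state, heads = dims[f"n_{part}_state"], dims[f"n_{part}_head"]
        if state % heads != 0:
            raise ValueError(f"n_{part}_state={state} is not divisible by n_{part}_head={heads}")


def build_model(dims: dict):
    """Construct a randomly initialized model from the dimensions."""
    # Imported lazily: assumes the installed package mirrors openai-whisper's API.
    from whisper.model import ModelDimensions, Whisper

    check_dims(dims)
    return Whisper(ModelDimensions(**dims))
```

Validating the dimensions before construction surfaces a mismatched head count as a clear error rather than as a tensor reshape failure during the first forward pass.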

Training

The train_whisper function initializes the dataset, data loader, model, and optimizer. It then trains the Whisper model for the specified number of epochs.
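The overall shape of such a function can be sketched as follows. This is not the repository's train_whisper: to keep the example runnable in isolation it uses a small linear layer and random tensors as stand-ins for the Whisper model and the LibriSpeech batches, and the choice of AdamW and cross-entropy loss is an assumption.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model: nn.Module, loader: DataLoader, num_epochs: int = 1, lr: float = 1e-3) -> list:
    """Run a plain supervised loop and return the mean loss of each epoch."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    epoch_losses = []
    model.train()
    for _ in range(num_epochs):
        total = 0.0
        for inputs, targets in loader:
            optimizer.zero_grad()                    # clear gradients from the previous step
            loss = loss_fn(model(inputs), targets)
            loss.backward()                          # backpropagate
            optimizer.step()                         # update the parameters
            total += loss.item()
        epoch_losses.append(total / len(loader))
    return epoch_losses


# Stand-in data and model (hypothetical): 64 random feature vectors, 4 classes.
torch.manual_seed(0)
features = torch.randn(64, 16)
labels = torch.randint(0, 4, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=8, shuffle=True)
losses = train(nn.Linear(16, 4), loader, num_epochs=3)
```

With a real Whisper model, each batch would instead carry a log-mel spectrogram and a token sequence, and the loss would be computed over the predicted token logits.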

Results

After training, the Whisper model's state dictionary is saved to a file named whisper_model.pth.
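Saving and restoring a state dictionary follows the usual PyTorch pattern, sketched below with a small linear layer standing in for the Whisper model. The temporary directory is used only to keep the example self-contained; the file name matches the one mentioned above.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the trained Whisper model
path = os.path.join(tempfile.mkdtemp(), "whisper_model.pth")

# Save only the parameters, not the Python class definition.
torch.save(model.state_dict(), path)

# To restore, rebuild a model with identical dimensions, then load the weights.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to inference mode
```

Because the file holds weights only, the model must be reconstructed with the same dimensions used during training before the weights can be loaded.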
