This project implements the GSM 06.10 Full-Rate speech codec, which is based on the Regular Pulse Excitation with Long-Term Prediction (RPE-LTP) algorithm. The codec encodes speech audio into a compressed bitstream and decodes that bitstream back into speech.
- ✅ Short-Term Linear Predictive Coding (LPC) for speech signal modeling
- ✅ Long-Term Prediction (LTP) for pitch tracking and residual analysis
- ✅ Regular Pulse Excitation (RPE) for efficient speech compression
- ✅ Bitstream Formation (260-bit frames) for transmission
- ✅ WAV file support for input and output
- ✅ Visualization of results via Matplotlib
GSM 06.10 is a speech compression standard used in digital cellular systems. It processes audio in 160-sample frames (20 ms at an 8 kHz sampling rate) and encodes each frame into 260 bits, which gives the full-rate bit rate of 13 kbit/s while keeping quality suitable for telecommunication applications.
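The frame figures above determine the bit rate directly. The short calculation below is only an illustration of that arithmetic; the constant names are hypothetical and do not come from this repository, and the 16-bit PCM input width is an assumption about typical WAV files.

```python
# Frame parameters quoted in this README (names are illustrative only).
SAMPLE_RATE_HZ = 8000   # narrowband telephone speech
FRAME_SAMPLES = 160     # samples per GSM 06.10 frame
FRAME_BITS = 260        # encoded bits per frame

# Integer arithmetic keeps the results exact.
frame_duration_ms = 1000 * FRAME_SAMPLES // SAMPLE_RATE_HZ   # 20 ms
frames_per_second = SAMPLE_RATE_HZ // FRAME_SAMPLES          # 50 frames/s
bit_rate_bps = FRAME_BITS * frames_per_second                # 13000 bit/s

# Assumption: the input WAV holds 16-bit linear PCM samples.
pcm_bits_per_frame = FRAME_SAMPLES * 16
compression_ratio = pcm_bits_per_frame / FRAME_BITS          # roughly 9.8

print(frame_duration_ms, frames_per_second, bit_rate_bps)
```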
📂 GSM-06.10-Codec
├── 📄 README.md # This file
├── 📜 encoder.py # Encoding process (LPC, LTP, RPE)
├── 📜 decoder.py # Decoding process (inverse LPC, excitation reconstruction)
├── 📜 utils.py # Helper functions for signal processing
├── 📜 hw_utils.py # Utility functions for coefficient transformations
├── 📜 demo1.py # Basic encoding/decoding demonstration
├── 📜 demo2.py # Extended demonstration with enhanced features
├── 📜 demo3.py # Full feature demonstration and performance evaluation
└── 🎵 ena_dio_tria.wav # Example input audio file
Make sure you have Python 3.x and the required dependencies installed:
pip install numpy scipy matplotlib bitstring
To process an audio file using the GSM 06.10 codec:
python demo3.py
This will:
- Read the input `.wav` file
- Encode it using GSM 06.10
- Decode it back to a `.wav` file
- Plot the original vs reconstructed waveform
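Before encoding, the input samples have to be cut into 160-sample frames. The sketch below shows one way to do that in plain Python. `split_into_frames` is a hypothetical helper, not a function from `demo3.py` or `utils.py`, and zero-padding the final partial frame is an assumption; the demos may drop or handle it differently.

```python
FRAME_SAMPLES = 160  # one GSM 06.10 frame: 20 ms at 8 kHz


def split_into_frames(samples, frame_len=FRAME_SAMPLES):
    """Split a sequence of samples into fixed-size frames.

    Assumption: a short final frame is zero-padded to frame_len.
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0] * (frame_len - len(frame)))  # pad the last frame if short
        frames.append(frame)
    return frames


# 400 samples give two full frames and one half-filled, padded frame.
frames = split_into_frames(list(range(400)))
print(len(frames), [len(f) for f in frames])
```

Each frame would then be passed to the encoder in turn, and the decoded frames concatenated to rebuild the output signal.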
- Uses Linear Predictive Coding (LPC) to model speech
- Converts LPC coefficients to Log-Area Ratios (LAR)
- Quantizes and encodes the LARs
- Estimates the pitch period (lag N) and gain factor (b)
- Predicts the current residual from the previously reconstructed residual
- Selects the sub-sequence with maximum energy
- Quantizes the excitation signal
- Packs all parameters into a 260-bit frame
- Dequantizes and reconstructs the excitation signal
- Applies the short-term synthesis filter (the inverse of the LPC analysis filter) to recover speech
- Synthesizes final speech waveform
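Three of the encoder steps listed above can be sketched in a few lines of plain Python: the reflection-coefficient to LAR mapping, the LTP lag and gain search over one 40-sample sub-frame, and the RPE grid selection. This is a floating-point illustration under stated assumptions, not the code in `encoder.py`: all function names are hypothetical, the LAR mapping uses the piecewise-linear approximation of the logarithm defined in the standard, and quantization, weighting filters, and fixed-point arithmetic are left out.

```python
def lar_from_reflection(r):
    """Piecewise-linear approximation of the log-area ratio for |r| <= 1."""
    a = abs(r)
    sign = 1.0 if r >= 0 else -1.0
    if a < 0.675:
        return r
    if a < 0.950:
        return sign * (2.0 * a - 0.675)
    return sign * (8.0 * a - 6.375)


def ltp_parameters(sub, history):
    """Estimate LTP lag N (40..120) and gain b for one sub-frame.

    sub:     40 samples of the current short-term residual
    history: the 120 most recent reconstructed residual samples, oldest first
    """
    best_lag, best_corr = 40, float("-inf")
    for lag in range(40, 121):
        start = len(history) - lag
        past = history[start:start + 40]
        corr = sum(c * p for c, p in zip(sub, past))  # cross-correlation at this lag
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    start = len(history) - best_lag
    past = history[start:start + 40]
    energy = sum(p * p for p in past)
    gain = best_corr / energy if energy > 0 else 0.0  # unquantized gain
    return best_lag, gain


def rpe_grid_select(x):
    """Pick the decimated 13-sample grid (offset m in 0..3) with the most energy."""
    candidates = [x[m::3][:13] for m in range(4)]
    energies = [sum(v * v for v in c) for c in candidates]
    m = energies.index(max(energies))  # first maximum wins on ties
    return m, candidates[m]


# Tiny synthetic check: an impulse in the history that lines up at lag 57.
history = [0.0] * 120
history[68] = 1.0
sub = [0.0] * 40
sub[5] = 2.0
print(ltp_parameters(sub, history))
```

In the full codec the lag and gain are computed once per 40-sample sub-frame (four times per frame), and the selected grid's 13 pulses are then quantized before being packed into the 260-bit frame.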