A machine learning system for predicting diabetes risk from voice recordings using federated learning and advanced audio feature extraction.
DiaVoc is an AI-powered system that analyzes voice patterns to predict diabetes risk. It uses BYOL-S, a speech-focused variant of the self-supervised BYOL (Bootstrap Your Own Latent) method, to produce robust audio embeddings, and combines them with federated learning so that models are trained across distributed clients without sharing raw data.
- Voice-Based Diabetes Prediction: Analyzes voice recordings to assess diabetes risk
- Federated Learning: Privacy-preserving training across multiple clients without sharing raw data
- Advanced Audio Processing: Uses BYOL-S embeddings for voice feature extraction
- Web API: FastAPI-based REST API for easy integration
- Comprehensive Evaluation: Multiple ML models benchmarked with performance metrics
- Non-IID Data Handling: Dirichlet-based partitioning (α=0.5) to simulate realistic federated scenarios
```
Voice-to-Diabetes-main/
├── src/                          # Core source code
│   ├── training.py               # Federated learning training pipeline
│   ├── infer.py                  # Inference system for predictions
│   ├── processing_pipeline.py    # Data preprocessing and feature engineering
│   └── generate_embedding.py     # Audio embedding generation
├── app/                          # Web application
│   ├── main.py                   # FastAPI server
│   └── inference.py              # API inference logic
├── fl/                           # Federated learning components
│   ├── fl_simulation.py          # FL simulation and aggregation
│   ├── data_loader.py            # Data loading utilities
│   ├── predict.py                # Prediction utilities
│   └── benchmark_models.py       # Model benchmarking
├── models/                       # Pre-trained models and artifacts
│   ├── serab-byols/              # BYOL-S audio encoder
│   ├── scaler.pkl                # Feature scaler
│   ├── pca.pkl                   # PCA transformer
│   └── global_model_improved.pkl # Trained global model
├── data/                         # Data files
│   ├── male_embeddings.pkl       # Male voice embeddings
│   ├── female_embeddings.pkl     # Female voice embeddings
│   └── *.wav                     # Sample audio files
├── README.md                     # This file
└── training_history.png          # Training visualization
```
- Python 3.8+
- PyTorch
- CUDA (optional, for GPU acceleration)
Install the required packages:

```bash
pip install numpy pandas scikit-learn torch librosa fastapi uvicorn joblib matplotlib seaborn
```

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Voice-to-Diabetes-main
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt  # if available; otherwise install manually as above
  ```

- Ensure data files are in place:
  - Place `male_embeddings.pkl` and `female_embeddings.pkl` in the `data/` directory
  - Ensure BYOL-S checkpoints are available in `models/serab-byols/checkpoints/`
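Before training or serving, it can help to confirm that the files from the setup steps exist. The following is a minimal sketch using only the standard library; `missing_artifacts` and the two path lists are hypothetical helpers written for this README, not part of the repository, and simply mirror the locations named above.

```python
from pathlib import Path
from typing import List

# Hypothetical checklist mirroring the setup steps; adjust to your layout.
REQUIRED_FILES = [Path("data/male_embeddings.pkl"), Path("data/female_embeddings.pkl")]
REQUIRED_DIRS = [Path("models/serab-byols/checkpoints")]


def missing_artifacts(root: Path = Path(".")) -> List[str]:
    """Return the required paths (relative to root) that do not exist yet."""
    missing = [str(p) for p in REQUIRED_FILES if not (root / p).is_file()]
    missing += [str(p) for p in REQUIRED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_artifacts()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Data files and BYOL-S checkpoints are in place.")
```

Run it from the repository root; an empty result means the data and checkpoint locations match the layout shown in the project tree.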
Run the federated learning training pipeline:

```bash
python src/training.py
```

This will:
- Load and preprocess voice embeddings
- Simulate federated learning across 5 clients
- Train and aggregate models using FedAvg
- Save the global model to `models/`
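The aggregation step can be pictured with a minimal, dependency-free sketch of weighted FedAvg, where each client's parameters count in proportion to its number of local samples. The sample-count weighting follows the standard FedAvg formulation and is an assumption about what `src/training.py` does; `fedavg` and the flattened list-of-floats parameter layout are hypothetical. With scikit-learn's `MLPClassifier`, the per-layer arrays would come from its `coefs_` and `intercepts_` attributes.

```python
from typing import List, Sequence

# Hypothetical layout: each model is a list of parameter tensors, each flattened to a list of floats.
Layer = List[float]
Params = List[Layer]


def fedavg(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """Weighted FedAvg: global_w = sum_k (n_k / n) * w_k, computed tensor by tensor."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    global_params: Params = []
    for layer_idx, first_layer in enumerate(client_params[0]):
        averaged = [0.0] * len(first_layer)
        for params, n_k in zip(client_params, client_sizes):
            weight = n_k / total  # clients with more local data count for more
            for i, value in enumerate(params[layer_idx]):
                averaged[i] += weight * value
        global_params.append(averaged)
    return global_params


if __name__ == "__main__":
    # Two toy clients, each with one weight tensor and one bias tensor.
    client_a = [[0.0, 4.0], [1.0]]
    client_b = [[4.0, 0.0], [3.0]]
    print(fedavg([client_a, client_b], client_sizes=[100, 300]))  # → [[3.0, 1.0], [2.5]]
```

Averaging parameters (rather than predictions) yields a single global model that can be saved and served like any locally trained one; with equal client sizes it reduces to a plain mean.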
For command-line inference:

```bash
python src/infer.py
```

Start the FastAPI server:

```bash
python app/main.py
```

The API will be available at http://localhost:8000.
`POST /predict`: Predict diabetes risk from a voice recording.
- Parameters: `audio` (file), `age` (int), `gender` (str), `bmi` (float), `ethnicity` (str)
Example usage:

```python
import requests

data = {'age': 45, 'gender': 'male', 'bmi': 28.5, 'ethnicity': 'asian'}

# Use a context manager so the audio file handle is closed after the upload
with open('sample.wav', 'rb') as audio_file:
    response = requests.post('http://localhost:8000/predict',
                             files={'audio': audio_file}, data=data)
print(response.json())
```

To preprocess new data:
```bash
python src/processing_pipeline.py
```

This generates the preprocessing artifacts (scaler, PCA) and saves them in `models/`.
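The feature-engineering part of preprocessing (the Age × BMI Risk Index plus standard scaling of numerical features) can be sketched with the standard library alone. This is an illustration under assumptions: the function names and the choice of scaled columns (`age`, `bmi`, `risk_index`) are hypothetical, and the PCA step is omitted (the `pca.pkl` artifact suggests a library transformer such as scikit-learn's `PCA`). The key point is that the mean and standard deviation are learned once on training data and then reused unchanged at inference time, which is why the scaler is saved as an artifact.

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]
# Assumed numeric columns; gender and ethnicity would be encoded separately.
NUMERIC_COLUMNS = ("age", "bmi", "risk_index")


def add_risk_index(row: Row) -> Row:
    """Return a copy of the row with the Age x BMI interaction feature added."""
    return {**row, "risk_index": row["age"] * row["bmi"]}


def fit_scaler(rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Learn (mean, std) per numeric column from training rows only."""
    params = {}
    for col in NUMERIC_COLUMNS:
        values = [r[col] for r in rows]
        std = statistics.pstdev(values)  # population std, as in z-score scaling
        params[col] = (statistics.fmean(values), std if std > 0 else 1.0)
    return params


def transform(rows: List[Row], params: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Apply (x - mean) / std with the stored parameters (reused at inference time)."""
    return [
        {**r, **{col: (r[col] - mean) / std for col, (mean, std) in params.items()}}
        for r in rows
    ]


if __name__ == "__main__":
    train = [add_risk_index(r) for r in (
        {"age": 40, "bmi": 20.0}, {"age": 60, "bmi": 30.0}, {"age": 45, "bmi": 28.5},
    )]
    params = fit_scaler(train)
    print(transform([add_risk_index({"age": 50, "bmi": 25.0})], params))
```

Constant columns fall back to a standard deviation of 1.0 so that scaling never divides by zero.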
Run FL simulations:

```bash
python fl/fl_simulation.py
```

- Uses BYOL-S, a speech-focused variant of BYOL (Bootstrap Your Own Latent), for self-supervised audio representation learning
- Extracts 2048-dimensional embeddings from voice recordings
- Combines with demographic features (age, BMI, gender, ethnicity)
- Implements the FedAvg algorithm with true parameter averaging
- Non-IID data partitioning using a Dirichlet distribution (α=0.5)
- Simulation with 5 clients, each running local training rounds
- Global model aggregation for improved generalization
- MLP classifier as the primary model
- Feature engineering with Risk Index (Age × BMI interaction)
- PCA dimensionality reduction to 100 components
- Standard scaling for numerical features
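The non-IID split described above (Dirichlet, α=0.5, 5 clients) is commonly implemented per class: for every label, draw client proportions from a Dirichlet distribution and cut that label's samples accordingly. The following is a minimal standard-library sketch assuming integer class labels; `dirichlet_partition` is a hypothetical helper and may differ from the partitioning actually used in `fl/fl_simulation.py`. Because Python's `random` module has no Dirichlet sampler, the draw is built from normalised Gamma(α, 1) samples, which is mathematically equivalent.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(labels: Sequence[int], num_clients: int = 5,
                        alpha: float = 0.5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients with per-class Dirichlet(alpha) proportions.

    Smaller alpha gives more skewed (more non-IID) label mixes per client.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Dirichlet sample = independent Gamma(alpha, 1) draws normalised to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for client_id, g in enumerate(gammas):
            cumulative += g / total
            # The last client takes the remainder so no index is dropped by rounding.
            end = len(indices) if client_id == num_clients - 1 else round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


if __name__ == "__main__":
    labels = [0] * 60 + [1] * 40  # toy binary labels (non-diabetic / diabetic)
    for client_id, part in enumerate(dirichlet_partition(labels)):
        positives = sum(labels[i] for i in part)
        print(f"client {client_id}: {len(part)} samples, {positives} positive")
```

With α=0.5 some clients typically end up dominated by one class, which is the kind of heterogeneity FedAvg has to cope with in practice; larger α values move the split toward IID.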
The system achieves:
- 80%+ accuracy on held-out test sets
- Robust performance across demographic groups
- Privacy preservation through federated learning
- Real-time inference capability
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this work in your research, please cite:
```bibtex
@misc{diavoc2026,
  title={DiaVoc: Voice-based Diabetes Prediction using Federated Learning},
  author={Khati, Harina and Pathak, Samriddha and Pokhrel, Jyoti and Subedi, Rasum},
  year={2026},
  howpublished={GitHub repository}
}
```
Samriddha Pathak: +977 9702187444