A beginner-friendly machine learning project that predicts the risk of student burnout based on lifestyle and behavioral patterns such as sleep, study hours, screen time, exercise, and mood.
This project demonstrates a basic end-to-end machine learning workflow in Python: dataset generation, model training, evaluation, and prediction.
Student burnout is a common issue caused by excessive workload, lack of sleep, stress, and unhealthy study habits. Cognivia attempts to model these factors and predict the likelihood of burnout using a machine learning algorithm.
The system analyzes several lifestyle variables and predicts the burnout level as:
- Low
- Medium
- High
The model is a Random Forest classifier, trained and evaluated on a synthetic dataset generated using Python.
- Synthetic dataset generation for student lifestyle patterns
- Machine learning model training using Random Forest
- Burnout risk prediction from user input
- Terminal-based interactive prediction system
- Organized machine learning project structure
- Beginner-friendly implementation
The project follows a typical machine learning workflow:
Dataset Generation → Data Preparation → Model Training → Model Evaluation → Prediction System
The dataset contains 500 simulated student records generated programmatically.
Each record includes the following features:
| Feature | Description |
|---|---|
| sleep_hours | Number of hours the student sleeps |
| study_hours | Hours spent studying per day |
| screen_time | Hours spent on phone/computer |
| exercise | Whether the student exercised (0 = No, 1 = Yes) |
| mood | Self-reported mood level (1–5 scale) |
| burnout | Burnout risk category (Low, Medium, High) |
Example dataset entry:
sleep_hours: 4
study_hours: 9
screen_time: 10
exercise: 0
mood: 2
burnout: High
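The following is a minimal sketch of how records like this could be generated, using only the Python standard library. The value ranges, the scoring rule, and the names `label_burnout` and `generate_records` are assumptions made for illustration; the actual logic lives in ml/generate_dataset.py and may differ (for example, it may rely on NumPy and Pandas).

```python
import random


def label_burnout(record):
    """Hypothetical rule: count risk factors, then bucket the count."""
    score = 0
    score += record["sleep_hours"] < 6    # short sleep
    score += record["study_hours"] > 8    # heavy study load
    score += record["screen_time"] > 8    # long screen time
    score += record["exercise"] == 0      # no exercise
    score += record["mood"] <= 2          # low mood
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"


def generate_records(n=500, seed=42):
    """Return n simulated student records as dictionaries."""
    rng = random.Random(seed)  # fixed seed keeps the data reproducible
    records = []
    for _ in range(n):
        record = {
            "sleep_hours": rng.randint(3, 10),  # assumed range
            "study_hours": rng.randint(1, 12),  # assumed range
            "screen_time": rng.randint(1, 12),  # assumed range
            "exercise": rng.randint(0, 1),
            "mood": rng.randint(1, 5),
        }
        record["burnout"] = label_burnout(record)
        records.append(record)
    return records


records = generate_records()
example = {"sleep_hours": 4, "study_hours": 9, "screen_time": 10,
           "exercise": 0, "mood": 2}
print(len(records), label_burnout(example))  # 500 High
```

Under this assumed rule, the example entry above triggers all five risk factors and is labeled High.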
The project uses the Random Forest Classifier from Scikit-Learn.
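Random Forest trains many decision trees, each on a random sample of the data and features, and combines their predictions by majority vote. Averaging over many trees usually improves accuracy and reduces overfitting compared to a single decision tree.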
This algorithm was chosen because it:
- Handles structured data well
- Is beginner friendly
- Typically performs well on tabular data without extensive tuning
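As a rough illustration of the classifier described above, the sketch below fits a `RandomForestClassifier` on a handful of hand-written rows. The rows, `n_estimators=100`, and `random_state=42` are assumptions chosen for the example; they are not taken from ml/train_model.py.

```python
from sklearn.ensemble import RandomForestClassifier

# Columns: sleep_hours, study_hours, screen_time, exercise, mood
# (illustrative rows, not the project's dataset)
X = [
    [8, 4, 3, 1, 5],
    [7, 5, 4, 1, 4],
    [6, 7, 6, 0, 3],
    [6, 6, 7, 1, 3],
    [4, 9, 10, 0, 2],
    [3, 10, 11, 0, 1],
]
y = ["Low", "Low", "Medium", "Medium", "High", "High"]

# Each of the 100 trees is fit on a bootstrap sample of the rows;
# the forest combines the trees' outputs into one prediction.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

print(len(model.estimators_))             # number of fitted trees
print(model.classes_)                     # class labels in sorted order
print(model.predict([[4, 9, 10, 0, 2]]))  # predicted label for one student
```

With only six rows this says nothing about real performance; it only shows the fit and predict calls.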
After training and testing on the dataset:
Model Accuracy: 0.95 (95%)
This means the model correctly predicted burnout levels for approximately 95% of the test data.
Note: The dataset is synthetically generated, so its patterns are cleaner than those in real-world data, and the accuracy is likely higher than it would be on real student records.
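Accuracy here is the fraction of test records whose predicted label matches the true label. The short sketch below works through that arithmetic; the labels are invented for the example (20 records, 19 correct) and are not the project's actual test output.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels: 8 Low, 7 Medium, 5 High
y_true = ["Low"] * 8 + ["Medium"] * 7 + ["High"] * 5
y_pred = list(y_true)
y_pred[10] = "High"  # one Medium record misclassified as High

print(accuracy(y_true, y_pred))  # 0.95
```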
git clone https://github.com/angelabera/Cognivia
cd Cognivia
python -m venv venv
Windows (PowerShell / Command Prompt)
venv\Scripts\activate
macOS / Linux
source venv/bin/activate
pip install -r requirements.txt
Run the following command:
python ml/generate_dataset.py
This will create the dataset:
data/burnout_dataset.csv
python ml/train_model.py
This will train the Random Forest model and save it as:
ml/burnout_model.pkl
python ml/predict.py
Example interaction:
- Enter sleep hours: 4
- Enter study hours: 9
- Enter screen time: 10
- Exercise (0=no,1=yes): 0
- Mood (1–5): 2
Output:
Predicted Burnout Level: High
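One way the answers from an interaction like this could be checked before they reach the model is sketched below. The function `parse_inputs`, the feature order, and the allowed ranges for the hour-based fields are hypothetical; ml/predict.py may handle input differently.

```python
# Order in which features are passed to the model (assumed to match training)
FEATURE_ORDER = ["sleep_hours", "study_hours", "screen_time", "exercise", "mood"]

# Allowed (min, max) per feature; the 0-24 hour limits are assumptions
LIMITS = {
    "sleep_hours": (0, 24),
    "study_hours": (0, 24),
    "screen_time": (0, 24),
    "exercise": (0, 1),
    "mood": (1, 5),
}

# Features that must be whole numbers
INTEGER_FEATURES = {"exercise", "mood"}


def parse_inputs(raw):
    """Convert raw string answers into a validated feature row."""
    row = []
    for name in FEATURE_ORDER:
        value = float(raw[name])  # raises ValueError on non-numeric text
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        if name in INTEGER_FEATURES and not value.is_integer():
            raise ValueError(f"{name} must be a whole number")
        row.append(value)
    return row


answers = {"sleep_hours": "4", "study_hours": "9", "screen_time": "10",
           "exercise": "0", "mood": "2"}
print(parse_inputs(answers))  # [4.0, 9.0, 10.0, 0.0, 2.0]
```

The returned row could then be wrapped in a list and passed to a trained model's `predict` method.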
If you'd like to explore the generated data visually, run the visualization script after creating the dataset.
python ml/visualize_data.py
The script loads data/burnout_dataset.csv and opens a scatter plot showing sleep_hours vs study_hours. See the code at ml/visualize_data.py.
Note: Ensure data/burnout_dataset.csv exists (run the dataset generator first).
- Python
- Pandas
- NumPy
- Scikit-Learn
- Matplotlib
- Machine Learning
This project demonstrates key machine learning concepts such as:
- Dataset generation and handling
- Feature selection
- Supervised learning
- Model training and evaluation
- Prediction systems
- Project structuring for machine learning workflows
Angela Bera