Skip to content

gator-ryan/Student-Performance-Prediction

Repository files navigation

Student Performance Prediction 📊🎓

Predicting student academic performance using machine learning techniques. This project demonstrates a complete end-to-end ML pipeline — from data preprocessing, feature engineering, model training & hyperparameter tuning, to deployment with a web application.

📌 Table of Contents

Overview

Project Features

Tech Stack

Dataset

Project Structure

Machine Learning Pipeline

Results

Setup & Installation

How to Run

Web Application

Future Improvements

Contributing

License

🚀 Overview

The goal is to build a machine learning model that predicts student performance based on demographic, social, and academic attributes.

Problem type → Supervised Learning (Classification & Regression experiments possible).

Objective → Predict final grade / pass-fail outcome of students.

Approach → Data preprocessing → Feature engineering → Model training → Hyperparameter tuning → Model evaluation → Deployment.

🌟 Project Features

Clean & modular ML pipeline (src/).

Multiple algorithms tested:

Logistic Regression

Random Forest

Gradient Boosting

XGBoost

CatBoost

Hyperparameter tuning using GridSearchCV / RandomizedSearchCV.

Performance comparison of all models.

Model interpretability using feature importance.

Deployed via a Flask application (application.py) for user input and prediction.

Packaged with setup.py for installability.

Complete logging & exception handling.

🛠 Tech Stack

Programming Language: Python 3.9+

Libraries & Tools:

pandas, numpy, matplotlib, seaborn

scikit-learn

xgboost, catboost

Flask (for deployment)

joblib / pickle (for model persistence)

📂 Dataset

Dataset: Student Performance Dataset (UCI ML Repository)

Attributes include:

Demographics (age, gender, family background)

Academic records (study time, failures, absences, grades)

Social factors (internet access, family support, activities)

Target variable: Final Grade (G3) or pass/fail indicator.

🏗 Project Structure mlproject/ │ ├── data/ # (Optional) Store raw/processed data ├── notebook/ # Jupyter notebooks for EDA & experiments ├── src/ # Source code for ML pipeline │ ├── components/ # Data ingestion, transformation, trainer │ ├── pipeline/ # Training & prediction pipeline scripts │ ├── logger.py # Logging utility │ ├── exception.py # Custom exception handling │ └── utils.py # Helper functions │ ├── artifacts/ # Saved models, preprocessor, reports ├── application.py # Flask app entry point ├── requirements.txt # Dependencies ├── setup.py # Project install config ├── README.md # Project documentation └── LICENSE # License file

🔄 Machine Learning Pipeline

Data Ingestion

Load dataset

Handle missing values

Train-test split

Data Transformation

Encoding categorical variables

Scaling numerical features

Feature selection

Model Training

Train multiple algorithms

Hyperparameter tuning

Cross-validation

Model Evaluation

Accuracy, Precision, Recall, F1-score, ROC-AUC (classification)

RMSE, MAE, R² (regression)

Model Selection

Choose best-performing model based on validation results.

Deployment

Save model using pickle/joblib

Expose Flask API for predictions

📊** Results** Model Accuracy F1-score Notes Logistic Regression 78% 0.77 Baseline model Random Forest 85% 0.84 Good performance XGBoost 87% 0.86 Best model CatBoost 86% 0.85 Competitive

⚙️ Setup & Installation

Clone the repository:

git clone https://github.com/gator-ryan/mlproject.git cd mlproject

Create a virtual environment & activate:

python -m venv venv source venv/bin/activate # On Linux/Mac venv\Scripts\activate # On Windows

Install dependencies:

pip install -r requirements.txt

Install the project as a package:

pip install -e .

▶️ How to Run

Train model pipeline:

python src/pipeline/train_pipeline.py

Make predictions:

python src/pipeline/predict_pipeline.py

Run the web app (Flask):

python application.py

Visit → http://127.0.0.1:5000

🌐** Web Application**

Flask app accepts student details as input via web form.

Returns predicted student performance score / pass-fail outcome. (Can be extended with Streamlit for interactive dashboards.)

🚧** Future Improvements**

Add Docker support for containerized deployment.

Integrate CI/CD pipeline (GitHub Actions, AWS/GCP/Azure).

Add experiment tracking (MLflow, Weights & Biases).

Deploy on Heroku / AWS Elastic Beanstalk / Streamlit Cloud.

Improve feature engineering with domain-specific insights.

Add SHAP/LIME plots for interpretability.

🤝 Contributing

Contributions are welcome!

Fork the repository

Create your feature branch (git checkout -b feature/AmazingFeature)

Commit your changes (git commit -m 'Add some AmazingFeature')

Push to the branch (git push origin feature/AmazingFeature)

Open a Pull Request

📜** License**

This project is licensed under the MIT License. See LICENSE for details.