GitHub - gator-ryan/Student-Performance-Prediction: Student Performance Machine Learning Model

Student Performance Prediction 📊🎓

Predicting student academic performance using machine learning techniques. This project demonstrates a complete end-to-end ML pipeline — from data preprocessing, feature engineering, model training & hyperparameter tuning, to deployment with a web application.

📌 Table of Contents

Overview

Project Features

Tech Stack

Dataset

Project Structure

Machine Learning Pipeline

Results

Setup & Installation

How to Run

Web Application

Future Improvements

Contributing

License

🚀 Overview

The goal is to build a machine learning model that predicts student performance based on demographic, social, and academic attributes.

Problem type → Supervised Learning (Classification & Regression experiments possible).

Objective → Predict final grade / pass-fail outcome of students.

Approach → Data preprocessing → Feature engineering → Model training → Hyperparameter tuning → Model evaluation → Deployment.

🌟 Project Features

Clean & modular ML pipeline (src/).

Multiple algorithms tested:

Logistic Regression

Random Forest

Gradient Boosting

XGBoost

CatBoost

Hyperparameter tuning using GridSearchCV / RandomizedSearchCV.

Performance comparison of all models.

Model interpretability using feature importance.

Deployed via a Flask application (application.py) for user input and prediction.

Packaged with setup.py for installability.

Complete logging & exception handling.
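The logging and exception handling mentioned above could look roughly like the sketch below. The `CustomException` class, the logger name, and the message format are assumptions for illustration; they are not taken from the repository's `src/logger.py` or `src/exception.py`.

```python
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("student_performance")  # hypothetical logger name


class CustomException(Exception):
    """Hypothetical wrapper that records where the original error was raised."""

    def __init__(self, error: Exception):
        # sys.exc_info() is only populated while an exception is being handled.
        _, _, tb = sys.exc_info()
        if tb is not None:
            location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        else:
            location = "unknown location"
        self.detail = f"{type(error).__name__} at {location}: {error}"
        super().__init__(self.detail)


def safe_divide(a: float, b: float) -> float:
    """Toy pipeline step: log the failure, then re-raise with context."""
    try:
        return a / b
    except ZeroDivisionError as err:
        logger.error("Division failed: %s", err)
        raise CustomException(err) from err
```

Wrapping low-level errors this way keeps the file and line of the original failure in the message, which helps when a pipeline step fails deep inside a helper.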

🛠 Tech Stack

Programming Language: Python 3.9+

Libraries & Tools:

pandas, numpy, matplotlib, seaborn

scikit-learn

xgboost, catboost

Flask (for deployment)

joblib / pickle (for model persistence)

📂 Dataset

Dataset: Student Performance Dataset (UCI ML Repository)

Attributes include:

Demographics (age, gender, family background)

Academic records (study time, failures, absences, grades)

Social factors (internet access, family support, activities)

Target variable: Final Grade (G3) or pass/fail indicator.
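Both target options can be derived from the same `G3` column. The sketch below assumes pandas, a toy frame standing in for the real data, and a pass cutoff of 10 on the 0-20 grade scale; the cutoff is an assumption, since this README does not state one.

```python
import pandas as pd

PASS_THRESHOLD = 10  # assumed cutoff on the 0-20 G3 scale; not stated in this README

# Toy rows standing in for the real dataset; only G3 matters for this sketch.
df = pd.DataFrame(
    {
        "studytime": [1, 2, 3, 4],
        "failures": [2, 0, 0, 1],
        "absences": [10, 4, 0, 6],
        "G3": [6, 10, 15, 9],
    }
)

# Regression target: the final grade itself.
y_regression = df["G3"]

# Classification target: 1 if the student passed, 0 otherwise.
y_classification = (df["G3"] >= PASS_THRESHOLD).astype(int)

print(y_classification.tolist())
```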

🏗 Project Structure

mlproject/
│
├── data/               # (Optional) Store raw/processed data
├── notebook/           # Jupyter notebooks for EDA & experiments
├── src/                # Source code for ML pipeline
│   ├── components/     # Data ingestion, transformation, trainer
│   ├── pipeline/       # Training & prediction pipeline scripts
│   ├── logger.py       # Logging utility
│   ├── exception.py    # Custom exception handling
│   └── utils.py        # Helper functions
│
├── artifacts/          # Saved models, preprocessor, reports
├── application.py      # Flask app entry point
├── requirements.txt    # Dependencies
├── setup.py            # Project install config
├── README.md           # Project documentation
└── LICENSE             # License file

🔄 Machine Learning Pipeline

Data Ingestion

Load dataset

Handle missing values

Train-test split
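A minimal sketch of the ingestion step, assuming pandas and scikit-learn. The inline CSV, the column subset, and the `ingest` function name are placeholders for illustration and do not reflect the repository's actual ingestion component.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Inline CSV standing in for the real file; values and columns are placeholders.
RAW_CSV = """studytime,failures,absences,internet,G3
2,0,4,yes,11
1,1,,no,8
3,0,2,yes,15
2,,6,yes,10
4,0,0,no,17
1,2,12,no,6
3,0,1,yes,14
2,1,8,,9
"""


def ingest(csv_source, test_size=0.25, seed=42):
    """Load the data, fill missing values, and return train/test frames."""
    df = pd.read_csv(csv_source)

    # Numeric gaps -> column median; categorical gaps -> most frequent value.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    return train_test_split(df, test_size=test_size, random_state=seed)


train_df, test_df = ingest(io.StringIO(RAW_CSV))
print(train_df.shape, test_df.shape)
```

This follows the order listed above (fill, then split). A stricter variant would split first and compute the fill values on the training rows only, so that no statistics from the test set influence training.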

Data Transformation

Encoding categorical variables

Scaling numerical features

Feature selection
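The three transformation steps can be chained in one scikit-learn pipeline. This is a sketch under assumptions: the column names, the choice of `StandardScaler` and `OneHotEncoder`, and `SelectKBest` with `f_regression` as the selection method are illustrative, since the README does not say which techniques the project uses.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data standing in for the ingested training set.
X = pd.DataFrame(
    {
        "studytime": [2, 1, 3, 2, 4, 1],
        "absences": [4, 10, 2, 6, 0, 12],
        "internet": ["yes", "no", "yes", "yes", "no", "no"],
    }
)
y = pd.Series([11, 8, 15, 10, 17, 6], name="G3")

numeric_cols = ["studytime", "absences"]
categorical_cols = ["internet"]

# Scale numeric columns and one-hot encode categorical ones.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)

# Keep the 2 features with the strongest univariate relation to the target.
transform = Pipeline(
    steps=[
        ("preprocess", preprocessor),
        ("select", SelectKBest(score_func=f_regression, k=2)),
    ]
)

X_selected = transform.fit_transform(X, y)
print(X_selected.shape)
```

Fitting the whole pipeline on training data and reusing it at prediction time keeps the encoding and scaling consistent between the two stages.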

Model Training

Train multiple algorithms

Hyperparameter tuning

Cross-validation
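Training, tuning, and cross-validation can be combined with `GridSearchCV`. The sketch below uses only scikit-learn models and synthetic data so that it stays self-contained; the parameter grids and the F1 scoring choice are assumptions, and XGBoost or CatBoost estimators could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed student features and pass/fail labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate models with small, illustrative parameter grids.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation over every combination in the grid.
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the candidate with the highest mean cross-validated F1.
best_name = max(results, key=lambda n: results[n][0])
print(best_name, results[best_name])
```

`RandomizedSearchCV` takes the same estimator and scoring arguments and samples from the grid instead of trying every combination, which is usually cheaper for larger search spaces.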

Model Evaluation

Accuracy, Precision, Recall, F1-score, ROC-AUC (classification)

RMSE, MAE, R² (regression)
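The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them on small hand-written arrays; the values are toy data, not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# Classification: true labels, predicted probabilities, and labels at a 0.5 cutoff.
y_true_cls = [1, 0, 1, 1, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.7]
y_pred_cls = [int(p >= 0.5) for p in y_score]

classification_report = {
    "accuracy": accuracy_score(y_true_cls, y_pred_cls),
    "precision": precision_score(y_true_cls, y_pred_cls),
    "recall": recall_score(y_true_cls, y_pred_cls),
    "f1": f1_score(y_true_cls, y_pred_cls),
    "roc_auc": roc_auc_score(y_true_cls, y_score),  # uses scores, not labels
}

# Regression: true and predicted final grades.
y_true_reg = [10, 12, 15, 8]
y_pred_reg = [11, 12, 13, 9]

regression_report = {
    "rmse": mean_squared_error(y_true_reg, y_pred_reg) ** 0.5,
    "mae": mean_absolute_error(y_true_reg, y_pred_reg),
    "r2": r2_score(y_true_reg, y_pred_reg),
}

print(classification_report)
print(regression_report)
```

For these toy values, precision, recall, and F1 all come out at 0.75 and ROC-AUC at 0.875. ROC-AUC is computed from the predicted scores, so it can differ from what the thresholded labels alone suggest.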

Model Selection

Choose best-performing model based on validation results.

Deployment

Save model using pickle/joblib

Expose Flask API for predictions
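The persistence half of the deployment step can be sketched with `joblib`. The file name `model.pkl`, the temporary directory, and the synthetic data are assumptions for illustration; the project itself stores its saved objects under `artifacts/`.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"  # hypothetical file name
    joblib.dump(model, model_path)        # serialize the fitted estimator
    restored = joblib.load(model_path)    # load it back, as the web app would

# The restored model should reproduce the original predictions exactly.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Saving the fitted preprocessor alongside the model in the same way keeps prediction-time inputs transformed exactly as they were during training.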

📊 Results

| Model | Accuracy | F1-score | Notes |
|---|---|---|---|
| Logistic Regression | 78% | 0.77 | Baseline model |
| Random Forest | 85% | 0.84 | Good performance |
| XGBoost | 87% | 0.86 | Best model |
| CatBoost | 86% | 0.85 | Competitive |

⚙️ Setup & Installation

Clone the repository:

git clone https://github.com/gator-ryan/mlproject.git
cd mlproject

Create a virtual environment & activate:

python -m venv venv
source venv/bin/activate    # On Linux/Mac
venv\Scripts\activate       # On Windows

Install dependencies:

pip install -r requirements.txt

Install the project as a package:

pip install -e .
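For `pip install -e .` to work, `setup.py` has to declare the project's dependencies. One common approach is to read them from `requirements.txt`; the helper below is a hypothetical sketch of that idea, not the repository's actual `setup.py`, and it assumes the requirements file may contain an `-e .` line that must be skipped.

```python
from pathlib import Path
from typing import List

EDITABLE_MARKER = "-e ."  # pip directive, not a package name


def get_requirements(file_path: str) -> List[str]:
    """Read a requirements file and return package specifiers only."""
    lines = Path(file_path).read_text().splitlines()
    requirements = [line.strip() for line in lines]
    # Drop blanks, comments, and the editable-install marker.
    return [
        req
        for req in requirements
        if req and not req.startswith("#") and req != EDITABLE_MARKER
    ]
```

The returned list could then be passed as `install_requires=get_requirements("requirements.txt")` in the call to `setuptools.setup(...)`.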

▶️ How to Run

Train model pipeline:

python src/pipeline/train_pipeline.py

Make predictions:

python src/pipeline/predict_pipeline.py

Run the web app (Flask):

python application.py

Visit → http://127.0.0.1:5000

🌐 Web Application

Flask app accepts student details as input via web form.

Returns predicted student performance score / pass-fail outcome. (Can be extended with Streamlit for interactive dashboards.)
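The request flow described above can be sketched as a single Flask route. Everything model-related here is a placeholder: `predict_score`, its coefficients, the `/predict` path, and the two form fields are assumptions for illustration, and the response is JSON rather than a rendered template to keep the example short.

```python
from flask import Flask, jsonify, request

application = Flask(__name__)


def predict_score(features: dict) -> float:
    """Placeholder for the trained model: a hand-written linear rule."""
    # Hypothetical coefficients, for illustration only.
    score = 10.0 + 1.5 * features["studytime"] - 2.0 * features["failures"]
    return max(0.0, min(20.0, score))  # clamp to the 0-20 grade scale


@application.route("/predict", methods=["POST"])
def predict():
    # Parse and validate the submitted form fields.
    try:
        features = {
            "studytime": float(request.form["studytime"]),
            "failures": float(request.form["failures"]),
        }
    except (KeyError, ValueError):
        return jsonify(error="studytime and failures must be numbers"), 400

    score = predict_score(features)
    return jsonify(predicted_grade=score, passed=score >= 10.0)
```

In a real deployment, `predict_score` would load the saved preprocessor and model and call them on the submitted values. Flask's built-in test client (`application.test_client()`) can exercise the route without starting a server.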

🚧 Future Improvements

Add Docker support for containerized deployment.

Integrate CI/CD pipeline (GitHub Actions, AWS/GCP/Azure).

Add experiment tracking (MLflow, Weights & Biases).

Deploy on Heroku / AWS Elastic Beanstalk / Streamlit Cloud.

Improve feature engineering with domain-specific insights.

Add SHAP/LIME plots for interpretability.

🤝 Contributing

Contributions are welcome!

Fork the repository

Create your feature branch (git checkout -b feature/AmazingFeature)

Commit your changes (git commit -m 'Add some AmazingFeature')

Push to the branch (git push origin feature/AmazingFeature)

Open a Pull Request

📜 License

This project is licensed under the MIT License. See LICENSE for details.

