
πŸŽ“ Campus Placement Prediction


Predicting student placement success using advanced machine learning algorithms

🎯 Overview

This project implements a comprehensive machine learning solution to predict campus placement outcomes for students. By analyzing historical academic and personal data, our models identify key patterns that influence placement success, providing valuable insights for both students and educational institutions.

Key Objectives:

  • πŸ” Predict Placement Probability: Determine likelihood of student placement
  • πŸ“Š Identify Key Factors: Understand what drives placement success
  • 🎯 Provide Actionable Insights: Help students improve their placement chances
  • πŸ“ˆ Support Institutional Decisions: Guide placement preparation strategies

πŸ“Š Project Overview Dashboard

Project Overview

πŸš€ Features

✨ Core Features

  • Multi-Algorithm Approach: Four different ML algorithms for comprehensive analysis
  • Advanced Preprocessing: Robust data cleaning and feature engineering
  • Hyperparameter Optimization: Automated tuning for optimal performance
  • Comprehensive Evaluation: Multiple metrics for thorough model assessment

🎯 Advanced Capabilities

  • Feature Importance Analysis: Understand which factors matter most
  • Model Comparison: Side-by-side performance evaluation
  • Scalable Architecture: Easy to extend with new algorithms
  • Educational Focus: Designed for learning and research

πŸ“Š Dataset

πŸ“‹ Features Overview

Our dataset contains comprehensive student information across multiple dimensions:

| Category | Features | Description |
|---|---|---|
| Demographics | Gender | Student gender (Male/Female) |
| Academic Performance | S.S.C. Percentage, H.S.C. Percentage, Degree Percentage, MBA Percentage | Academic scores across education levels |
| Educational Background | Specialization, Degree Type | Academic stream and specialization |
| Professional Readiness | E-test Score, Work Experience | Employability assessment and experience |
| Outcome | Status, Salary | Placement result and compensation |

πŸ“ˆ Dataset Statistics

  • Total Records: 215 students
  • Features: 15 columns
  • Target Variable: Placement Status (Placed/Not Placed)
  • Data Quality: Clean dataset with minimal missing values

πŸ” Key Insights

  • Placement Rate: ~70% of students get placed
  • Gender Distribution: Balanced representation
  • Academic Correlation: Strong relationship between academic performance and placement
  • Experience Impact: Work experience significantly improves placement chances

πŸ“Š Data Distribution Analysis

Data Distribution

πŸ› οΈ Technologies

🐍 Core Technologies

  • Python 3.8+: Primary programming language
  • Jupyter Notebook: Interactive development environment
  • Pandas: Data manipulation and analysis
  • NumPy: Numerical computing

πŸ€– Machine Learning

  • Scikit-learn: Machine learning algorithms and utilities
  • Logistic Regression: Linear classification model
  • Decision Trees: Tree-based classification
  • Gradient Boosting: Ensemble learning method
  • K-Nearest Neighbors: Instance-based learning

πŸ“Š Visualization & Analysis

  • Matplotlib: Basic plotting and visualization
  • Seaborn: Statistical data visualization
  • Plotly: Interactive visualizations (optional)

πŸ“ˆ Models Implemented

🎯 1. Logistic Regression

Best for: Baseline comparison and interpretability

  • Algorithm Type: Linear classification
  • Key Features: Probability outputs, feature importance
  • Use Case: Understanding feature relationships
  • Performance: Highest accuracy (~85%)

🌳 2. Decision Tree

Best for: Feature importance and interpretability

  • Algorithm Type: Tree-based classification
  • Key Features: Non-linear relationships, interpretable rules
  • Use Case: Understanding decision paths
  • Performance: Good interpretability with moderate accuracy

πŸš€ 3. Gradient Boosting

Best for: High accuracy predictions

  • Algorithm Type: Ensemble learning
  • Key Features: Handles overfitting, robust performance
  • Use Case: Production-ready predictions
  • Performance: Second-best accuracy

πŸ“ 4. K-Nearest Neighbors

Best for: Non-parametric classification

  • Algorithm Type: Instance-based learning
  • Key Features: No training required, adapts to data distribution
  • Use Case: When data distribution is unknown
  • Performance: Baseline comparison

πŸ”„ Machine Learning Workflow

Model Workflow

πŸ† Results

πŸ“Š Model Performance Comparison

Model Comparison

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 85.12% | 0.87 | 0.85 | 0.86 |
| Gradient Boosting | 83.72% | 0.85 | 0.84 | 0.84 |
| Decision Tree | 83.72% | 0.84 | 0.84 | 0.84 |
| K-Nearest Neighbors | 81.40% | 0.82 | 0.81 | 0.81 |

🎯 Key Findings

  • Logistic Regression achieves the highest accuracy
  • Academic performance is the strongest predictor
  • Work experience significantly improves placement chances
  • Gender has minimal impact on placement outcomes

πŸ“ˆ Feature Importance Analysis

Feature Importance

πŸ”— Feature Correlation Analysis

Correlation Matrix

πŸ“ˆ Model Performance Trends

Accuracy Trends

πŸ“‹ Detailed Model Analysis

Confusion Matrices

⚑ Quick Start

πŸš€ Prerequisites

```bash
# Ensure Python 3.8+ is installed
python --version

# Install pip if not available
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
```

πŸ“¦ Installation

```bash
# Clone the repository
git clone https://github.com/hariprabhu571/Campus-Placement-Prediction.git
cd Campus-Placement-Prediction

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

🎯 Quick Run

```bash
# Start Jupyter Notebook
jupyter notebook

# Open Logistic Regression.ipynb for best results
# Or run all models in sequence:
# 1. Logistic Regression.ipynb
# 2. Decision Tree.ipynb
# 3. Gradient Boosting.ipynb
# 4. K-Nearest Neighbor.ipynb
```

πŸ“– Detailed Usage

πŸ”§ Environment Setup

Option A: Virtual Environment (Recommended)

```bash
python -m venv campus_placement_env
campus_placement_env\Scripts\activate  # Windows
source campus_placement_env/bin/activate  # macOS/Linux
pip install -r requirements.txt
```

Option B: Conda Environment

```bash
conda create -n campus_placement python=3.9
conda activate campus_placement
pip install -r requirements.txt
```

πŸ“Š Data Preparation

  1. Ensure Placement.csv is in your project directory
  2. Verify dataset structure matches expected format
  3. Run data validation checks

🎯 Model Execution

  1. Start with Logistic Regression: Best baseline performance
  2. Explore Decision Tree: Understand feature importance
  3. Test Gradient Boosting: Advanced ensemble method
  4. Compare with KNN: Non-parametric baseline

πŸ”§ Installation

πŸ“‹ Requirements

  • Python 3.8 or higher
  • 4GB RAM minimum (8GB recommended)
  • 2GB free disk space

πŸ› οΈ Dependencies

```text
# Core dependencies
pandas>=1.5.0
numpy>=1.21.0
scikit-learn>=1.1.0

# Jupyter environment
jupyter>=1.0.0
ipykernel>=6.0.0
notebook>=6.4.0

# Optional: Visualization
matplotlib>=3.5.0
seaborn>=0.11.0
```

πŸ” Verification

```bash
# Test installation
python -c "import pandas, numpy, sklearn; print('All packages installed successfully!')"

# Start Jupyter
jupyter notebook
```

πŸ“Š Performance Metrics

🎯 Evaluation Criteria

  • Accuracy: Overall prediction correctness
  • Precision: True positive rate among predicted positives
  • Recall: True positive rate among actual positives
  • F1-Score: Harmonic mean of precision and recall

πŸ“ˆ Model Insights

  • Logistic Regression: Best overall performance
  • Feature Importance: Academic scores > Work experience > Gender
  • Data Quality: High-quality dataset with minimal preprocessing needed
  • Scalability: Models can handle larger datasets efficiently

🎨 Project Structure

```text
campus-placement-prediction/
β”œβ”€β”€ πŸ“ notebooks/
β”‚   β”œβ”€β”€ πŸ“„ Logistic Regression.ipynb
β”‚   β”œβ”€β”€ πŸ“„ Decision Tree.ipynb
β”‚   β”œβ”€β”€ πŸ“„ Gradient Boosting.ipynb
β”‚   └── πŸ“„ K-Nearest Neighbor.ipynb
β”œβ”€β”€ πŸ“ data/
β”‚   └── πŸ“„ Placement.csv
β”œβ”€β”€ πŸ“„ requirements.txt
β”œβ”€β”€ πŸ“„ Instructions.txt
β”œβ”€β”€ πŸ“„ README.md
└── πŸ“„ LICENSE
```

πŸ“‹ File Descriptions

  • Logistic Regression.ipynb: Baseline model with highest accuracy
  • Decision Tree.ipynb: Interpretable tree-based model
  • Gradient Boosting.ipynb: Advanced ensemble method
  • K-Nearest Neighbor.ipynb: Non-parametric approach
  • requirements.txt: Python dependencies
  • Instructions.txt: Detailed setup and usage guide

🀝 Contributing

We welcome contributions! Please follow these steps:

πŸ”§ Development Setup

```bash
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/<your-username>/campus-placement-prediction.git
cd campus-placement-prediction

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes
# Add tests if applicable

# Commit changes
git commit -m "Add amazing feature"

# Push to branch
git push origin feature/amazing-feature

# Create a Pull Request on GitHub
```

πŸ“ Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

🎯 Areas for Contribution

  • New Algorithms: Implement additional ML models
  • Feature Engineering: Add new features or preprocessing steps
  • Visualization: Enhance data visualization capabilities
  • Documentation: Improve code comments and documentation
  • Performance: Optimize existing algorithms

πŸ“ž Contact

πŸ‘¨β€πŸ’» Project Maintainer

