# Campus Placement Prediction

> Predicting student placement success using machine learning algorithms
## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Dataset](#dataset)
- [Technologies](#technologies)
- [Models Implemented](#models-implemented)
- [Results](#results)
- [Quick Start](#quick-start)
- [Detailed Usage](#detailed-usage)
- [Installation](#installation)
- [Performance Metrics](#performance-metrics)
- [Project Structure](#project-structure)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
## Overview

This project implements a comprehensive machine learning solution to predict campus placement outcomes for students. By analyzing historical academic and personal data, our models identify key patterns that influence placement success, providing valuable insights for both students and educational institutions.

The project aims to:

- Predict Placement Probability: Determine the likelihood of a student being placed
- Identify Key Factors: Understand what drives placement success
- Provide Actionable Insights: Help students improve their placement chances
- Support Institutional Decisions: Guide placement preparation strategies
## Features

- Multi-Algorithm Approach: Four different ML algorithms for comprehensive analysis
- Advanced Preprocessing: Robust data cleaning and feature engineering
- Hyperparameter Optimization: Automated tuning for optimal performance (see the sketch after this list)
- Comprehensive Evaluation: Multiple metrics for thorough model assessment
- Feature Importance Analysis: Understand which factors matter most
- Model Comparison: Side-by-side performance evaluation
- Scalable Architecture: Easy to extend with new algorithms
- Educational Focus: Designed for learning and research
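
To make the hyperparameter-optimization step concrete, here is a minimal sketch using scikit-learn's `GridSearchCV`. The synthetic data and the parameter grid are illustrative assumptions, not the project's exact search space.

```python
# Minimal hyperparameter-tuning sketch using GridSearchCV.
# The synthetic data and parameter grid are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in for the preprocessed placement features (215 rows, like the dataset)
X, y = make_classification(n_samples=215, n_features=10, random_state=42)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```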
## Dataset

Our dataset contains comprehensive student information across multiple dimensions:
| Category | Features | Description |
|---|---|---|
| Demographics | Gender | Student gender (Male/Female) |
| Academic Performance | S.S.C. Percentage, H.S.C. Percentage, Degree Percentage, MBA Percentage | Academic scores across education levels |
| Educational Background | Specialization, Degree Type | Academic stream and specialization |
| Professional Readiness | E-test Score, Work Experience | Employability assessment and experience |
| Outcome | Status, Salary | Placement result and compensation |
### Dataset Statistics

- Total Records: 215 students
- Features: 15 columns
- Target Variable: Placement Status (Placed/Not Placed)
- Data Quality: Clean dataset with minimal missing values

### Key Insights

- Placement Rate: ~70% of students get placed
- Gender Distribution: Balanced representation
- Academic Correlation: Strong relationship between academic performance and placement
- Experience Impact: Work experience significantly improves placement chances
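
You can sanity-check these statistics yourself with a few lines of pandas. The column names (`status`, `workex`) follow the widely used Kaggle campus-placement schema and may need adjusting to your copy of `Placement.csv`.

```python
# Sanity-check the dataset statistics above.
# Column names are assumed (Kaggle campus-placement schema); adjust as needed.
import pandas as pd

df = pd.read_csv("data/Placement.csv")

print(df.shape)         # expected: (215, 15)
print(df.isna().sum())  # missing values per column (the README notes these are minimal)

# Placement rate (~70% "Placed")
print(df["status"].value_counts(normalize=True))

# Placement rate split by work experience
print(pd.crosstab(df["workex"], df["status"], normalize="index"))
```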
## Technologies

### Core

- Python 3.8+: Primary programming language
- Jupyter Notebook: Interactive development environment
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- Scikit-learn: Machine learning algorithms and utilities

### Algorithms

- Logistic Regression: Linear classification model
- Decision Trees: Tree-based classification
- Gradient Boosting: Ensemble learning method
- K-Nearest Neighbors: Instance-based learning

### Visualization

- Matplotlib: Basic plotting and visualization
- Seaborn: Statistical data visualization
- Plotly: Interactive visualizations (optional)
## Models Implemented

### 1. Logistic Regression

Best for: Baseline comparison and interpretability

- Algorithm Type: Linear classification
- Key Features: Probability outputs, interpretable coefficients
- Use Case: Understanding feature relationships
- Performance: Highest accuracy (~85%)
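
A minimal end-to-end sketch of this model is shown below. The column names (`ssc_p`, `hsc_p`, `degree_p`, `mba_p`, `etest_p`, `gender`, `workex`, `specialisation`, `status`) are assumptions based on the dataset description above; the notebooks may preprocess differently.

```python
# Minimal logistic-regression pipeline sketch.
# Feature/target column names are assumptions, not the notebooks' exact code.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("data/Placement.csv")
numeric = ["ssc_p", "hsc_p", "degree_p", "mba_p", "etest_p"]
categorical = ["gender", "workex", "specialisation"]

X = df[numeric + categorical]
y = (df["status"] == "Placed").astype(int)

# Scale numeric scores, one-hot encode categorical features
pre = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```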
### 2. Decision Tree

Best for: Feature importance and interpretability

- Algorithm Type: Tree-based classification
- Key Features: Captures non-linear relationships, yields interpretable rules
- Use Case: Understanding decision paths
- Performance: Good interpretability with moderate accuracy
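
Reading feature importances from a fitted tree looks like this; synthetic data stands in for the placement features here.

```python
# Extracting feature importances from a fitted decision tree (illustrative).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=215, n_features=6, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Importance scores sum to 1.0 across features
for name, score in zip([f"feature_{i}" for i in range(6)], tree.feature_importances_):
    print(f"{name}: {score:.3f}")
```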
### 3. Gradient Boosting

Best for: High-accuracy predictions

- Algorithm Type: Ensemble learning
- Key Features: Strong predictive power; learning rate and tree depth can be tuned to limit overfitting
- Use Case: Production-ready predictions
- Performance: Second-best accuracy (tied with Decision Tree)
### 4. K-Nearest Neighbors

Best for: Non-parametric classification

- Algorithm Type: Instance-based learning
- Key Features: No explicit training phase (lazy learning), adapts to the data distribution
- Use Case: When the data distribution is unknown
- Performance: Baseline comparison
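
The four models can be compared on a single train/test split with a short loop. This sketch uses synthetic data in place of the preprocessed placement features, so its scores will not match the table below.

```python
# Side-by-side comparison of the four models on one split (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=215, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```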
## Results

### Model Comparison

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 85.12% | 0.87 | 0.85 | 0.86 |
| Gradient Boosting | 83.72% | 0.85 | 0.84 | 0.84 |
| Decision Tree | 83.72% | 0.84 | 0.84 | 0.84 |
| K-Nearest Neighbors | 81.40% | 0.82 | 0.81 | 0.81 |
### Key Findings

- Logistic Regression achieves the highest accuracy
- Academic performance is the strongest predictor
- Work experience significantly improves placement chances
- Gender has minimal impact on placement outcomes
## Quick Start

### Prerequisites

```bash
# Ensure Python 3.8+ is installed
python --version

# Install pip if not available
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
```

### Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/campus-placement-prediction.git
cd campus-placement-prediction

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### Run the Models

```bash
# Start Jupyter Notebook
jupyter notebook

# Open Logistic Regression.ipynb for best results
# Or run all models in sequence:
# 1. Logistic Regression.ipynb
# 2. Decision Tree.ipynb
# 3. Gradient Boosting.ipynb
# 4. K-Nearest Neighbor.ipynb
```

## Detailed Usage

### Environment Setup

Option 1: virtualenv

```bash
python -m venv campus_placement_env
campus_placement_env\Scripts\activate     # Windows
source campus_placement_env/bin/activate  # macOS/Linux
pip install -r requirements.txt
```

Option 2: conda

```bash
conda create -n campus_placement python=3.9
conda activate campus_placement
pip install -r requirements.txt
```

### Data Preparation

- Ensure Placement.csv is in your project directory
- Verify the dataset structure matches the expected format
- Run data validation checks (a minimal sketch follows this list)
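
A simple validation check, sketched under the assumption of the 215-row, 15-column schema described in the Dataset section:

```python
# Lightweight data validation (expected shape and the 'status' column
# are assumptions based on the dataset description in this README).
import pandas as pd

df = pd.read_csv("Placement.csv")

assert df.shape[0] == 215, f"expected 215 rows, got {df.shape[0]}"
assert df.shape[1] == 15, f"expected 15 columns, got {df.shape[1]}"
assert "status" in df.columns, "missing target column 'status'"
print("Dataset looks as expected.")
```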
### Recommended Workflow

1. Start with Logistic Regression: Best baseline performance
2. Explore Decision Tree: Understand feature importance
3. Test Gradient Boosting: Advanced ensemble method
4. Compare with KNN: Non-parametric baseline
## Installation

### System Requirements

- Python 3.8 or higher
- 4GB RAM minimum (8GB recommended)
- 2GB free disk space

### Dependencies (requirements.txt)

```text
# Core dependencies
pandas>=1.5.0
numpy>=1.21.0
scikit-learn>=1.1.0

# Jupyter environment
jupyter>=1.0.0
ipykernel>=6.0.0
notebook>=6.4.0

# Optional: Visualization
matplotlib>=3.5.0
seaborn>=0.11.0
```

### Verify Installation

```bash
# Test installation
python -c "import pandas, numpy, sklearn; print('All packages installed successfully!')"

# Start Jupyter
jupyter notebook
```

## Performance Metrics

### Evaluation Metrics

- Accuracy: Overall prediction correctness
- Precision: Fraction of predicted positives that are actually positive
- Recall: Fraction of actual positives that are correctly identified
- F1-Score: Harmonic mean of precision and recall
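
All four metrics are available in scikit-learn's `metrics` module; the labels below are placeholders, not actual model output.

```python
# Computing the four evaluation metrics with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]  # placeholder ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # placeholder predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-Score: ", f1_score(y_true, y_pred))
```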
### Model Insights

- Logistic Regression: Best overall performance
- Feature Importance: Academic scores > Work experience > Gender
- Data Quality: High-quality dataset with minimal preprocessing needed
- Scalability: Models can handle larger datasets efficiently
## Project Structure

```text
campus-placement-prediction/
├── notebooks/
│   ├── Logistic Regression.ipynb
│   ├── Decision Tree.ipynb
│   ├── Gradient Boosting.ipynb
│   └── K-Nearest Neighbor.ipynb
├── data/
│   └── Placement.csv
├── requirements.txt
├── Instructions.txt
├── README.md
└── LICENSE
```
### Key Files

- Logistic Regression.ipynb: Baseline model with highest accuracy
- Decision Tree.ipynb: Interpretable tree-based model
- Gradient Boosting.ipynb: Advanced ensemble method
- K-Nearest Neighbor.ipynb: Non-parametric approach
- requirements.txt: Python dependencies
- Instructions.txt: Detailed setup and usage guide
## Contributing

We welcome contributions! Please follow these steps:
```bash
# Fork the repository, then clone it
git clone https://github.com/hariprabhu571/campus-placement-prediction.git
cd campus-placement-prediction

# Create a feature branch
git checkout -b feature/amazing-feature

# Make your changes
# Add tests if applicable

# Commit changes
git commit -m "Add amazing feature"

# Push to branch
git push origin feature/amazing-feature

# Create Pull Request
```

In short:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
### Areas for Contribution

- New Algorithms: Implement additional ML models
- Feature Engineering: Add new features or preprocessing steps
- Visualization: Enhance data visualization capabilities
- Documentation: Improve code comments and documentation
- Performance: Optimize existing algorithms
## License

This project is released under the terms in the repository's [LICENSE](LICENSE) file.

## Contact

- Name: Hari Raja Prabhu P
- Email: [email protected]
- GitHub: [@hariprabhu571](https://github.com/hariprabhu571)