
This project focuses on data cleaning techniques, data visualization, and building the most suitable neural network model for the data.

AshwinAshok3/Mental_Health_kaggle_Competition



Details About the Project

🧠 Mental Health Data Prediction – Kaggle Competition 🎯

🚀 Introduction

Predicting mental health trends with AI-driven models is a critical challenge in today's world. This project tackles that challenge using a large-scale dataset from the Kaggle Mental Health Competition, which contains 140,000+ records.

Through rigorous data preprocessing, feature selection, and deep learning techniques, I successfully developed an optimized predictive model that achieved:

  • 95% accuracy on the training set
  • 93.4% accuracy on the test set

💡 What makes this project stand out?

Rather than relying solely on off-the-shelf preprocessing, I spent extensive time on custom data cleaning, implementing my own routines to produce a higher-quality training dataset. Given the inherent inconsistencies in the raw data, this approach significantly improved model performance and reduced data bias.

📊 Dataset Overview

  • 140,000+ records with mental health-related features
  • High noise levels & inconsistencies required deep cleaning
  • Complex categorical & numerical features demanding transformation

Data cleaning was a major challenge, consuming a significant portion of the project timeline. Rather than using standard methods, I implemented custom algorithms to automate and refine:

  • ✅ Missing value imputation with pattern-based logic
  • ✅ Outlier detection & removal using statistical thresholds
  • ✅ Standardization & normalization for better model convergence
  • ✅ Encoding categorical variables dynamically to enhance feature usability
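The four cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the column names, z-score threshold, and choice of median/mode imputation and integer category codes are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, num_cols, cat_cols, z_thresh: float = 3.0) -> pd.DataFrame:
    """Illustrative cleaning pipeline: impute, drop outliers, scale, encode."""
    df = df.copy()
    # Missing-value imputation: median for numeric, mode for categorical
    for col in num_cols:
        df[col] = df[col].fillna(df[col].median())
    for col in cat_cols:
        df[col] = df[col].fillna(df[col].mode().iloc[0])
    # Outlier removal via a z-score threshold on numeric columns
    z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)
    df = df[(z.abs() <= z_thresh).all(axis=1)]
    # Standardization (zero mean, unit variance) for better convergence
    df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)
    # Encode categorical variables as integer codes
    for col in cat_cols:
        df[col] = df[col].astype("category").cat.codes
    return df
```

Each step depends on the one before it: imputation must precede outlier detection (so NaNs do not distort the statistics), and scaling is recomputed after outliers are dropped.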

🔥 Tech Stack & Libraries Used

🛠 Frameworks & Tools:

  • TensorFlow – Core deep learning framework
  • Keras Tuner – Hyperparameter tuning for model optimization
  • scikit-learn – Feature engineering, mutual_info_classif, train_test_split
  • Pandas & NumPy – Custom-built data preprocessing algorithms
  • Matplotlib & Seaborn – Visual analytics for understanding dataset distributions
  • Kaggle API – Dataset handling & experimentation

⚙️ Model Development & Optimization

🏗 Step 1: Exploratory Data Analysis (EDA)

  • Conducted in-depth statistical analysis
  • Built visualization reports using Seaborn & Matplotlib
  • Identified patterns & correlations for feature engineering
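A lightweight version of this EDA pass can be expressed with pandas alone (the plots aside). The target column name and the choice of summary statistics here are illustrative, not taken from the competition dataset:

```python
import pandas as pd

def eda_summary(df: pd.DataFrame, target: str) -> dict:
    """Quick statistical overview: shape, missingness, class balance, correlations."""
    numeric = df.select_dtypes("number")
    return {
        "shape": df.shape,
        "missing_per_column": df.isna().sum().to_dict(),
        # Class balance of the (assumed numeric/binary) target column
        "class_balance": df[target].value_counts(normalize=True).to_dict(),
        # Numeric features most correlated with the target, by absolute value
        "top_correlations": numeric.corr()[target]
            .drop(target).abs().sort_values(ascending=False).head(5).to_dict(),
    }
```

A summary like this is typically the starting point for the Seaborn/Matplotlib plots mentioned above.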

✨ Step 2: Data Cleaning & Feature Engineering

  • Extensively worked on data cleaning – Developed custom algorithms to handle missing values & inconsistencies
  • Applied mutual_info_classif for feature selection
  • Standardized data using scaling techniques
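The selection-and-scaling step can be sketched with scikit-learn's `mutual_info_classif` and `StandardScaler`. The number of features kept (`k`) and the synthetic shapes are illustrative; only the two scikit-learn calls reflect the libraries named above:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

def select_and_scale(X: np.ndarray, y: np.ndarray, k: int = 2, seed: int = 0):
    """Keep the k features with highest mutual information, then standardize."""
    mi = mutual_info_classif(X, y, random_state=seed)
    top = np.sort(np.argsort(mi)[::-1][:k])  # k most informative, in original column order
    return StandardScaler().fit_transform(X[:, top]), top
```

Mutual information captures non-linear dependence between a feature and the class label, which is why it is often preferred over plain correlation for feature selection.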

🏋️ Step 3: Model Training & Hyperparameter Tuning

  • 🔹 Built multiple versions of the model, testing various architectures
  • 🔹 Used Keras Tuner for hyperparameter optimization
  • 🔹 Addressed computational constraints due to Kaggle GPU limits
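The project used Keras Tuner for this step; as a minimal stand-in that illustrates the same search loop without a GPU, here is a random search over scikit-learn's `MLPClassifier`. The hyperparameter choices, trial count, and holdout split are illustrative, not the tuner configuration actually used:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def random_search(X, y, n_trials: int = 5, seed: int = 0):
    """Random search over layer width and learning rate, scored on a holdout split."""
    rng = np.random.default_rng(seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    best_score, best_params = -1.0, None
    for _ in range(n_trials):
        # Sample one candidate configuration per trial
        params = {
            "hidden_layer_sizes": (int(rng.choice([16, 32, 64])),),
            "learning_rate_init": float(rng.choice([1e-2, 1e-3])),
        }
        model = MLPClassifier(max_iter=300, random_state=seed, **params)
        model.fit(X_tr, y_tr)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Keras Tuner automates exactly this loop (sample, train, score, keep the best), plus smarter strategies such as Hyperband and Bayesian optimization.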

📈 Step 4: Performance Evaluation

  • ✅ Achieved 95% accuracy on training data
  • ✅ Final model achieved 93.4% accuracy on test data
  • ✅ Fine-tuned loss functions & learning rates for better convergence
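Comparing train and test accuracy as above is the standard check for overfitting. A generic version of that comparison (the model and data here are placeholders, not the competition setup):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data and model standing in for the competition dataset / network
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
# A large train/test gap signals overfitting; a small gap (as with the
# 95% / 93.4% figures above) indicates the model generalizes well
```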

🏆 Project Iterations & Notebook Versions

📌 This project underwent 8 iterations, with each version refining performance and addressing computational challenges.

🔗 Explore my Kaggle notebooks: Mental Health Data Model 2 and Mental Health Data Model 1

🎯 Final Thoughts

This project was not just about training a deep learning model; it was a battle against raw, unstructured, and inconsistent data. By investing extensive hours in cleaning and refining the dataset, I ensured that the models were trained on quality data, not noise.

🚀 This journey strengthened my expertise in data preprocessing, feature engineering, and deep learning, making it a significant milestone in my AI & Data Science career.

🔍 If you're interested in collaborating or discussing the project, feel free to reach out!
