Predicting mental health trends with AI-driven models is a critical challenge in today's world. This project tackles that challenge using a large-scale dataset from the Kaggle Mental Health Competition, which contains 140,000+ records.
Through rigorous data preprocessing, feature selection, and deep learning, I developed an optimized predictive model that achieved:
- ✅ 95% accuracy on the training set
- ✅ 93.4% accuracy on the test set
Unlike conventional approaches, I dedicated extensive hours to custom data cleaning, implementing my own algorithms to ensure a higher-quality dataset for training. Given the inherent inconsistencies in the raw data, this approach significantly improved model performance and reduced data bias.
The dataset itself presented several challenges:
- 140,000+ records with mental health-related features
- High noise levels & inconsistencies required deep cleaning
- Complex categorical & numerical features demanding transformation
⚡ Data cleaning was a major challenge, consuming a significant portion of the project timeline. Rather than relying on standard methods, I implemented custom algorithms to automate and refine the following (a minimal sketch follows the list):
- ✅ Missing value imputation with pattern-based logic
- ✅ Outlier detection & removal using statistical thresholds
- ✅ Standardization & normalization for better model convergence
- ✅ Encoding categorical variables dynamically to enhance feature usability
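For illustration, here is a hedged, minimal sketch of what such a cleaning pass can look like in Pandas. The column name `work_category`, the grouping logic, and the z-score threshold are assumptions made for this example, not the notebook's actual algorithms.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the four cleaning steps above (illustrative, not the real pipeline)."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns

    # 1. Pattern-based imputation: fill numeric gaps with the median of rows
    #    sharing the same category, falling back to the global median.
    for col in num_cols:
        group_median = df.groupby("work_category")[col].transform("median")
        df[col] = df[col].fillna(group_median).fillna(df[col].median())

    # 2. Outlier removal via a statistical threshold (|z-score| > 3).
    for col in num_cols:
        z = (df[col] - df[col].mean()) / df[col].std()
        df = df[z.abs() <= 3]

    # 3. Standardization: zero mean, unit variance for numeric features.
    df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()

    # 4. Dynamic categorical encoding: one-hot encode remaining object columns.
    df = pd.get_dummies(df, columns=df.select_dtypes(include="object").columns)
    return df
```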
- 🛠 Frameworks & Tools:
- TensorFlow – Core deep learning framework
- Keras Tuner – Hyperparameter tuning for model optimization
- scikit-learn – Feature engineering, mutual_info_classif, train_test_split
- Pandas & NumPy – Custom-built data preprocessing algorithms
- Matplotlib & Seaborn – Visual analytics for understanding dataset distributions
- Kaggle API – Dataset handling & experimentation
- 📊 Exploratory Data Analysis:
- Conducted in-depth statistical analysis
- Built visualization reports using Seaborn & Matplotlib (illustrated below)
- Identified patterns & correlations for feature engineering
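As a hedged illustration of that visual analysis (the column names `age` and `depression` are placeholders, and `df` is assumed to be the cleaned DataFrame from the sketch above):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Feature distribution and target class balance, side by side.
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(df["age"], kde=True, ax=axes[0])
sns.countplot(x="depression", data=df, ax=axes[1])
plt.tight_layout()
plt.show()

# Correlation heatmap to surface candidate relationships for feature engineering.
sns.heatmap(df.select_dtypes("number").corr(), cmap="coolwarm")
plt.show()
```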
- 🧹 Preprocessing & Feature Selection:
- Extensively worked on data cleaning – developed custom algorithms to handle missing values & inconsistencies
- Applied mutual_info_classif for feature selection
- Standardized data using scaling techniques (see the sketch after this list)
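A minimal sketch of that selection-and-scaling step, assuming `X` and `y` hold the cleaned features and target; the 80/20 split and the top-20 cutoff are illustrative choices, not the notebook's exact values.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out a test set before any fitting to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Rank features by mutual information with the target; keep the top k.
mi = mutual_info_classif(X_train, y_train, random_state=42)
top_k = pd.Series(mi, index=X_train.columns).nlargest(20).index

# Fit the scaler on training data only, then apply to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[top_k])
X_test_scaled = scaler.transform(X_test[top_k])
```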
- 🔹 Built multiple versions of the model, testing various architectures
- 🔹 Used Keras Tuner for hyperparameter optimization (see the sketch after this list)
- 🔹 Addressed computational constraints due to Kaggle GPU limits
- ✅ Achieved 95% accuracy on training data
- ✅ Final model reached 93.4% accuracy on test data
- ✅ Fine-tuned loss functions & learning rates for better convergence
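A non-authoritative sketch of what a Keras Tuner search over such a network can look like, assuming a binary target; the layer ranges, dropout rates, learning-rate grid, and `max_trials` budget are assumptions for illustration, with `X_train_scaled` and `y_train` carried over from the previous sketch.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    """Search space for the tuner: depth, width, dropout, and learning rate."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(X_train_scaled.shape[1],)))
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", 32, 256, step=32), activation="relu"
        ))
        model.add(tf.keras.layers.Dropout(hp.Float(f"dropout_{i}", 0.0, 0.5, step=0.1)))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,  # kept small to respect Kaggle GPU limits
    directory="tuning",
    project_name="mental_health",
)
tuner.search(X_train_scaled, y_train, validation_split=0.2, epochs=20)
best_model = tuner.get_best_models(num_models=1)[0]
```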
📌 This project underwent 8 iterations, with each version refining performance and addressing computational challenges.
🔗 Explore my Kaggle notebooks: Mental Health Data Model 2 and Mental Health Data Model 1.
This project was not just about training a deep learning model; it was a battle against raw, unstructured, and inconsistent data. By investing extensive hours in cleaning and refining the dataset, I ensured that the models were trained on quality data, not noise.
🚀 This journey strengthened my expertise in data preprocessing, feature engineering, and deep learning, making it a significant milestone in my AI & Data Science career.