ML-Codecamy-Final-Project

Codecademy's Machine Learning Career Path - Final Project (End-to-end ML Pipeline)

Updates

New Approach: Neural Networks

Implemented a neural network model to extend the project's prediction capability with deep learning.
Trained a multi-layer perceptron (MLP) with ReLU activations in the hidden layers and a linear output layer.
Achieved a test MSE of 0.0215 and a test R² of 0.9931, which is competitive with, though slightly below, the Random Forest model (R² = 0.99996).
Compared the Random Forest and neural network models in terms of accuracy and computational efficiency.
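The MLP described above can be sketched from scratch in NumPy. This sketch uses one ReLU hidden layer feeding a single linear output unit, trained with full-batch gradient descent on the mean squared error. The project notebook (Waves_prediction_DNN.ipynb) may use a deep-learning framework instead. The layer width, learning rate, epoch count, He initialisation and synthetic data below are all assumptions for illustration, not the project's settings.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def train_mlp(X, y, hidden=32, lr=0.01, epochs=500, seed=0):
    """One-hidden-layer MLP: ReLU hidden layer, linear output, MSE loss,
    trained with full-batch gradient descent (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # He initialisation, a common choice for ReLU layers
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        z1 = X @ W1 + b1
        h = relu(z1)
        y_hat = h @ W2 + b2  # linear output layer: no activation
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the MSE w.r.t. each parameter
        g_out = 2.0 * err / n
        gW2 = h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_z1 = (g_out @ W2.T) * (z1 > 0)  # ReLU derivative is 1 where z1 > 0
        gW1 = X.T @ g_z1
        gb1 = g_z1.sum(axis=0)
        # Gradient-descent step
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return (relu(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic stand-in data, NOT the project's ocean-wave dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

params, losses = train_mlp(X, y)
print(f"training MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

In a framework such as Keras or PyTorch, the same architecture is a stack of dense layers with ReLU activations followed by a single linear unit, trained on the MSE loss.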

Project Scope

Goals:

Predict the height of ocean waves using machine learning techniques.

Dataset:

Source: Global Ocean Waves Analysis and Forecast
Variable of Interest: Sea surface wave maximum height (VCMX) in meters.

Analysis:

Build an end-to-end ML pipeline to predict the height of waves based on oceanographic and meteorological features.

Pipeline Overview

Preprocessing:
- Features extracted from the dataset: latitude, longitude, significant wave height, swell characteristics, wind wave characteristics, and others.
- Handled missing values and standardized the data.
Dimensionality Reduction:
- Applied PCA to reduce the number of features while retaining most of the variance.
Model:
- Original Approach: Random Forest Regressor (scikit-learn) with hyperparameter tuning using GridSearchCV and cross-validation folds.
- Improved Approach: Switched to RandomForestLearner from YDF (Yggdrasil Decision Forests, Google's decision forest library, which also powers TensorFlow Decision Forests) for better efficiency and scalability on large datasets.
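The original scikit-learn approach can be sketched as a single Pipeline tuned with GridSearchCV. With this setup, imputation, scaling and PCA are refit inside every cross-validation fold, so no statistics leak from the validation folds. The median imputation strategy, the 95% variance threshold for PCA, the parameter grid and the synthetic data are assumptions for illustration. The project's actual settings are not documented in this README.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the oceanographic features (NOT the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=300)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # fill missing values
    ("scale", StandardScaler()),                         # zero mean, unit variance
    ("pca", PCA(n_components=0.95, svd_solver="full")),  # keep 95% of the variance
    ("rf", RandomForestRegressor(random_state=0)),
])

# Illustrative grid only; the project's actual grid is not documented here
param_grid = {
    "rf__n_estimators": [20, 50],
    "rf__max_depth": [5, None],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("test R²:", search.score(X_test, y_test))
```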

Updates from Today's Work

Key Improvements:

Optimized Hyperparameters:
- After extensive hyperparameter tuning, we found the following best parameters for the RandomForestLearner:
```python
{
    'num_trees': 50,
    'max_depth': 20,
    'min_examples': 2
}
```
Avoided Redundant Training:
- By training directly with these hyperparameters, we skipped re-running the search over less promising combinations, which significantly reduced computation time.
Transition to YDF:
- Replaced scikit-learn's Random Forest implementation with ydf.RandomForestLearner for compatibility with large datasets and efficient tree-based modeling.
Improved Pipeline:
- Modified the ML pipeline to include PCA and scaling while adapting the training process to work seamlessly with YDF.
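Once the best values are known, the final model can be trained once instead of re-running the search. In YDF these values would presumably be passed to ydf.RandomForestLearner under the names listed above. The sketch below uses scikit-learn instead so that it runs without YDF installed. It maps num_trees to n_estimators and max_depth to max_depth. The two libraries may count tree depth slightly differently, so treat that mapping as approximate. YDF's min_examples is left out because its closest scikit-learn counterpart (roughly min_samples_leaf) is not an exact match. The synthetic data is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scaling + PCA + forest, built directly with the tuned values
# (assumed scikit-learn analogues of YDF's num_trees=50, max_depth=20)
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    RandomForestRegressor(n_estimators=50, max_depth=20, random_state=0),
)

# Synthetic stand-in data, NOT the project's dataset
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 0.5 * X[:, 2] + 0.05 * rng.normal(size=200)

# A single fit instead of (#grid combinations x #CV folds) fits
model.fit(X, y)
print(model.predict(X[:3]))
```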

Results

Previous Models:
- Simple Linear Regression: R² = 0.6079
- Random Forest Regressor: R² = 0.9999 (scikit-learn implementation)
Current Model:
- RandomForestLearner: R² = 0.99996, achieving near-perfect predictions with reduced training time.
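The results above are reported as R² and MSE. For reference, R² = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. MSE is SS_res divided by the number of samples. The following minimal NumPy sketch uses the same definitions as scikit-learn's mean_squared_error and r2_score, applied to a small hand-checkable example with made-up values:

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred))  # ≈ 0.025
print(r2(y_true, y_pred))   # ≈ 0.98
```

In this example the squared errors sum to 0.10 and SS_tot is 5.0, so MSE = 0.10 / 4 = 0.025 and R² = 1 - 0.10 / 5.0 = 0.98.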

Visualizations:

Predicted vs. Actual Values:
- Visualized the relationship between predicted and actual wave heights, showing a strong correlation.
Feature Importance:
- The model identified significant wave height and wind wave characteristics as key predictors.
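Random forests expose impurity-based feature importances that can be used to rank predictors. Below is a minimal scikit-learn sketch on synthetic data. The feature names are hypothetical labels chosen to echo the features listed above, not the dataset's actual column names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names, for illustration only
feature_names = [
    "significant_wave_height",
    "wind_wave_height",
    "swell_period",
    "latitude",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Synthetic target driven mostly by the first column, then the second
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort from largest to smallest
ranking = sorted(
    zip(feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name:>25s}: {score:.3f}")
```

If the model is trained on PCA components, these importances refer to the components rather than to the original variables. Ranking original features such as significant wave height would then require a model fit on the unreduced features. Alternatively, a technique such as permutation importance applied to the whole pipeline would work.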

Future Work

Integrate GPU acceleration for larger datasets.
Experiment with other tree-based algorithms such as Gradient Boosted Trees (GBT) in YDF.
Automate hyperparameter tuning using Bayesian Optimization or similar techniques.

Repository Files

- ML-Career-Path-Final-Project.ipynb
- ML_Career_Path_Final_Project - RandomForestLearner_version.ipynb
- README.md
- Waves_prediction_DNN.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ML-Codecamy-Final-Project

Updates

New Approach: Neural Networks

Project Scope

Goals:

Dataset:

Analysis:

Pipeline Overview

Updates from Today's Work

Key Improvements:

Results

Visualizations:

Future Work

About

Uh oh!

Releases

Packages

Languages

Repository: juanfcastropiccolo/ML-Codecamy-Final-Project

Folders and files

Latest commit

History

Repository files navigation

ML-Codecamy-Final-Project

Updates

New Approach: Neural Networks

Project Scope

Goals:

Dataset:

Analysis:

Pipeline Overview

Updates from Today's Work

Key Improvements:

Results

Visualizations:

Future Work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages