Exploratory Data Analysis and Classification using custom Decision Trees class

sri-ram-swaminathan/Spaceship-Titanic-and-Decision-Trees

Spaceship-Titanic-and-Decision-Trees

This project is a two-part exploration of the Spaceship Titanic dataset, a fictional dataset inspired by the historical Titanic disaster, available on Kaggle.

The full documentation is part of the Jupyter notebook. A summary of the project is provided here. My blog post, with a concise, non-technical explanation, can be found here.


Part 1: Exploratory Data Analysis (EDA)

  • Utilizes the Pandas library for data manipulation and cleaning.
  • Leverages Matplotlib and Seaborn for data visualization to uncover patterns and relationships within the dataset.
  • This initial analysis provides insights into the characteristics of passengers and of the overall journey that might influence the outcome the dataset records, namely whether a passenger was transported (the Transported column).
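
The cleaning and summarising steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names follow the Kaggle dataset, but the six rows are made up, and the helpers `clean` and `target_rate_by` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample (assumption): column names match the Kaggle dataset,
# the values are illustrative only and are not taken from the real data.
df = pd.DataFrame({
    "HomePlanet": ["Earth", "Europa", "Mars", "Earth", None, "Europa"],
    "Age": [24.0, 58.0, np.nan, 33.0, 16.0, 44.0],
    "RoomService": [109.0, 0.0, 43.0, 0.0, 303.0, 0.0],
    "Transported": [False, True, False, True, False, True],
})

def clean(frame):
    """Hypothetical cleaning step: fill missing values without touching the input."""
    out = frame.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())      # numeric: median
    out["HomePlanet"] = out["HomePlanet"].fillna("Unknown")  # categorical: placeholder
    return out

def target_rate_by(frame, column, target="Transported"):
    """Hypothetical summary: share of rows with target == True per category."""
    return frame.groupby(column)[target].mean()

missing_before = df.isna().sum()          # missing values per column
cleaned = clean(df)
rate_by_planet = target_rate_by(cleaned, "HomePlanet")
print(missing_before)
print(rate_by_planet)
```

A table like `rate_by_planet` is the kind of result that could then be passed to a plotting call such as Seaborn's `barplot` to make the pattern visible.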

Part 2: Implementing a Decision Tree Classifier from Scratch

  • This section focuses on building the Decision Tree classification algorithm from scratch using only NumPy arrays.
  • Our decision tree is trained on the Spaceship Titanic dataset to predict, from the relevant features, whether a passenger was transported (the dataset's Transported target).
  • Our model is compared with sklearn's DecisionTreeRegressor and scores 0.05 higher in accuracy.
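
To give a feel for what a NumPy-only decision tree involves, here is a minimal sketch. It is not the class from the notebook: the Gini criterion, the tuple-based tree representation, the `max_depth` default, and all function names (`gini`, `best_split`, `build`, `predict`) are assumptions made for illustration, and the toy data is made up.

```python
import numpy as np

def gini(y):
    """Gini impurity of an array of non-negative integer labels."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature, threshold) with the lowest weighted Gini, or None
    if no split improves on the impurity of the unsplit node."""
    best, best_score = None, gini(y)
    for j in range(X.shape[1]):
        # The largest value is skipped: it would leave the right side empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / y.size
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def build(X, y, depth=0, max_depth=3):
    """Grow the tree recursively. A leaf is an int (majority class);
    an internal node is a tuple (feature, threshold, left, right)."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return int(np.bincount(y).argmax())
    j, t = split
    left = X[:, j] <= t
    return (j, t,
            build(X[left], y[left], depth + 1, max_depth),
            build(X[~left], y[~left], depth + 1, max_depth))

def predict(tree, X):
    """Route each row down the tree until it reaches a leaf."""
    out = []
    for x in X:
        node = tree
        while isinstance(node, tuple):
            j, t, left, right = node
            node = left if x[j] <= t else right
        out.append(node)
    return np.array(out)

# Made-up toy data: the first feature separates the two classes.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0],
              [10.0, 1.0], [11.0, 0.0], [12.0, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = build(X, y)
accuracy = np.mean(predict(tree, X) == y)
print(tree, accuracy)
```

The split search here is exhaustive over every observed value of every feature, which keeps the sketch short but scales poorly on large datasets. Accuracy is computed as the fraction of matching labels, which is the metric used in the comparison above.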
