This project is a two-part exploration of the Spaceship Titanic dataset, a fictional dataset inspired by the historical Titanic disaster, available on Kaggle.
The documentation is part of the Jupyter notebook. A summary of the project is provided here. My blog post, with a concise, non-technical explanation, can be found here.
- Utilizes the Pandas library for data manipulation and cleaning.
- Leverages Matplotlib and Seaborn for data visualization to uncover patterns and relationships within the dataset.
- This initial analysis provides insights into the characteristics of passengers and the overall journey that might influence survival outcomes.
- This section focuses on building the Decision Tree classification algorithm from scratch using only NumPy arrays.
- Our decision tree is trained on the Spaceship Titanic dataset to predict passenger survival based on relevant features.
- Our model is compared with sklearn's `DecisionTreeRegressor` and performs better by 0.05 accuracy points.
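
To give a sense of what building a decision tree on NumPy arrays involves, here is a minimal sketch of the core step: choosing the best binary split for one node. It is not the notebook's code. The function names `gini` and `best_split` are hypothetical, and the use of Gini impurity with midpoint thresholds is an assumption, since the summary above does not say which splitting criterion the project uses.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 1-D array of class labels (0.0 for an empty array)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / labels.size
    return 1.0 - np.sum(proportions ** 2)


def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) for the best binary split.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct values of each feature. If no split lowers the impurity,
    feature_index and threshold are None.
    """
    n_samples, n_features = X.shape
    best = (None, None, gini(y))
    for j in range(n_features):
        values = np.unique(X[:, j])  # sorted distinct values
        thresholds = (values[:-1] + values[1:]) / 2.0
        for t in thresholds:
            left = X[:, j] <= t
            n_left = left.sum()
            # Impurity of both children, weighted by their sample counts.
            score = (
                n_left * gini(y[left]) + (n_samples - n_left) * gini(y[~left])
            ) / n_samples
            if score < best[2]:
                best = (j, t, score)
    return best


# Toy data: feature 0 separates the classes, feature 1 is constant.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)
```

A full tree would apply `best_split` recursively to the left and right subsets until a stopping rule is met (for example, a maximum depth or a pure node), then predict the majority class at each leaf.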