This notebook predicts the survival of Titanic passengers for the Kaggle competition. It covers a quick but complete data science life cycle, from data cleaning and feature engineering to model selection and prediction. The setup uses a train-validation-test split and is built on pipelines to prevent data leakage. The Kaggle score so far is 78.5 percent.
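A minimal sketch of that setup, assuming a pandas DataFrame `df` with a `Survived` target and numeric-only features; the split sizes and the placeholder model are illustrative, not the notebook's exact code:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = df.drop(columns="Survived"), df["Survived"]

# Hold out a test set, then carve a validation set out of the remainder.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42)

# All preprocessing lives inside the pipeline, so it is fit on the
# training fold only. This is what prevents data leakage.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
print("validation accuracy:", pipe.score(X_val, y_val))
```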
Only three variables contain missing values. Cleaning is based on standard median imputation (for continuous variables) and most-frequent imputation (for categorical ones). A regression-based approach to imputing age is explored as well.
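The sketch below shows both strategies side by side; it assumes the usual Titanic columns `Age` and `Fare` (continuous) and `Embarked` (categorical) are the variables with missing values, so adapt the column lists as needed:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Standard imputation: median for continuous, most frequent for categorical.
cleaner = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
    ("cat", SimpleImputer(strategy="most_frequent"), ["Embarked"]),
])

# Regression-based alternative for age: IterativeImputer models each
# feature with missing values as a function of the other numeric features.
reg_imputer = IterativeImputer(random_state=42)
```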
In particular, the feature engineering tackles
- the names and titles of passengers and
- the number of relatives (see the sketch after this list).

It also starts exploring cabin and fare.
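An illustrative version of the two engineered features, assuming the raw Kaggle columns `Name`, `SibSp`, and `Parch`; grouping rare titles together is a common choice, not necessarily the notebook's exact mapping:

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Titles sit between the comma and the period, e.g. "Braund, Mr. Owen".
    out["Title"] = out["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)
    out["Title"] = out["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
    common = {"Mr", "Mrs", "Miss", "Master"}
    out.loc[~out["Title"].isin(common), "Title"] = "Rare"
    # Number of relatives aboard = siblings/spouses + parents/children.
    out["Relatives"] = out["SibSp"] + out["Parch"]
    return out
```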
First, logistic regression is explored and its hyperparameters are tuned. The same is then done for Random Forest and finally for XGBoost.
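A sketch of the tuning step for logistic regression, reusing `pipe` from the split example above; the grids for Random Forest and XGBoost follow the same pattern, and these parameter values are illustrative rather than the notebook's exact search space:

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    "model__C": [0.01, 0.1, 1, 10],  # inverse regularisation strength
    "model__penalty": ["l2"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```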