This repository includes two Jupyter notebooks:
prague-airbnb.ipynb
- data preprocessing, fitting a regression model, visualisation of feature importance via SHAP
prague-airbnb-visualisation.ipynb
- plotting geographical data from the dataset via geoplot
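The first notebook relies on the shap library for its feature-importance plots. A global importance ranking of this kind is usually obtained by averaging the absolute SHAP value of each feature over all rows. The sketch below shows only that aggregation step and is not the notebook's code. The function name, feature names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap_importance(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value (hypothetical helper)."""
    n_rows = len(shap_values)
    if n_rows == 0:
        raise ValueError("need at least one row of SHAP values")
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        if len(row) != len(feature_names):
            raise ValueError("each row must have one SHAP value per feature")
        # Sign only tells the direction of the effect; magnitude measures importance.
        for j, value in enumerate(row):
            totals[j] += abs(value)
    importance = [(name, total / n_rows) for name, total in zip(feature_names, totals)]
    # Largest mean |SHAP| first.
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Made-up SHAP values for two listings and three hypothetical features.
example = mean_abs_shap_importance(
    [[10.0, -2.0, 0.5], [-30.0, 4.0, -0.5]],
    ["transit_time", "bedrooms", "superhost"],
)
print(example)
```

With a fitted XGBoost regressor, the input matrix would typically come from shap.TreeExplainer(model).shap_values(X). shap's bar summary plot orders features by this same mean absolute value.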
📖 Associated Medium article explaining the results: How does location affect the price of Airbnb in Prague?
The source data was obtained from the Inside Airbnb project (http://insideairbnb.com/get-the-data.html). It includes information about:
- property - number of bedrooms and bathrooms, amenities, property type, text description, etc.
- hosts - number of other listings, superhost status, hosting experience
- property location - neighbourhood info, latitude and longitude
- availability
- summary of reviews
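Part of the preprocessing is converting raw CSV text fields into numbers and booleans. The sketch below makes two assumptions about the Inside Airbnb listings file. It assumes prices arrive as strings such as "$1,250.00". It also assumes boolean columns such as host_is_superhost use "t"/"f". Both helper names are hypothetical, and the sketch is not taken from the notebook.

```python
import re
from typing import Optional


def parse_price(raw: str) -> float:
    """Turn a price string such as "$1,250.00" into 1250.0 (assumed format)."""
    # Drop currency symbols and thousands separators, keep digits and the decimal point.
    cleaned = re.sub(r"[^0-9.]", "", raw)
    if not cleaned:
        raise ValueError(f"no numeric price in {raw!r}")
    return float(cleaned)


def parse_flag(raw: Optional[str]) -> Optional[bool]:
    """Map assumed 't'/'f' flags to True/False; anything else (e.g. missing) to None."""
    if raw == "t":
        return True
    if raw == "f":
        return False
    return None
```

With pandas, these would be applied column-wise, for example df["price"].dropna().map(parse_price). Missing values are dropped first because parse_price expects a string.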
The dataset was enriched with the transit time from each listing to the city centre, for which the popular tourist attraction Old Town Square was chosen. All route times were retrieved via the Google Cloud Routes API (https://cloud.google.com/maps-platform/routes) and are calculated for a weekday departure at 9:00 AM.
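The retrieval code itself is not described here. As a rough sketch only, the function below requests such a time for one listing from the Routes API computeRoutes endpoint. It makes several assumptions. The travel mode is TRANSIT, Old Town Square is placed at approximate coordinates, and departure is the next weekday at 9:00 AM Prague time. The helper names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Endpoint of the Routes API computeRoutes method.
ROUTES_URL = "https://routes.googleapis.com/directions/v2:computeRoutes"
# Approximate coordinates of Old Town Square (assumption, not taken from the repository).
OLD_TOWN_SQUARE = (50.0875, 14.4213)
PRAGUE_TZ = ZoneInfo("Europe/Prague")


def next_weekday_9am(now: datetime) -> datetime:
    """Next Monday-Friday 09:00 Prague local time strictly after `now` (an aware datetime)."""
    local = now.astimezone(PRAGUE_TZ)
    candidate = local.replace(hour=9, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def build_request_body(lat: float, lng: float, departure: datetime) -> dict:
    """JSON body for a transit route from a listing to Old Town Square."""
    return {
        "origin": {"location": {"latLng": {"latitude": lat, "longitude": lng}}},
        "destination": {
            "location": {
                "latLng": {"latitude": OLD_TOWN_SQUARE[0], "longitude": OLD_TOWN_SQUARE[1]}
            }
        },
        "travelMode": "TRANSIT",
        # RFC 3339 timestamp in UTC, e.g. "2024-01-08T08:00:00Z".
        "departureTime": departure.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def parse_duration_seconds(duration: str) -> int:
    """The API reports durations as strings such as "1503s"."""
    return round(float(duration.rstrip("s")))


def fetch_transit_seconds(api_key: str, lat: float, lng: float) -> Optional[int]:
    """Call computeRoutes and return the travel time in seconds, or None if no route is found."""
    body = build_request_body(lat, lng, next_weekday_9am(datetime.now(timezone.utc)))
    request = urllib.request.Request(
        ROUTES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,
            # The Routes API requires a field mask; request only the duration.
            "X-Goog-FieldMask": "routes.duration",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    routes = payload.get("routes", [])
    return parse_duration_seconds(routes[0]["duration"]) if routes else None
```

Calling fetch_transit_seconds with your own API key makes one billable request per listing. When looping over the whole dataset, the calls should be throttled and the results cached.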
You can run these notebooks on your own Jupyter Notebook installation with the following prerequisites:
In addition, these Python libraries are used (install them with pip install <library_name>):
geoplot
- library for plotting geospatial data
xgboost
- implementation of the XGBoost gradient boosting algorithm
shap
- SHapley Additive exPlanations, a library for Shapley-value-based explanations of machine learning models
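Before opening the notebooks, you may want to confirm that the current kernel can import these libraries. The minimal sketch below does this with importlib.util.find_spec, which returns None for modules that cannot be found. The helper name is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three libraries listed above.
    print(missing_modules(["geoplot", "xgboost", "shap"]) or "all libraries found")
```

Any name it prints can then be installed with pip install <library_name>.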
All software and Python packages are already specified in the Dockerfile, which builds on the official Jupyter Notebook SciPy Docker image (https://github.com/jupyter/docker-stacks).
After cloning the repository, run docker-compose up. Once the image is built and the container is running, you can open the notebooks via the localhost URL it provides.