Explore the rental market and find out where to rent a house in Singapore using data and analytics!
A personal analysis I did to find out where I should rent a flat in Singapore and to learn what drives prices in the rental market.
To avoid adding load to the rental website's servers, and to keep myself out of trouble, the scraping script and the original scraped data are excluded from this repo.
Script: data_cleaning_and_engineering.py
Data quality checks, data cleaning, joining of supplementary information and feature engineering are all done within this script. Two datasets are generated from this script:
- engineered_data.csv, used later for dashboard building and visualization.
- model_data.csv, used for machine learning model building; all features have been transformed into numerical or dummy-coded categorical values.
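As a rough illustration of what "dummy-coded categorical values" means for model_data.csv, here is a minimal, dependency-free sketch. It is not the code in data_cleaning_and_engineering.py: the function `one_hot_encode`, the column names and the sample listings are all hypothetical and only show the idea.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with 0/1 dummy columns, one per category."""
    # Collect the distinct categories so every row gets the same set of columns.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical listings; these column names are NOT taken from the real data.
listings = [
    {"area_sqft": 700, "furnishing": "full", "price": 2800},
    {"area_sqft": 950, "furnishing": "partial", "price": 3300},
    {"area_sqft": 500, "furnishing": "none", "price": 2100},
]

model_rows = one_hot_encode(listings, "furnishing")
for row in model_rows:
    print(row)
```

After this step every value in a row is numeric, which is what the models in modelling.py need as input.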
Script: modelling.py
To check whether the original and engineered features really make sense, I built machine learning models to examine:
- Prediction (explaining) Ability: A random forest model is built; its out-of-sample MAE is roughly 3% of the average price, which suggests the model can give a very accurate estimate of a house's rental price from the data.
- Important Features: Feature importance from the random forest model is used to find the features that most affect rental prices in Singapore.
- Quantified Effects: A Lasso regression is built to estimate each feature's dollar effect on the rental price. For example, a 1 sqft increase in housing area is associated with an increase of about 2.26 SGD in monthly rent, holding the other features fixed.
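To make the two numbers above easier to read, here is a small sketch of the arithmetic behind them. It does not reproduce modelling.py: the rents and predictions are made up, and the helper names are hypothetical. Only the 2.26 SGD per sqft figure comes from the results above.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_mae(actual, predicted):
    """MAE expressed as a fraction of the average actual value."""
    average_actual = sum(actual) / len(actual)
    return mean_absolute_error(actual, predicted) / average_actual


# Made-up held-out rents (SGD per month) and model predictions.
actual_rent = [2000, 3000, 4000]
predicted_rent = [2100, 2950, 3910]

mae = mean_absolute_error(actual_rent, predicted_rent)
share = relative_mae(actual_rent, predicted_rent)
print(f"MAE: {mae:.0f} SGD, or {share:.1%} of the average rent")

# Lasso coefficient for housing area reported above (SGD per sqft).
AREA_COEF_SGD_PER_SQFT = 2.26


def rent_difference(extra_sqft, coef=AREA_COEF_SGD_PER_SQFT):
    """Estimated change in monthly rent for extra area, other features fixed."""
    return round(coef * extra_sqft, 2)


print(f"100 extra sqft: about {rent_difference(100)} SGD more per month")
```

The coefficient reading only holds within a linear model and with the other features held fixed, so it is best treated as a rule of thumb rather than a guaranteed price change.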
Detailed results can be found in SG_rental_analytics_2019_AUG.pptx.
A dashboard is also built to explore the market visually and draw out further insights.
Check details in SG_rental_dashboard_2019_AUG.pbix.