A Data Analysis Project
Note: The company and business problem are both fictitious, although the data is real.
The in-depth Python code explanation is available in this Jupyter Notebook.
House Rocket is a real estate company whose business model consists of identifying good deals: properties that can be bought at an attractive price and later sold at a higher price, so that the company turns a profit. In this case, House Rocket will operate in King County, which includes Seattle.
This Data Science project is focused on solving two problems:
- Problem 1: Which properties should House Rocket buy, and at what suggested price?
- Problem 2: Once a property is bought, at what price should it be sold?
The data was collected from Kaggle. This dataset contains house sale prices for King County from May 2014 to May 2015. The feature descriptions are available below:
Feature | Definition |
---|---|
id | Unique ID for each home sold |
date | Date of the home sale |
price | Price of each home sold |
bedrooms | Number of bedrooms |
bathrooms | Number of bathrooms, where .5 accounts for a room with a toilet but no shower |
sqft_living | Square footage of the apartment's interior living space |
sqft_lot | Square footage of the land space |
floors | Number of floors |
waterfront | A dummy variable for whether the apartment was overlooking the waterfront or not |
view | An index from 0 to 4 of how good the view of the property was |
condition | An index from 1 to 5 on the condition of the apartment |
grade | An index from 1 to 13, where 1-3 falls short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high quality level of construction and design. |
sqft_above | The square footage of the interior housing space that is above ground level |
sqft_basement | The square footage of the interior housing space that is below ground level |
yr_built | The year the house was initially built |
yr_renovated | The year of the house’s last renovation |
zipcode | What zipcode area the house is in |
lat | Latitude |
long | Longitude |
sqft_living15 | The square footage of interior housing living space for the nearest 15 neighbors |
sqft_lot15 | The square footage of the land lots of the nearest 15 neighbors |
- Location in real estate is undoubtedly a decisive factor in price evaluation.
- The address information for these properties was gathered by using the two created modules: address.py and defs.py.
- To solve the business problems, the ad_season feature was created. This column shows the season in which the property was listed on the real estate market (spring, summer, fall or winter).
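As an illustration, a feature like ad_season could be derived from the date column as sketched below. This is a minimal sketch, not the project's actual code: the helper name `add_ad_season` and the month-to-season mapping (meteorological seasons, Northern Hemisphere) are assumptions.

```python
import pandas as pd

# Assumption: meteorological seasons for the Northern Hemisphere, keyed by month.
MONTH_TO_SEASON = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def add_ad_season(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an ad_season column derived from date."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["ad_season"] = out["date"].dt.month.map(MONTH_TO_SEASON)
    return out


# Tiny made-up sample, only to show the mapping.
sample = pd.DataFrame({"date": ["2014-05-02", "2014-10-13", "2015-01-20"]})
seasons = add_ad_season(sample)["ad_season"].tolist()
print(seasons)
```

A different season boundary (for example, astronomical seasons) would only require changing the mapping.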
To solve this problem, we first need to analyse the properties by location, because in real estate location is undoubtedly a decisive factor in price evaluation. A useful metric here is the median price per zipcode, since the median is robust to extreme values (outliers) in the data.
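The effect of an outlier on the mean versus the median can be seen in a small sketch. The prices below are made up for illustration; only the column names `zipcode` and `price` come from the dataset, and `median_price_zip` is a hypothetical column name.

```python
import pandas as pd

# Made-up prices: one extreme value in zipcode 98001.
sales = pd.DataFrame({
    "zipcode": [98001, 98001, 98001, 98002, 98002],
    "price": [200_000, 220_000, 2_000_000, 300_000, 320_000],
})

# Mean and median per zipcode: the outlier pulls the mean up, not the median.
by_zip = sales.groupby("zipcode")["price"].agg(["mean", "median"])
print(by_zip)

# Attach each property's regional median as a new column for later comparisons.
sales["median_price_zip"] = sales.groupby("zipcode")["price"].transform("median")
```

In this sample the mean for zipcode 98001 is above 800,000 while the median stays at 220,000, which is why the median is the safer reference price per region.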
The properties that will receive a buy suggestion will be the ones that fulfill the two following rules:
- Their asked price has to be lower than the median price for that region
- The property needs to be in good condition, which means condition >= 3
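The two rules above could be expressed in pandas roughly as follows. This is a sketch under assumptions: "region" is taken to mean zipcode, and the function name `add_buy_suggestion` and the `buy` column are hypothetical.

```python
import pandas as pd


def add_buy_suggestion(df: pd.DataFrame) -> pd.DataFrame:
    """Flag properties priced below their zipcode median with condition >= 3."""
    out = df.copy()
    # Assumption: the region used for the median is the zipcode.
    median_by_zip = out.groupby("zipcode")["price"].transform("median")
    out["buy"] = (out["price"] < median_by_zip) & (out["condition"] >= 3)
    return out


# Made-up listings in a single zipcode; the median price here is 220,000.
listings = pd.DataFrame({
    "zipcode": [98001, 98001, 98001, 98001],
    "price": [180_000, 190_000, 250_000, 400_000],
    "condition": [4, 2, 5, 3],
})
flags = add_buy_suggestion(listings)["buy"].tolist()
print(flags)
```

Only the first listing passes both rules: the second is cheap but in poor condition, and the last two are priced above the regional median.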
As for the suggested price, the density by region will be taken into account: for properties in regions with a higher number of real estate ads it's viable to offer lower prices, and vice versa. Therefore, the rule for suggested buy prices is:
- From 0 to 204 properties in the region => Offer the asked price
- From 205 to 282 properties in the region => Offer 3% less than the asked price
- From 283 to 408 properties in the region => Offer 4% less than the asked price
- From 409 properties upwards => Offer 5% less than the asked price
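The tiers above translate directly into a small function. The sketch below uses the thresholds and discounts listed in the rule; the function names and the rounding to cents are assumptions.

```python
def buy_discount(n_properties: int) -> float:
    """Discount on the asked price, based on the number of properties in the region."""
    if n_properties <= 204:
        return 0.0
    if n_properties <= 282:
        return 0.03
    if n_properties <= 408:
        return 0.04
    return 0.05


def suggested_buy_price(asked_price: float, n_properties: int) -> float:
    """Asked price reduced by the density-based discount, rounded to cents."""
    return round(asked_price * (1 - buy_discount(n_properties)), 2)


# Example: the same asked price in regions of different density.
for n in (100, 205, 300, 500):
    print(n, suggested_buy_price(300_000, n))
```

The number of properties per region could be obtained, for instance, with a group-by count over zipcode, assuming zipcode is the region.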
To suggest a sell price, again the density by region will be considered, since for regions where the number of real estate ads is lower it's possible to sell the property for a higher price, and vice-versa. Hence, a reasonable rule for suggested sell prices is:
- From 0 to 204 properties in the region => Sell for 16% more than the suggested buy price
- From 205 to 282 properties in the region => Sell for 14% more than the suggested buy price
- From 283 to 408 properties in the region => Sell for 12% more than the suggested buy price
- From 409 properties upwards => Sell for 10% more than the suggested buy price
It's important to point out that selling a property for 10-16% more than the price paid is only a suggestion, chosen to keep selling prices realistic. Selling a property in the short run for, say, 30-40% more can happen, but it seems unlikely (or the sale would take too long).
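The sell rule and the resulting profit can be sketched the same way. The markups come from the tiers above; the function names and the rounding to cents are assumptions, and the profit here ignores taxes, fees and renovation costs.

```python
def sell_markup(n_properties: int) -> float:
    """Markup over the suggested buy price, based on region density."""
    if n_properties <= 204:
        return 0.16
    if n_properties <= 282:
        return 0.14
    if n_properties <= 408:
        return 0.12
    return 0.10


def suggested_sell_price(buy_price: float, n_properties: int) -> float:
    """Suggested buy price increased by the density-based markup."""
    return round(buy_price * (1 + sell_markup(n_properties)), 2)


def profit(buy_price: float, n_properties: int) -> float:
    """Gross profit: suggested sell price minus suggested buy price."""
    return round(suggested_sell_price(buy_price, n_properties) - buy_price, 2)


# Example: a property bought for 100,000 in a sparse and in a dense region.
print(suggested_sell_price(100_000, 100), profit(100_000, 100))
print(suggested_sell_price(100_000, 500), profit(100_000, 500))
```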
- Solution to Problem 1:
  - Buy Suggestion Table: Contains buy suggestions and suggested buy prices
- Solution to Problem 2:
  - Sell Suggestion Table: Contains suggested sell prices and profit
- Financial Results:
  - Profit descriptive analysis
  - Average and median profit grouped by ad_season
  - Average and median profit grouped by zipcode
  - Average and median profit grouped by ad_season and zipcode
- House Rocket Cloud App: App deployed using Streamlit Cloud containing all tables (Buy Suggestion Table, Sell Suggestion Table and Financial Results Tables) with filters and a Buy Suggestion Map, as well as data insights.
Because the app is deployed on a free cloud service (Streamlit Cloud), the page may take a few minutes to load the first time.
- Python 3.9.12, Pandas, Matplotlib, Plotly and Geopandas.
- Jupyter Notebook and VSCode.
- Streamlit and Streamlit Cloud.
- Git and GitHub.
- Measures of Central Tendency and Dispersion.
- Exploratory Data Analysis (EDA).
Usage: House Rocket could focus on buying and selling waterfront view properties, since the profit will be higher in absolute values.
Usage: House Rocket could look to buy properties without a basement that have the potential to gain one, since these properties can then be sold for a much higher price.
Usage: House Rocket could focus on buying and selling properties with good views, since the profit will be higher in absolute values.
Usage: House Rocket could focus on buying and selling properties around Lake Washington, since the profit will be higher in absolute values.
Usage: House Rocket would have higher profits buying and selling properties built from the mid-1980s onwards, as well as those built from 1900 to 1940.
Three interesting metrics for evaluating the financial performance of this solution are the profit mean and median (grouped by ad_season, by zipcode, and by ad_season together with zipcode), as well as the total profit. This in-depth information can be found here. The profit for each property can be checked in the House Rocket Cloud App, where filters can also be applied for better visualization.
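These grouped metrics could be computed with pandas group-by aggregations, as in the sketch below. The data is made up for illustration and does not reproduce the project's results; the `profit` column name is an assumption.

```python
import pandas as pd

# Made-up deals, only to show the aggregation pattern.
deals = pd.DataFrame({
    "zipcode": [98001, 98001, 98002, 98002],
    "ad_season": ["spring", "summer", "spring", "spring"],
    "profit": [10_000, 20_000, 30_000, 50_000],
})

# Overall descriptive metrics.
overall = deals["profit"].agg(["sum", "mean", "median", "min", "max"])

# Mean and median profit for each grouping.
by_season = deals.groupby("ad_season")["profit"].agg(["mean", "median"])
by_zip = deals.groupby("zipcode")["profit"].agg(["mean", "median"])
by_both = deals.groupby(["ad_season", "zipcode"])["profit"].agg(["mean", "median"])

print(overall)
print(by_both)
```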
If the solution strategy used in this project were applied by House Rocket, the total profit obtained would be US$ 473,094,328.48, with an average profit of US$ 45,337.26 per property. The main profit metrics are displayed below:
Metric | US$ |
---|---|
Total Profit | 473,094,328.48 |
Profit Mean | 45,337.26 |
Profit Median | 39,995.00 |
Min Profit | 8,217.50 |
Max Profit | 350,036.80 |
In this project the two main objectives were accomplished:
- A feasible solution was found for both business problems, leading to profitable results.
- Five interesting and useful insights were found through Exploratory Data Analysis (EDA).
We also delivered tables with in-depth financial results, as well as buy and sell suggestion tables. All this information can be filtered in the House Rocket Streamlit App, which also presents the five business insights and an interactive Buy Suggestion Map.
Going forward, this solution could be improved by using regression models to determine whether a property has a good buying price, and at which price it could be bought and sold. Another interesting study would be market research to collect data about clients. A clustering algorithm could then be applied to identify which property features each group of customers prefers.