A short, easy-to-follow project that trains a regression model to predict house prices using the classic Boston housing dataset. The main work is in the Colab notebook:
- Project_4_House_Price_Prediction.ipynb

Table of contents
- About
- Notebook(s)
- Requirements
- Quick start (Colab and local)
- Reproducible run (commands/snippets)
- Model training
- Notes about the dataset
- Evaluation & results
About
This project demonstrates the end-to-end steps of a supervised regression task:
- Load a housing dataset
- Explore and visualize the data
- Split the data into training and test sets
- Train an XGBoost regression model
- Evaluate model performance
Notebook(s)
- Project_4_House_Price_Prediction.ipynb — the primary notebook with all the code and visualizations.
Requirements
Minimal Python packages:
- python >= 3.8
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
- xgboost
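Before installing, a short script can report which of these packages are already present. This is a sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `check_packages` is hypothetical. Note that the PyPI distribution is named `scikit-learn` even though it is imported as `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names from the list above (jupyter is optional in Colab).
REQUIRED = ("numpy", "pandas", "matplotlib", "seaborn", "scikit-learn", "xgboost")


def check_packages(names=REQUIRED) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    import sys

    print("python", sys.version.split()[0], "(needs >= 3.8)")
    for name, ver in check_packages().items():
        print(f"{name:15} {ver or 'MISSING'}")
```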
You can install the essentials with pip:

```bash
pip install numpy pandas matplotlib seaborn scikit-learn xgboost jupyter
```

Quick start
Run in Google Colab (recommended if you don't want to configure locally)
- Open the notebook in Colab, either through the "Open in Colab" badge in the notebook or directly at: https://colab.research.google.com/github/meeks627/House_price_Prediction/blob/main/Project_4_House_Price_Prediction.ipynb
- Run the notebook cells in order.
Run locally
- Clone the repository:
```bash
git clone https://github.com/meeks627/House_price_Prediction.git
cd House_price_Prediction
```

- Install dependencies (see Requirements).
- Start Jupyter and open the notebook:

```bash
jupyter notebook Project_4_House_Price_Prediction.ipynb
```

- Run the cells top-to-bottom.
Reproducible run / key snippets
- The notebook uses sklearn.datasets.load_boston() to load the dataset:
```python
import sklearn.datasets

house_price_dataset = sklearn.datasets.load_boston()
```

Note: sklearn.datasets.load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. If you encounter an error, either:
- Install a scikit-learn version that still includes load_boston (e.g., pip install scikit-learn==1.1.3), OR
- Use fetch_openml to retrieve the Boston dataset:
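```python
from sklearn.datasets import fetch_openml

# Downloads the dataset from OpenML, so this needs network access.
boston = fetch_openml(name="boston", version=1, as_frame=True)
X = boston.data  # some columns (e.g. CHAS, RAD) may arrive as pandas 'category' dtype
y = boston.target
```

Model training (as implemented in the notebook)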
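- The notebook trains an XGBoost regressor:

```python
from xgboost import XGBRegressor

# X_train, X_test, Y_train, Y_test come from the notebook's train/test split
# (sklearn.model_selection.train_test_split).
model = XGBRegressor()
model.fit(X_train, Y_train)
preds = model.predict(X_test)
```

- Evaluation metrics: MAE, MSE, RMSE, and R², computed with scikit-learn's sklearn.metrics module.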
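The four metrics can be computed together with a small helper. This is a sketch, assuming it is called with the notebook's `Y_test` and `preds`; the name `regression_report` is hypothetical. RMSE is taken as the square root of MSE by hand, which works across scikit-learn versions (the `squared=False` argument was removed in newer releases, and `root_mean_squared_error` only exists from 1.4).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return MAE, MSE, RMSE and R^2 for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        # Square root of MSE, computed directly for version independence.
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
    }
```

For example, `regression_report(Y_test, preds)`. Because the target is in $1000s, MAE and RMSE are in $1000s too: an MAE of 2.5 would mean predictions are off by about $2,500 on average.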
Notes about the dataset
- The project uses the Boston housing dataset: 13 features, with the median home value (MEDV, in $1000s) as the target.
- The target is capped at 50.0 (i.e., $50,000) for some entries, so the true price of those homes may be higher. Check the notebook for how these rows are handled and interpreted.
- The Boston dataset was deprecated and then removed from scikit-learn because of ethical concerns: its B feature was engineered from the proportion of Black residents in each town. For production work, consider an alternative such as the California housing dataset or your own data.
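The loading and data-quality notes can be combined into a small preparation step. This is a sketch under assumptions, not the notebook's code: the function names are hypothetical, trying `load_boston` first and falling back to `fetch_openml` is one reasonable order, and the OpenML path needs network access. It converts any category/object columns to numbers, since XGBRegressor typically rejects pandas category columns unless `enable_categorical=True`, and flags targets sitting at the 50.0 cap.

```python
import pandas as pd

CAP = 50.0  # MEDV ceiling in the Boston data (in $1000s, i.e. $50,000)


def to_numeric_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with category/object columns converted to numbers."""
    out = df.copy()
    for col in out.select_dtypes(include=["category", "object"]).columns:
        # Go through str so category labels such as "0"/"1" parse cleanly.
        out[col] = pd.to_numeric(out[col].astype(str))
    return out


def load_boston_frame():
    """Hypothetical helper: load Boston as (X, y) on old or new scikit-learn."""
    try:
        # Available (with a deprecation warning) in scikit-learn < 1.2.
        from sklearn.datasets import load_boston

        bunch = load_boston()
        X = pd.DataFrame(bunch.data, columns=bunch.feature_names)
        y = pd.Series(bunch.target, name="MEDV")
    except ImportError:
        # scikit-learn >= 1.2: fetch the OpenML copy instead (needs network).
        from sklearn.datasets import fetch_openml

        boston = fetch_openml(name="boston", version=1, as_frame=True)
        X, y = boston.data, boston.target
    return to_numeric_frame(X), y.astype(float)


def capped_mask(y: pd.Series, cap: float = CAP) -> pd.Series:
    """True where the target sits at (or above) the cap, i.e. is censored."""
    return y >= cap


if __name__ == "__main__":
    X, y = load_boston_frame()
    mask = capped_mask(y)
    print(X.shape, f"{int(mask.sum())} capped targets")
    # One option among several: drop censored rows before training.
    X_uncapped, y_uncapped = X[~mask], y[~mask]
```

Dropping capped rows is only one choice; keeping them and treating predictions near 50 as a lower bound is another.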
Evaluation & results
- The notebook includes exploratory visualizations (including a correlation heatmap), the train/test split, model training, and evaluation.
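A correlation heatmap like the one in the notebook can be drawn with seaborn. This is a minimal sketch: the function name, figure size, colour map, and annotation format are illustrative assumptions, and the input is expected to be an all-numeric DataFrame (for example, the features with the target added as a column).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def correlation_heatmap(df: pd.DataFrame):
    """Draw a heatmap of pairwise Pearson correlations; return (figure, matrix)."""
    corr = df.corr()  # Pearson by default; df must be all-numeric
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr, annot=True, fmt=".1f", cmap="Blues", square=True, ax=ax)
    ax.set_title("Correlation between features and target")
    fig.tight_layout()
    return fig, corr
```

In the notebook, something like `fig, corr = correlation_heatmap(X.assign(MEDV=y))` followed by `corr["MEDV"].sort_values()` would list the features from most negatively to most positively correlated with price.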