sophie Multivariate analysis #142

Open
wants to merge 63 commits into
base: main
d8a9e64
Update student.ipynb
clydeochieng Apr 27, 2024
edf85ba
Update student.ipynb
clydeochieng Apr 27, 2024
9a91482
Update README.md
clydeochieng Apr 27, 2024
44945a8
Update student.ipynb
clydeochieng Apr 27, 2024
cc69bf6
Update README.md
clydeochieng Apr 27, 2024
69d9b53
Update student.ipynb
clydeochieng Apr 27, 2024
47f9a23
Update student.ipynb
clydeochieng Apr 28, 2024
f790467
Update student.ipynb
clydeochieng Apr 28, 2024
d0245ea
Update student.ipynb
clydeochieng Apr 28, 2024
a6a8120
Update student.ipynb
clydeochieng Apr 28, 2024
5d27220
Update student.ipynb
clydeochieng Apr 29, 2024
5617545
Update student.ipynb
clydeochieng Apr 30, 2024
ce7d81d
Update student.ipynb
clydeochieng Apr 30, 2024
26261b7
Update student.ipynb
clydeochieng Apr 30, 2024
6c8c8b7
Update student.ipynb
clydeochieng Apr 30, 2024
5927602
Merge branch 'main' into Clyde
clydeochieng Apr 30, 2024
c70e37b
Merge pull request #3 from clydeochieng/Clyde
clydeochieng Apr 30, 2024
faab87e
Update student.ipynb
clydeochieng Apr 30, 2024
42bea00
Update student.ipynb
clydeochieng Apr 30, 2024
872b835
Update student.ipynb
clydeochieng Apr 30, 2024
d068fa1
Update student.ipynb
clydeochieng Apr 30, 2024
18077fa
Update student.ipynb
clydeochieng Apr 30, 2024
457ad7c
Update student.ipynb
clydeochieng Apr 30, 2024
e409f03
Update student.ipynb
clydeochieng Apr 30, 2024
9824144
Update student.ipynb
clydeochieng Apr 30, 2024
b05f711
Update student.ipynb
clydeochieng Apr 30, 2024
990054e
Update student.ipynb
clydeochieng May 1, 2024
57e844e
Update student.ipynb
clydeochieng May 1, 2024
455095c
Merge branch 'main' into Clyde
clydeochieng May 1, 2024
406ad55
Merge pull request #5 from clydeochieng/Clyde
clydeochieng May 1, 2024
aa1c14e
Update student.ipynb
clydeochieng May 1, 2024
cbb5de1
Update student.ipynb
clydeochieng May 1, 2024
9035926
Update student.ipynb
clydeochieng May 1, 2024
da626e5
Update student.ipynb
clydeochieng May 1, 2024
48e9997
Update student.ipynb
clydeochieng May 1, 2024
571eda3
Update student.ipynb
clydeochieng May 1, 2024
0edfa11
Update student.ipynb
clydeochieng May 1, 2024
9d60525
Update student.ipynb
clydeochieng May 1, 2024
14adaad
Update student.ipynb
clydeochieng May 1, 2024
4275e4e
Update student.ipynb
clydeochieng May 1, 2024
11616b1
Update student.ipynb
clydeochieng May 1, 2024
7d271f1
Update student.ipynb
clydeochieng May 1, 2024
100b237
Update student.ipynb
clydeochieng May 1, 2024
2353438
Update student.ipynb
clydeochieng May 1, 2024
1f4df67
Merge branch 'main' into Clyde
clydeochieng May 1, 2024
2886960
Merge pull request #6 from clydeochieng/Clyde
clydeochieng May 1, 2024
045ce61
Update student.ipynb
clydeochieng May 1, 2024
6565f7f
Update student.ipynb
clydeochieng May 1, 2024
89189c4
Update student.ipynb
clydeochieng May 1, 2024
637a010
Merge branch 'main' into Clyde
clydeochieng May 1, 2024
0225a1e
Merge pull request #7 from clydeochieng/Clyde
clydeochieng May 1, 2024
d3f256b
Update student.ipynb
clydeochieng May 1, 2024
192c53b
added libraries
clydeochieng May 1, 2024
febd3bf
added eda
Keter22 May 1, 2024
dde3c61
Merge pull request #8 from clydeochieng/Kiprotich
clydeochieng May 1, 2024
4a8e311
create functions for regression
May 1, 2024
3096e37
Identification of categorical variables
sophline May 1, 2024
2f65aa7
create model results
May 1, 2024
10d879d
correlation between target and numerical values
sophline May 1, 2024
514e95c
simple linear regression done
May 1, 2024
3b92b07
Merge pull request #9 from clydeochieng/Hilary
clydeochieng May 1, 2024
a9fd260
Create multilinear model
sophline May 1, 2024
6777827
model evaluation
sophline May 1, 2024
Update student.ipynb
clydeochieng committed May 1, 2024
commit aa1c14e5ff58f5b2840b8e60e2cf447ab8214c60
138 changes: 1 addition & 137 deletions student.ipynb
@@ -6,148 +6,13 @@
"source": [
"## Final Project Submission\n",
"\n",
Clyde

"* Student name: Solphine Joseph, Grace Rotich, Mathew Kiprotich, Hilary Simiyu, Clyde Ochieng, Derrick Kiptoo \n",
main
"* Student name: Solphine Joseph, Grace Rotich, Mathew Kiprotich, Hilary Simiyu, Clyde Ochieng, Derrick Kiptoo. \n",
"* Student pace: full time\n",
"* Scheduled project review date/time: \n",
"* Instructor name: Nikita \n",
"* Blog post URL:\n"
]
},
Clyde
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# King County Housing Analysis with Multiple Linear Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"A real estate agency in Kingsway wants to identify the factors that affect house prices so that it can make improvements where necessary. The agency wants its decisions to rest on analysis rather than sentiment. This project uses multiple linear regression to understand how various features affect price, so that the agency can improve its services."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Business Problem\n",
"\n",
"In the face of market fluctuations and heightened competition within the real estate sector, our agency is grappling with pricing volatility, which poses significant challenges for our agents in devising effective business strategies. We seek strategic guidance to optimize our purchasing and selling endeavors, prioritizing informed decision-making to identify key areas of focus that promise maximum returns on investment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Objectives\n",
"* To determine the key factors influencing house prices.\n",
"* To develop multilinear regression models to predict house prices based on relevant features.\n",
"* To use insights from the regression analysis to optimize pricing strategies for both purchasing and selling properties.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hypothesis\n",
"* Null Hypothesis - There is no relationship between our independent variables and our dependent variable \n",
"\n",
"* Alternative Hypothesis - There is a relationship between our independent variables and our dependent variable"
]
},
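A minimal sketch of how the hypotheses in the cell above could be checked with an overall F-test, which asks whether all slope coefficients are jointly zero. The helper name `overall_f_test` and the synthetic data are assumptions for illustration and are not part of the notebook. A fitted statsmodels OLS result reports the same quantities as `fvalue` and `f_pvalue`.

```python
import numpy as np
import scipy.stats as stats


def overall_f_test(X, y):
    """Overall F-test of H0: every slope coefficient equals zero."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape

    # Add an intercept column, then fit by ordinary least squares.
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)

    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation

    # Explained variation per predictor over residual variation per
    # residual degree of freedom.
    f_stat = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
    p_value = float(stats.f.sf(f_stat, k, n - k - 1))
    return f_stat, p_value


# Synthetic data (NOT the King County dataset): the target depends on
# the first feature only, so H0 should be rejected.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(size=200)

f_stat, p_value = overall_f_test(X_demo, y_demo)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```

A p-value below the chosen significance level (0.05 is a common choice) would lead to rejecting the null hypothesis that the independent variables have no relationship with the dependent variable.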
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Understanding:\n",
"\n",
"In this project, we used the King County House Sales dataset, sourced from Kaggle, as the foundation for our analysis. The dataset contains information on house sales within King County, Washington, USA. Its features include the number of bedrooms and bathrooms, square footage, and geographical and pricing details of the properties sold. The dataset is frequently used in data science and machine learning work, particularly for regression models that predict house prices from the provided features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### King County Housing Data Columns \n",
"\n",
"The column names contained in column_names.md are:\n",
"* `id`: A unique identifier for each house sale.\n",
"* `date`: The date when the house was sold.\n",
"* `price`: The sale price of the house, serving as the target variable for predictive modeling.\n",
"* `bedrooms`, `bathrooms`, `sqft_living`, `sqft_lot`: Numerical features representing the number of bedrooms and bathrooms, as well as the living area and lot area of the house, respectively.\n",
"* `floors`: The number of floors in the house.\n",
"* `waterfront`, `view`, `condition`, `grade`: Categorical features describing aspects such as waterfront availability, property view, condition, and overall grade assigned to the housing unit.\n",
"* `yr_built`, `yr_renovated`: Year of construction and renovation of the house.\n",
"* `zipcode`, `lat`, `long`: Geographical features including ZIP code, latitude, and longitude coordinates.\n",
"* `sqft_above`, `sqft_basement`, `sqft_living15`, `sqft_lot15`: Additional numerical features providing details about the house's above-ground and basement square footage, as well as living area and lot area of the nearest 15 neighboring houses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Loading\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import pandas as pd\n",
"import scipy.stats as stats\n",
"import seaborn as sns\n",
"import statsmodels.api as sm\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading Data"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Load a CSV file, print a short summary, and return it as a DataFrame\n",
"def load_data(file_path):\n",
"    house_data = pd.read_csv(file_path)\n",
"\n",
"    # Shape: number of rows (houses) and columns (features)\n",
"    shape = house_data.shape\n",
"    print(f\"The dataset contains {shape[0]} houses with {shape[1]} features\")\n",
"    print()\n",
"\n",
"    # Data type of each column\n",
"    data_types = house_data.dtypes\n",
"    print(\"Columns and their data types:\")\n",
"    for column, dtype in data_types.items():\n",
"        print(f\"{column}: {dtype}\")\n",
"    print()\n",
"\n",
"    return house_data\n"
]
},
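A minimal usage sketch for `load_data`, assuming a compact variant of the function that prints the same summary. The three-row CSV below is made-up sample data held in memory, since the real file path is not shown in this diff; `pd.read_csv` accepts a file-like object as well as a path.

```python
import io

import pandas as pd


def load_data(file_path):
    """Read a CSV into a DataFrame and print its shape and column dtypes."""
    house_data = pd.read_csv(file_path)
    rows, cols = house_data.shape
    print(f"The dataset contains {rows} houses with {cols} features")
    print()
    print("Columns and their data types:")
    for column, dtype in house_data.dtypes.items():
        print(f"{column}: {dtype}")
    print()
    return house_data


# Hypothetical sample rows, used only to exercise the function.
sample_csv = io.StringIO(
    "id,price,bedrooms\n"
    "1,221900.0,3\n"
    "2,538000.0,3\n"
    "3,180000.0,2\n"
)
house_data = load_data(sample_csv)
```

Returning the DataFrame, rather than only printing the summary, lets later cells reuse `house_data` without reading the file again.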
{
"cell_type": "code",
"execution_count": 18,

{
"cell_type": "markdown",
"metadata": {},
@@ -256,7 +121,6 @@ Clyde
{
"cell_type": "code",
"execution_count": 3,
main
"metadata": {},
"outputs": [
{