# home-sales-data-evaluation

Contains an analysis of key home sales metrics using SparkSQL and Python to manage large amounts of data. The analysis consists of the following steps:

1. A Spark DataFrame is created from the dataset.

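A minimal sketch of this step, assuming the data lives in a CSV file; the file name `home_sales_revised.csv` is a placeholder for wherever the dataset actually resides:

```python
from pyspark.sql import SparkSession

# Start a Spark session for the analysis.
spark = SparkSession.builder.appName("HomeSales").getOrCreate()

# Placeholder file name; the dataset could also be read from a remote URL.
df = spark.read.csv("home_sales_revised.csv", header=True, inferSchema=True)
df.show(5)
```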
2. A temporary table of the original DataFrame is created.

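Registering the DataFrame from the previous step as a temporary view named `home_sales`, the name the later steps query against:

```python
# Expose the DataFrame to SparkSQL under the name "home_sales".
df.createOrReplaceTempView("home_sales")
```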
3. A query is written that returns the average price, rounded to two decimal places, of four-bedroom houses sold in each year.

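A sketch of this query, assuming the sale date, price, and bedroom count are stored in columns named `date`, `price`, and `bedrooms` (assumed names, not taken from the dataset):

```python
spark.sql("""
    SELECT YEAR(date) AS year_sold,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 4
    GROUP BY YEAR(date)
    ORDER BY year_sold
""").show()
```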
4. A query is written that returns the average price, rounded to two decimal places, of homes with three bedrooms and three bathrooms, for each year the home was built.

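A sketch under the same assumptions, with the year built assumed to live in a `date_built` column and the bathroom count in `bathrooms`:

```python
spark.sql("""
    SELECT date_built,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 3 AND bathrooms = 3
    GROUP BY date_built
    ORDER BY date_built
""").show()
```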
5. A query is written that returns the average price, rounded to two decimal places, of homes with three bedrooms, three bathrooms, two floors, and a living area of at least 2,000 square feet, for each year the home was built.

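The same pattern with the additional filters; the `floors` and `sqft_living` column names are likewise assumptions:

```python
spark.sql("""
    SELECT date_built,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 3
      AND bathrooms = 3
      AND floors = 2
      AND sqft_living >= 2000
    GROUP BY date_built
    ORDER BY date_built
""").show()
```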
6. A query is written that returns the average price of a home per "view" rating, rounded to two decimal places, including only ratings whose average home price is greater than or equal to $350,000. The run time for this query is recorded.

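A sketch of the query and its timing, assuming the rating is stored in a `view` column; the query text is kept in a variable so the later steps can reuse it:

```python
import time

# Reused in steps 8 and 11.
view_query = """
    SELECT view,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    GROUP BY view
    HAVING AVG(price) >= 350000
    ORDER BY view
"""

start_time = time.time()
spark.sql(view_query).show()
print(f"--- {time.time() - start_time} seconds ---")
```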
7. A cache of the temporary "home_sales" table is created and validated.

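Caching the temporary table and confirming that the cache took effect, for example:

```python
# Cache the temporary table in memory.
spark.sql("CACHE TABLE home_sales")

# Confirm the table is cached (should print True).
print(spark.catalog.isCached("home_sales"))
```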
8. The query from step 6 is run on the cached temporary table, and the run time is computed.

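Re-running the same query against the now-cached table and timing it (reusing `view_query` and the `time` import from the step 6 sketch), so the run time can be compared with the uncached run:

```python
# Same aggregation as step 6, now served from the cached table.
start_time = time.time()
spark.sql(view_query).show()
print(f"--- {time.time() - start_time} seconds ---")
```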
9. The home sales dataset is partitioned by the "date_built" field, written in Parquet format, and the Parquet data is read back into a DataFrame.

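A sketch of the partitioned write and the read-back; the output path `home_sales_partitioned` is a placeholder:

```python
# Write the DataFrame as Parquet, partitioned on the year built.
df.write.partitionBy("date_built").mode("overwrite").parquet("home_sales_partitioned")

# Read the partitioned Parquet data back into a DataFrame.
parquet_df = spark.read.parquet("home_sales_partitioned")
```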
10. A temporary table of the Parquet data is created.

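Registering the Parquet DataFrame under its own view name, assumed here to be `home_sales_parquet`:

```python
parquet_df.createOrReplaceTempView("home_sales_parquet")
```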
11. The query from step 6 is run on the Parquet temporary table, and the run time is computed.

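Timing the step 6 aggregation against the Parquet-backed view (again reusing `view_query` from the step 6 sketch); only the table name changes:

```python
# Same aggregation, but reading from the Parquet-backed view.
start_time = time.time()
spark.sql(view_query.replace("FROM home_sales", "FROM home_sales_parquet")).show()
print(f"--- {time.time() - start_time} seconds ---")
```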
12. The "home_sales" temporary table is uncached, and it is verified that the table is no longer cached.

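Uncaching the table and verifying that it is no longer cached:

```python
# Release the cached table.
spark.sql("UNCACHE TABLE home_sales")

# Should print False once the table is uncached.
print(spark.catalog.isCached("home_sales"))
```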