# home-sales-data-evaluation

Contains an analysis of key home sales metrics using SparkSQL and Python to manage large amounts of data. The analysis consists of the following steps:

1. A Spark DataFrame is created from the dataset.

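A minimal sketch of this step, assuming the data lives in a CSV file; the file name `home_sales_revised.csv` is a placeholder for wherever the dataset actually resides:

```python
from pyspark.sql import SparkSession

# Start a Spark session for the analysis.
spark = SparkSession.builder.appName("HomeSales").getOrCreate()

# Placeholder file name; the dataset could also be read from a remote URL.
df = spark.read.csv("home_sales_revised.csv", header=True, inferSchema=True)
df.show(5)
```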
2. A temporary table of the original DataFrame is created.

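Registering the DataFrame from the previous step as a temporary view named `home_sales`, the name the later steps query against:

```python
# Expose the DataFrame to SparkSQL under the name "home_sales".
df.createOrReplaceTempView("home_sales")
```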
3. A query is written that returns the average price, rounded to two decimal places, of four-bedroom houses sold in each year.

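A sketch of this query, assuming the sale date, price, and bedroom count are stored in columns named `date`, `price`, and `bedrooms` (assumed names, not taken from the dataset):

```python
spark.sql("""
    SELECT YEAR(date) AS year_sold,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 4
    GROUP BY YEAR(date)
    ORDER BY year_sold
""").show()
```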
4. A query is written that returns the average price, rounded to two decimal places, of homes with three bedrooms and three bathrooms, for each year the home was built.

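A sketch under the same assumptions, with the year built assumed to live in a `date_built` column and the bathroom count in `bathrooms`:

```python
spark.sql("""
    SELECT date_built,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 3 AND bathrooms = 3
    GROUP BY date_built
    ORDER BY date_built
""").show()
```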
5. A query is written that returns the average price, rounded to two decimal places, of homes with three bedrooms, three bathrooms, two floors, and a living area of at least 2,000 square feet, for each year the home was built.

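The same pattern with the additional filters; the `floors` and `sqft_living` column names are likewise assumptions:

```python
spark.sql("""
    SELECT date_built,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    WHERE bedrooms = 3
      AND bathrooms = 3
      AND floors = 2
      AND sqft_living >= 2000
    GROUP BY date_built
    ORDER BY date_built
""").show()
```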
6. A query is written that returns the average price of a home per "view" rating, rounded to two decimal places, including only ratings whose average home price is greater than or equal to $350,000. The run time for this query is recorded.

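A sketch of the query and its timing, assuming the rating is stored in a `view` column; the query text is kept in a variable so the later steps can reuse it:

```python
import time

# Reused in steps 8 and 11.
view_query = """
    SELECT view,
           ROUND(AVG(price), 2) AS avg_price
    FROM home_sales
    GROUP BY view
    HAVING AVG(price) >= 350000
    ORDER BY view
"""

start_time = time.time()
spark.sql(view_query).show()
print(f"--- {time.time() - start_time} seconds ---")
```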
7. A cache of the temporary "home_sales" table is created and validated.

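Caching the temporary table and confirming that the cache took effect, for example:

```python
# Cache the temporary table in memory.
spark.sql("CACHE TABLE home_sales")

# Confirm the table is cached (should print True).
print(spark.catalog.isCached("home_sales"))
```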
8. The query from step 6 is run on the cached temporary table, and the run time is computed.

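Re-running the same query against the now-cached table and timing it (reusing `view_query` and the `time` import from the step 6 sketch), so the run time can be compared with the uncached run:

```python
# Same aggregation as step 6, now served from the cached table.
start_time = time.time()
spark.sql(view_query).show()
print(f"--- {time.time() - start_time} seconds ---")
```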
9. The home sales dataset is partitioned by the "date_built" field, written in Parquet format, and the Parquet data is read back into a DataFrame.

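A sketch of the partitioned write and the read-back; the output path `home_sales_partitioned` is a placeholder:

```python
# Write the DataFrame as Parquet, partitioned on the year built.
df.write.partitionBy("date_built").mode("overwrite").parquet("home_sales_partitioned")

# Read the partitioned Parquet data back into a DataFrame.
parquet_df = spark.read.parquet("home_sales_partitioned")
```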
10. A temporary table of the Parquet data is created.

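Registering the Parquet DataFrame under its own view name, assumed here to be `home_sales_parquet`:

```python
parquet_df.createOrReplaceTempView("home_sales_parquet")
```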
11. The query from step 6 is run on the Parquet temporary table, and the run time is computed.

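Timing the step 6 aggregation against the Parquet-backed view (again reusing `view_query` from the step 6 sketch); only the table name changes:

```python
# Same aggregation, but reading from the Parquet-backed view.
start_time = time.time()
spark.sql(view_query.replace("FROM home_sales", "FROM home_sales_parquet")).show()
print(f"--- {time.time() - start_time} seconds ---")
```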
12. The "home_sales" temporary table is uncached, and it is verified that the table is no longer cached.

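Uncaching the table and verifying that it is no longer cached:

```python
# Release the cached table.
spark.sql("UNCACHE TABLE home_sales")

# Should print False once the table is uncached.
print(spark.catalog.isCached("home_sales"))
```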