Homework

Form
This homework uses everything covered this week. Please submit your answers via the form above.
The deadline for the homework is 7 Feb 2022, 17:00 CET.

Question 1:

What is the record count for the FHV vehicle data for the year 2019?
You can load the data from Cloud Storage and run a count(*).
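
One way to approach this is to expose the files in Cloud Storage as an external table and count its rows. The sketch below only builds the SQL text; the project, dataset, table, and bucket names are hypothetical placeholders, and the file format is an assumption that depends on how you uploaded the data.

```python
# Hypothetical names: replace with your own project, dataset, and bucket.
EXTERNAL_TABLE = "my-project.trips_data_all.fhv_external"  # assumption
GCS_URI = "gs://my-bucket/raw/fhv_tripdata_2019-*.csv"     # assumption


def external_table_sql(table: str, uri: str, file_format: str = "CSV") -> str:
    """Build the DDL for an external table backed by files in Cloud Storage."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = ['{uri}']\n"
        f");"
    )


def count_sql(table: str) -> str:
    """Build a query that counts every row in the given table."""
    return f"SELECT COUNT(*) AS trip_count FROM `{table}`;"


# Print the statements so they can be pasted into the BigQuery console.
print(external_table_sql(EXTERNAL_TABLE, GCS_URI))
print(count_sql(EXTERNAL_TABLE))
```

If your files are stored as Parquet rather than CSV, pass `file_format="PARQUET"` and adjust the URI accordingly.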

Question 2:

How many distinct dispatching_base_num values are there in the FHV data for 2019?
You can run a distinct query on the table from question 1.
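
A distinct count can be written with `COUNT(DISTINCT ...)`. The following is a minimal sketch that builds the query text; the table name is a hypothetical placeholder for whatever table you created for question 1.

```python
# Hypothetical table name: use the table you created for question 1.
FHV_TABLE = "my-project.trips_data_all.fhv_external"  # assumption


def distinct_count_sql(table: str, column: str) -> str:
    """Build a query that counts the distinct values of one column."""
    return f"SELECT COUNT(DISTINCT {column}) AS distinct_values FROM `{table}`;"


# Print the statement so it can be pasted into the BigQuery console.
print(distinct_count_sql(FHV_TABLE, "dispatching_base_num"))
```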

Question 3:

What is the best strategy to optimise a table if queries always filter by dropoff_datetime and order by dispatching_base_num?
Review the partitioning and clustering video.
Think about which strategy will most improve query performance and reduce cost.

Question 4:

What are the count, the estimated data processed, and the actual data processed for a query that counts trips between 2019/01/01 and 2019/03/31 for dispatching_base_num B00987, B02060, B02279?
Create a table with optimized clustering and partitioning, and run a count(*). The estimated data processed is shown in the top right corner, and the actual data processed is shown after the query has run.
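
The two steps above can be sketched as SQL builders: one statement that materialises a partitioned and clustered copy of the data, and one that counts rows within a date range for a list of bases. All table names are hypothetical, and the partition and cluster columns are left as parameters; the values passed in the usage lines are for illustration only, so choose the columns your own answer to question 3 points to. The sketch also assumes the partition column is a TIMESTAMP or DATETIME, so that `DATE(...)` applies, and that the date filter is on the same column.

```python
from typing import Sequence

# Hypothetical table names: replace with your own.
SOURCE_TABLE = "my-project.trips_data_all.fhv_external"           # assumption
TARGET_TABLE = "my-project.trips_data_all.fhv_partitioned_clust"  # assumption


def partitioned_table_sql(
    target: str, source: str, partition_column: str, cluster_columns: Sequence[str]
) -> str:
    """Build DDL that copies `source` into a date-partitioned, clustered table."""
    cluster = ", ".join(cluster_columns)
    return (
        f"CREATE OR REPLACE TABLE `{target}`\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {cluster} AS\n"
        f"SELECT * FROM `{source}`;"
    )


def filtered_count_sql(
    table: str, date_column: str, start: str, end: str, bases: Sequence[str]
) -> str:
    """Build a count query limited to a date range and a set of base numbers."""
    base_list = ", ".join(f"'{b}'" for b in bases)
    return (
        f"SELECT COUNT(*) AS trip_count\n"
        f"FROM `{table}`\n"
        f"WHERE DATE({date_column}) BETWEEN '{start}' AND '{end}'\n"
        f"  AND dispatching_base_num IN ({base_list});"
    )


# Illustrative arguments only; the column choices are assumptions.
print(partitioned_table_sql(
    TARGET_TABLE, SOURCE_TABLE, "dropoff_datetime", ["dispatching_base_num"]
))
print(filtered_count_sql(
    TARGET_TABLE, "dropoff_datetime", "2019-01-01", "2019-03-31",
    ["B00987", "B02060", "B02279"],
))
```

Running the count against both the original and the new table makes it easier to compare the estimated and actual bytes processed.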

Question 5:

What is the best partitioning or clustering strategy when filtering on dispatching_base_num and SR_Flag?
Review the partitioning and clustering video. Clustering is not supported on every data type.

Question 6:

What improvements can be seen from partitioning and clustering when the data size is less than 1 GB?
Partitioning and clustering also create extra metadata.
This metadata needs to be processed before the query is executed.

(Not required) Question 7:

In which format does BigQuery store data?
Review the BigQuery internals video.