fct_vehicle_locations_grouped - calculate direction from the prior position #3771

Merged
tiffanychu90 merged 17 commits into main from debug-vp-grouped
Mar 19, 2025

Conversation

tiffanychu90
Member

@tiffanychu90 tiffanychu90 commented Mar 12, 2025

Description

Change how vehicle positions are grouped within mart_gtfs.fct_vehicle_locations_grouped.

After using this table in data-analyses, it isn't behaving as expected: we're losing rows we should keep, and some rows that should be grouped are not.

Some noticeable differences are:

  • The desired vp_direction should be calculated from the previous position. Solution: use lag to compare each position to the prior one, then group non-moving vehicle positions together.
  • fct_vehicle_locations_grouped was comparing the current location to the next one and then using the keys to merge the result back.
    • Doing it this way is unintuitive. If we want a comparison against the previous position, let's just find the previous one, rather than shifting the observations just to use key and next_location_key.
    • There are also some rows where next_location_key is missing, but when you sort by location_timestamp, there is a next timestamp.
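
As a rough illustration of the lag-based approach described above, here is a minimal Python sketch (not the model's SQL). It sorts one trip's positions by timestamp, compares each position to the prior one, folds consecutive non-moving positions into a single group, and derives a coarse direction from the prior position. The field names first_timestamp, last_timestamp and n_positions, the planar x/y coordinates, and the four-way direction rule are assumptions made for this example; only vp_direction and location_timestamp come from the description.

```python
def direction_from_prior(prev_xy, curr_xy):
    """Coarse direction of travel from the prior position to the current one.

    Assumed rule for illustration: the larger of the x/y displacements wins.
    """
    if prev_xy is None:
        return "Unknown"  # first position of a trip has nothing to lag to
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    if dx == 0 and dy == 0:
        return "Unknown"
    if abs(dx) > abs(dy):
        return "Eastbound" if dx > 0 else "Westbound"
    return "Northbound" if dy > 0 else "Southbound"


def group_positions(rows):
    """Group consecutive non-moving positions for a single trip."""
    ordered = sorted(rows, key=lambda r: r["location_timestamp"])
    groups = []
    prev_xy = None  # plays the role of LAG(location) in SQL
    for row in ordered:
        xy = (row["x"], row["y"])
        if prev_xy is not None and xy == prev_xy:
            # Vehicle has not moved: extend the current group.
            groups[-1]["last_timestamp"] = row["location_timestamp"]
            groups[-1]["n_positions"] += 1
        else:
            # Vehicle moved (or this is the first row): start a new group.
            groups.append({
                "first_timestamp": row["location_timestamp"],
                "last_timestamp": row["location_timestamp"],
                "x": row["x"],
                "y": row["y"],
                "n_positions": 1,
                "vp_direction": direction_from_prior(prev_xy, xy),
            })
        prev_xy = xy
    return groups


# Hypothetical positions for one trip (timestamps in seconds, planar coordinates).
positions = [
    {"location_timestamp": 0, "x": 0.0, "y": 0.0},
    {"location_timestamp": 20, "x": 0.0, "y": 0.0},
    {"location_timestamp": 40, "x": 0.0, "y": 0.0},
    {"location_timestamp": 60, "x": 0.0, "y": 50.0},
    {"location_timestamp": 80, "x": 0.0, "y": 120.0},
    {"location_timestamp": 100, "x": 0.0, "y": 120.0},
]

groups = group_positions(positions)
for g in groups:
    print(g)
```

Because every input row is assigned to exactly one group, no rows are dropped, and the first position of a trip gets an Unknown direction rather than depending on a key that may be missing.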

Part of #3645

Related to #3660, #3672

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (debug-vp-grouped) $ poetry run dbt run -s fct_vehicle_locations_grouped
17:12:32  Running with dbt=1.5.1
17:12:35  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.staging.ntd
17:12:36  Found 562 models, 1012 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 206 sources, 4 exposures, 0 metrics, 0 groups
17:12:36  
17:12:39  Concurrency: 8 threads (target='dev')
17:12:39  
17:12:39  1 of 1 START sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [RUN]
17:13:04  1 of 1 OK created sql incremental model tiffany_mart_gtfs.fct_vehicle_locations_grouped  [SCRIPT (62.3 GiB processed) in 25.47s]
17:13:04  
17:13:04  Finished running 1 incremental model in 0 hours 0 minutes and 28.10 seconds (28.10s).
17:13:04  
17:13:04  Completed successfully
17:13:04  
17:13:04  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Tests:

jovyan@jupyter-tiffanychu90 ~/data-infra/warehouse (debug-vp-grouped) $ poetry run dbt test -s fct_vehicle_locations_grouped
18:20:53  Running with dbt=1.5.1
18:20:57  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.staging.ntd
18:20:58  Found 563 models, 1006 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 206 sources, 4 exposures, 0 metrics, 0 groups
18:20:58  
18:21:06  Concurrency: 8 threads (target='dev')
18:21:06  
18:21:06  1 of 1 START test not_null_fct_vehicle_locations_grouped_trip_instance_key ..... [RUN]
18:21:07  1 of 1 PASS not_null_fct_vehicle_locations_grouped_trip_instance_key ........... [PASS in 0.79s]
18:21:07  
18:21:07  Finished running 1 test in 0 hours 0 minutes and 9.55 seconds (9.55s).
18:21:07  
18:21:07  Completed successfully
18:21:07  
18:21:07  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Post-merge follow-ups

  • No action required
  • Actions required - this is an incremental model...I think a full refresh needs to be done?

@tiffanychu90 tiffanychu90 marked this pull request as draft March 12, 2025 17:26

github-actions bot commented Mar 12, 2025

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For modified incremental models (or incremental models whose parents are modified), does the PR description identify whether a full refresh is needed for these tables?

Changed incremental models 🔀

calitp_warehouse.mart.gtfs_quality.fct_daily_service_alerts_message_age_summary

calitp_warehouse.mart.gtfs_quality.fct_daily_trip_updates_message_age_summary

calitp_warehouse.mart.gtfs_quality.fct_daily_vehicle_positions_latency_statistics

calitp_warehouse.mart.gtfs_quality.fct_daily_vehicle_positions_message_age_summary

calitp_warehouse.mart.gtfs_quality.fct_daily_vendor_vehicle_positions_message_age_summary

calitp_warehouse.mart.gtfs_quality.fct_daily_with_trip_vehicle_positions_message_age_summary

calitp_warehouse.mart.gtfs.fct_service_alerts_messages_unnested

calitp_warehouse.mart.gtfs.fct_trip_updates_no_stop_times

calitp_warehouse.mart.gtfs.fct_vehicle_locations

calitp_warehouse.mart.gtfs.fct_vehicle_locations_grouped

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__service_alerts_trip_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__trip_updates_trip_day_map_grouping

calitp_warehouse.intermediate.gtfs.int_gtfs_rt__vehicle_positions_trip_day_map_grouping

DAG

Legend (in order of precedence)

Resource type                                  | Indicator   | Resolution
Large table-materialized model                 | Orange      | Make the model incremental
Large model without partitioning or clustering | Orange      | Add partitioning and/or clustering
View with more than one child                  | Yellow      | Materialize as a table or incremental
Incremental                                    | Light green |
Table                                          | Green       |
View                                           | White       |

@tiffanychu90
Member Author

@vevetron: Am I supposed to do uniqueness or 99.9% proportion tests on trip_instance_key if this table is not supposed to be grouped by trip?

@tiffanychu90
Member Author

tiffanychu90 commented Mar 13, 2025

@vevetron: Am I supposed to do uniqueness or 99.9% proportion tests on trip_instance_key if this table is not supposed to be grouped by trip?

Within dim_stop_times, where a trip_id would be repeated...a not_null test is used but not the proportion. So I definitely think for fct_vehicle_locations, it didn't need that uniqueness test to begin with. For my downstream table from fct_vehicle_locations, I'll also just use a not_null test but remove the other tests and see how it goes.

      - name: trip_id
        description: '{{ doc("gtfs_stop_times__trip_id") }}'
        tests: *not_null_error
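
To make the distinction concrete, here is a small Python sketch of what the two kinds of checks look for. It is a simplified stand-in for dbt's generic not_null and unique tests, not their actual implementation, and the sample rows and helper names are made up for illustration. In a table with one row per vehicle position, trip_instance_key repeats by design, so a uniqueness check flags it while a not-null check passes.

```python
from collections import Counter

# Hypothetical rows: several vehicle positions share one trip_instance_key.
rows = [
    {"trip_instance_key": "trip_a", "location_timestamp": 0},
    {"trip_instance_key": "trip_a", "location_timestamp": 20},
    {"trip_instance_key": "trip_b", "location_timestamp": 0},
]


def not_null_failures(rows, column):
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Non-null values that appear more than once in the column."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


null_rows = not_null_failures(rows, "trip_instance_key")
duplicate_keys = unique_failures(rows, "trip_instance_key")

print("not_null failures:", len(null_rows))
print("unique failures:", duplicate_keys)
```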

@tiffanychu90
Member Author

@vevetron: I can't get the not null test to pass, but I did run a poetry run dbt run -s +fct_vehicle_locations_grouped first.

Steps I tried:

  • I picked one of the rows that had nulls and, starting from cal-itp-data-infra.fct_vehicle_locations, materialized that table for that operator (SBMTD) and service date (somewhere in January?).
  • I ran this fct_vehicle_locations_grouped model on that operator / date and then queried it, and there were no nulls!

Thoughts for how to proceed and get rid of a failing test?

@vevetron
Contributor

vevetron commented Mar 17, 2025

When I try to run this it all errors out. Maybe because I don't have any data to test with.

21:14:51 Compilation Error in model fct_vehicle_locations_grouped (models/mart/gtfs/fct_vehicle_locations_grouped.sql)
21:14:51 None has no element 0

Viveks-MacBook-Pro:warehouse vivek$ poetry run dbt run -s +fct_vehicle_locations_grouped
21:13:53  Running with dbt=1.5.10
21:13:54  Registered adapter: bigquery=1.5.1
21:13:54  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.staging.ntd
21:13:54  Found 560 models, 1021 tests, 0 snapshots, 0 analyses, 856 macros, 0 operations, 12 seed files, 206 sources, 4 exposures, 0 metrics, 0 groups
21:13:54
21:13:57  Concurrency: 8 threads (target='dev')
21:13:57
21:13:57  1 of 19 START sql view model vb_staging.stg_gtfs_rt__vehicle_positions ......... [RUN]
21:13:57  2 of 19 START sql view model vb_staging.stg_gtfs_schedule__agency .............. [RUN]
21:13:57  3 of 19 START sql view model vb_staging.stg_gtfs_schedule__download_outcomes ... [RUN]
21:13:57  4 of 19 START sql view model vb_staging.stg_gtfs_schedule__file_parse_outcomes . [RUN]
21:13:57  5 of 19 START sql view model vb_staging.stg_gtfs_schedule__unzip_outcomes ...... [RUN]
21:13:57  6 of 19 START sql view model vb_staging.stg_transit_database__gtfs_datasets .... [RUN]
21:13:58  5 of 19 OK created sql view model vb_staging.stg_gtfs_schedule__unzip_outcomes . [CREATE VIEW (0 processed) in 0.96s]
21:13:58  1 of 19 OK created sql view model vb_staging.stg_gtfs_rt__vehicle_positions .... [CREATE VIEW (0 processed) in 0.96s]
21:13:58  2 of 19 OK created sql view model vb_staging.stg_gtfs_schedule__agency ......... [CREATE VIEW (0 processed) in 0.98s]
21:13:58  6 of 19 OK created sql view model vb_staging.stg_transit_database__gtfs_datasets  [CREATE VIEW (0 processed) in 1.05s]
21:13:58  7 of 19 START sql table model vb_staging.int_transit_database__gtfs_datasets_dim  [RUN]
21:13:58  4 of 19 OK created sql view model vb_staging.stg_gtfs_schedule__file_parse_outcomes  [CREATE VIEW (0 processed) in 1.07s]
21:13:58  8 of 19 START sql view model vb_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [RUN]
21:13:58  3 of 19 OK created sql view model vb_staging.stg_gtfs_schedule__download_outcomes  [CREATE VIEW (0 processed) in 1.09s]
21:13:59  8 of 19 OK created sql view model vb_staging.int_gtfs_schedule__grouped_feed_file_parse_outcomes  [CREATE VIEW (0 processed) in 1.07s]
21:13:59  9 of 19 START sql view model vb_staging.int_gtfs_schedule__joined_feed_outcomes  [RUN]
21:14:01  9 of 19 OK created sql view model vb_staging.int_gtfs_schedule__joined_feed_outcomes  [CREATE VIEW (0 processed) in 1.27s]
21:14:01  10 of 19 START sql table model vb_mart_gtfs.dim_schedule_feeds ................. [RUN]
21:14:03  7 of 19 OK created sql table model vb_staging.int_transit_database__gtfs_datasets_dim  [CREATE TABLE (1.7k rows, 731.8 MiB processed) in 4.49s]
21:14:03  11 of 19 START sql table model vb_mart_transit_database.bridge_schedule_dataset_for_validation  [RUN]
21:14:03  12 of 19 START sql table model vb_mart_transit_database.dim_gtfs_datasets ...... [RUN]
21:14:05  11 of 19 OK created sql table model vb_mart_transit_database.bridge_schedule_dataset_for_validation  [CREATE TABLE (1.1k rows, 179.3 KiB processed) in 2.04s]
21:14:05  12 of 19 OK created sql table model vb_mart_transit_database.dim_gtfs_datasets . [CREATE TABLE (1.7k rows, 667.5 KiB processed) in 2.24s]
21:14:05  13 of 19 START sql table model vb_staging.int_transit_database__urls_to_gtfs_datasets  [RUN]
21:14:07  13 of 19 OK created sql table model vb_staging.int_transit_database__urls_to_gtfs_datasets  [CREATE TABLE (1.7k rows, 308.3 KiB processed) in 2.35s]
21:14:10  10 of 19 OK created sql table model vb_mart_gtfs.dim_schedule_feeds ............ [CREATE TABLE (1.6k rows, 522.3 MiB processed) in 9.38s]
21:14:10  14 of 19 START sql table model vb_mart_gtfs.fct_daily_schedule_feeds ........... [RUN]
21:14:13  14 of 19 OK created sql table model vb_mart_gtfs.fct_daily_schedule_feeds ...... [CREATE TABLE (170.5k rows, 504.2 KiB processed) in 3.36s]
21:14:13  15 of 19 START sql view model vb_mart_gtfs.fct_vehicle_positions_messages ...... [RUN]
21:14:14  15 of 19 OK created sql view model vb_mart_gtfs.fct_vehicle_positions_messages . [CREATE VIEW (0 processed) in 1.03s]
21:14:14  16 of 19 START sql incremental model vb_staging.int_gtfs_rt__vehicle_positions_trip_day_map_grouping  [RUN]
21:14:22  16 of 19 OK created sql incremental model vb_staging.int_gtfs_rt__vehicle_positions_trip_day_map_grouping  [SCRIPT (17.8 MiB processed) in 7.98s]
21:14:22  17 of 19 START sql table model vb_mart_gtfs.fct_vehicle_positions_trip_summaries  [RUN]
21:14:42  17 of 19 OK created sql table model vb_mart_gtfs.fct_vehicle_positions_trip_summaries  [CREATE TABLE (1.8m rows, 25.9 GiB processed) in 19.18s]
21:14:42  18 of 19 START sql incremental model vb_mart_gtfs.fct_vehicle_locations ........ [RUN]
21:14:51  18 of 19 OK created sql incremental model vb_mart_gtfs.fct_vehicle_locations ... [SCRIPT (0 processed) in 9.09s]
21:14:51  19 of 19 START sql incremental model vb_mart_gtfs.fct_vehicle_locations_grouped  [RUN]
21:14:51  19 of 19 ERROR creating sql incremental model vb_mart_gtfs.fct_vehicle_locations_grouped  [ERROR in 0.50s]
21:14:51
21:14:51  Finished running 9 view models, 7 table models, 3 incremental models in 0 hours 0 minutes and 56.96 seconds (56.96s).
21:14:51
21:14:51  Completed with 1 error and 0 warnings:
21:14:51
21:14:51  Compilation Error in model fct_vehicle_locations_grouped (models/mart/gtfs/fct_vehicle_locations_grouped.sql)
21:14:51    None has no element 0
21:14:51
21:14:51  Done. PASS=18 WARN=0 ERROR=1 SKIP=0 TOTAL=19

@tiffanychu90
Member Author

@vevetron: I materialized a day / one operator in my tiffany_mart_gtfs and got the table to build / test successfully. Then I reset it back to incremental.

The tests that didn't pass have to do with not having a dt column to check whether there are unique gtfs_dataset_keys and whatnot, but I think if fct_vehicle_locations passes, then I want to keep using service_date rather than carrying more columns over to this table. The only test I'll leave is the not-null test on trip_instance_key, because null keys are not a good thing!

@tiffanychu90 tiffanychu90 marked this pull request as ready for review March 18, 2025 23:02
@vevetron
Contributor

I had two different sources of errors. I had your old fct_vehicle_locations_grouped already in my table. When running poetry run dbt run -s +fct_vehicle_locations_grouped this produced the None has no element 0 error. Deleting the old table fixed that.

Then when running poetry run dbt run -s fct_vehicle_locations_grouped this produces a

18:16:22
18:16:22  Completed with 1 error and 0 warnings:
18:16:22
18:16:22  Database Error in model fct_vehicle_locations_grouped (models/mart/gtfs/fct_vehicle_locations_grouped.sql)
18:16:22    Unrecognized name: dt at [3:17]

error. This is also mitigated by running poetry run dbt run -s fct_vehicle_locations_grouped --full-refresh instead.

This seems to happen for all incrementally materialized tables, so it's not this PR's fault.
poetry run dbt run (produces errors with every incremental model)

17:56:20
17:56:20  Finished running 223 view models, 282 table models, 48 incremental models in 0 hours 5 minutes and 11.67 seconds (311.67s).
17:56:20
17:56:20  Completed with 5 errors and 0 warnings:
17:56:20
17:56:20  Database Error in model dim_pathways (models/mart/gtfs/dim_pathways.sql)
17:56:20    Value of type FLOAT64 cannot be assigned to length, which has type NUMERIC at [91:404]
17:56:20    compiled Code at target/run/calitp_warehouse/models/mart/gtfs/dim_pathways.sql
17:56:20
17:56:20  Compilation Error in model fct_daily_vehicle_positions_latency_statistics (models/mart/gtfs_quality/fct_daily_vehicle_positions_latency_statistics.sql)
17:56:20    None has no element 0
17:56:20
17:56:20  Compilation Error in model fct_daily_vendor_vehicle_positions_message_age_summary (models/mart/gtfs_quality/fct_daily_vendor_vehicle_positions_message_age_summary.sql)
17:56:20    None has no element 0
17:56:20
17:56:20  Compilation Error in model fct_daily_with_trip_vehicle_positions_message_age_summary (models/mart/gtfs_quality/fct_daily_with_trip_vehicle_positions_message_age_summary.sql)
17:56:20    None has no element 0
17:56:20
17:56:20  Database Error in model fct_vehicle_locations_grouped (models/mart/gtfs/fct_vehicle_locations_grouped.sql)
17:56:20    Unrecognized name: dt at [3:17]
17:56:20
17:56:20  Done. PASS=547 WARN=0 ERROR=5 SKIP=1 TOTAL=553

Contributor

@vevetron vevetron left a comment

I don't think it's necessary to materialize this table, since this query probably won't run often and we can use the materialized fct_vehicle_locations and calculate on the fly. But I'm not 100% sure. Still looks good! Please be sure to squash.

@tiffanychu90 tiffanychu90 merged commit 89b8335 into main Mar 19, 2025
4 checks passed
@tiffanychu90 tiffanychu90 deleted the debug-vp-grouped branch March 19, 2025 19:59
@vevetron
Contributor

This ended up creating errors. I guess I didn't have a full grasp of the error messages I was seeing.
