Improve transit database documentation #3774

Open
wants to merge 3 commits into
base: main
65 changes: 45 additions & 20 deletions docs/transit_database/transitdatabase.md
@@ -19,6 +19,51 @@ Cal-ITP uses two main Airtable bases:
| [**California Transit**](#california-transit) | Defines key organizational relationships and properties. Organizations, geography, funding programs, transit services, service characteristics, transit datasets such as GTFS, and the intersection between transit datasets and services. |
| [**Transit Technology Stacks**](#transit-technology-stacks) | Defines operational setups at transit provider organizations. Defines relationships between vendor organizations, transit provider and operator organizations, products, contracts to provide products, transit stack components, and how they relate to one another. |

## Entity Relationship Diagrams

The following entity relationship diagrams were last updated in 2022 but are preserved for general reference purposes.

(california-transit)=

### California Transit

[![](https://mermaid.ink/img/pako:eNqVk8tuwjAQRX_F8hrUfXZVaVlVRNBlNkM8SSzFD8Y2Ukr499ohaUMBqc0u1j13Htc-8dII5BlHWkmoCVShWfw2VIOWn-Cl0ezcL5fmxHZIR1lixgquQEONruB_UhuLBP6R3Azyta_cCjw49AmxZEQo_4eEfStdc2EeU_OzhEnHLBBqz8xcbaq7pfto0t-avOebp-1H_swqQ8w3yKIjpJ_J5IGV6dmLCdp3azRx-bbpxpYaBHEIQB4JBZN68hm3yvp-6OMOvBmXPYN-qSY4b6HEK1agK0nuI1wOyLLFI7bRKU6iLl3fzPQWtJC6zin5qCH9_ir9Kgq-c9xKgbTFQ5CEKi19TGWmB2tbGVvw5ifI-dhj8uNRugCJIqxaLH1c1r6bis0uyNTXHVKmaZUl6SKcYucLrjDOK0V8F6dkVfCYqcKCX5ZUQWh9KnKO0mBFXPerkN4QzypoHS44BG92nS555ingJBpf2Kg6fwGQYjVu)](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqVk8tuwjAQRX_F8hrUfXZVaVlVRNBlNkM8SSzFD8Y2Ukr499ohaUMBqc0u1j13Htc-8dII5BlHWkmoCVShWfw2VIOWn-Cl0ezcL5fmxHZIR1lixgquQEONruB_UhuLBP6R3Azyta_cCjw49AmxZEQo_4eEfStdc2EeU_OzhEnHLBBqz8xcbaq7pfto0t-avOebp-1H_swqQ8w3yKIjpJ_J5IGV6dmLCdp3azRx-bbpxpYaBHEIQB4JBZN68hm3yvp-6OMOvBmXPYN-qSY4b6HEK1agK0nuI1wOyLLFI7bRKU6iLl3fzPQWtJC6zin5qCH9_ir9Kgq-c9xKgbTFQ5CEKi19TGWmB2tbGVvw5ifI-dhj8uNRugCJIqxaLH1c1r6bis0uyNTXHVKmaZUl6SKcYucLrjDOK0V8F6dkVfCYqcKCX5ZUQWh9KnKO0mBFXPerkN4QzypoHS44BG92nS555ingJBpf2Kg6fwGQYjVu)

[editable source](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqVk8tuwjAQRX_F8hrUfXZVaVlVRNBlNkM8SSzFD8Y2Ukr499ohaUMBqc0u1j13Htc-8dII5BlHWkmoCVShWfw2VIOWn-Cl0ezcL5fmxHZIR1lixgquQEONruB_UhuLBP6R3Azyta_cCjw49AmxZEQo_4eEfStdc2EeU_OzhEnHLBBqz8xcbaq7pfto0t-avOebp-1H_swqQ8w3yKIjpJ_J5IGV6dmLCdp3azRx-bbpxpYaBHEIQB4JBZN68hm3yvp-6OMOvBmXPYN-qSY4b6HEK1agK0nuI1wOyLLFI7bRKU6iLl3fzPQWtJC6zin5qCH9_ir9Kgq-c9xKgbTFQ5CEKi19TGWmB2tbGVvw5ifI-dhj8uNRugCJIqxaLH1c1r6bis0uyNTXHVKmaZUl6SKcYucLrjDOK0V8F6dkVfCYqcKCX5ZUQWh9KnKO0mBFXPerkN4QzypoHS44BG92nS555ingJBpf2Kg6fwGQYjVu)

The California Transit Airtable base contains data about organizations, the services they operate, and how those services are represented with GTFS data.

#### Organizations

Organizations represent legal entities such as companies, universities, government entities or non-profits. Within these records there may be transit agencies or vendors that serve transit agencies. Some of these entities may "manage" several transit services.

#### Services

Services represent the "transportation product" that transit agencies offer to customers. Services are differentiated by the characteristics of the service they offer. Each service has a "Provider", which is the organization responsible for funding the service and planning how it will operate. Where data has been collected, some services also have "Operators", which identify whether the providing organization operates the service directly with its own staff or has a contractor provide the service.

Frequently, organizations provide multiple services with different characteristics. The California Transit base differentiates between these services if there is a difference in any of the following characteristics:

1. Provider of the service
2. "Super-mode" of the service
3. Whether the service is a "fixed-route" service
4. Whether there are different "rider requirements" that passengers must qualify for in order to ride
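
As a rough illustration of the rule above, two records count as distinct services when they differ in any one of the four characteristics. The sketch below is hypothetical: the `ServiceKey` class, its field names, and the sample values are assumptions made for illustration and do not reflect the actual Airtable schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceKey:
    """Hypothetical bundle of the four differentiating characteristics."""
    provider: str
    super_mode: str
    fixed_route: bool
    rider_requirements: str


def distinct_services(records):
    """Return the unique characteristic combinations in first-seen order."""
    seen = {}
    for record in records:
        # Frozen dataclasses are hashable, so equal keys collapse into one entry.
        seen.setdefault(record, None)
    return list(seen)


# Illustrative values only; not taken from the California Transit base.
records = [
    ServiceKey("Agency A", "bus", True, "none"),
    ServiceKey("Agency A", "bus", False, "none"),  # differs in fixed-route
    ServiceKey("Agency A", "bus", True, "none"),   # same as the first record
]
services = distinct_services(records)
```

Here the first and third records collapse into one service, while the second stays separate because its fixed-route flag differs.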

#### GTFS Datasets

This table contains records of URLs from which GTFS data can be downloaded. Each record identifies the type of GTFS data format and, for realtime datasets, also indicates which GTFS Schedule feed it is associated with for matching and validation purposes.

#### GTFS Service Data

This table describes the linkage between Services and GTFS Datasets. Each record is restricted to exactly one service and one GTFS dataset record. There is also a field to indicate whether the GTFS Dataset in the record is assumed to be "public facing" for the associated service. "Public facing" means the GTFS Dataset is assumed to be what trip planners use to represent the associated service when integrating the data into their systems to show the general public.
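
One way to picture this linkage is as a join table with one row per service and dataset pair. The sketch below is a minimal illustration under assumed names: `GtfsServiceData`, its fields, and the sample IDs are hypothetical and are not the real Airtable field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtfsServiceData:
    """Hypothetical linkage record: exactly one service and one GTFS dataset."""
    service_id: str
    gtfs_dataset_id: str
    public_facing: bool


def public_facing_datasets(links, service_id):
    """Return the dataset IDs assumed to be public facing for one service."""
    return [
        link.gtfs_dataset_id
        for link in links
        if link.service_id == service_id and link.public_facing
    ]


# Illustrative IDs only.
links = [
    GtfsServiceData("svc_1", "feed_regional", True),
    GtfsServiceData("svc_1", "feed_agency", False),
    GtfsServiceData("svc_2", "feed_agency", True),
]
result = public_facing_datasets(links, "svc_1")
```

Because each row carries a single service and a single dataset, a service represented in several feeds simply appears in several rows, and the flag is evaluated per pair rather than per dataset.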

(transit-technology-stacks)=

### Transit Stacks

[![](https://mermaid.ink/img/pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)

[editable source](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)

The rest of this page outlines stray technical considerations associated with Airtable and its ingestion into the data warehouse.

## Primary Keys
@@ -62,26 +107,6 @@ Airtable allows you to "sync" a table from one base to another, where it appears

This requires special handling when importing to the warehouse, because Airtable assigns new back-end record IDs in the synced table, which means that foreign keys to the synced table in the second base will not match record IDs in the source table. We resolve this by mapping all foreign keys to point to the source table in a base layer in dbt. See [data infra PR #2781](https://github.com/cal-itp/data-infra/pull/2781) for an example.
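
The remapping idea can be sketched in a few lines. The actual handling lives in a dbt base layer (see the PR linked above); the Python below is only an illustration, and the function name, field names, record IDs, and the assumption that a synced-to-source ID mapping is already available are all hypothetical.

```python
def remap_foreign_keys(rows, fk_field, synced_to_source):
    """Rewrite foreign keys that hold synced-table record IDs so that they
    hold the matching source-table record IDs instead."""
    remapped = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[fk_field] = [
            # IDs with no known mapping are passed through unchanged.
            synced_to_source.get(record_id, record_id)
            for record_id in row[fk_field]
        ]
        remapped.append(new_row)
    return remapped


# Hypothetical IDs: the synced copy has its own record ID for the same entity.
synced_to_source = {"recSYNCED1": "recSOURCE1"}
contracts = [{"id": "recC1", "organization_ids": ["recSYNCED1"]}]
fixed = remap_foreign_keys(contracts, "organization_ids", synced_to_source)
```

After remapping, joins from the second base resolve against the source table's record IDs, which is the behavior the paragraph above describes.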

## Entity Relationship Diagrams

The following entity relationship diagrams were last updated in 2022 but are preserved for general reference purposes.

(california-transit)=

### California Transit

[![](https://mermaid.ink/img/pako:eNqVVEtv4jAQ_iuWz0W9c1stbbWHbhFw5DLEEzJax07HDqss4b_vOCQQXlLLBSX6XuP5nL3OvEE91cgzgi1DuXZKfh-8BUf_IJJ36tBOJn6vlsg7ynCq1roEB1sMa_0ltK-QIT6C-w7-FvMwgwgBY6Jk3oW6_BalYm_q7HuUemMpFEfOY9b4XaJRUBUwuqj8GO3zu9btQxEwJTkKEZnc9sta7a3W-_zjebGa_1C5ZxULVJIO0sN3RCRQolYWnEt5oI6FZ4rNQ9VHw7bqp69dbN7QS6OqounlCwTzWQPLwGgUuUHnCq3atgs4t5DhhYbBkDFtMKiso0ws7tCKkoQqjwFG8V4l77KR4y0HxeuRoaosiVr05wL0vQ3D8kc9Pu4dIoMLFAer3qx2Rk5tzilu2V2C9oKcC-DUzQUZ5AV-1sRYpiLdml1mS6QXS1uSwsbm5HLDsvLvgtCwA5Pt93czX3ck_Y3oX6WLkTQYc4tZlBVtmsF7dHGG2e4wKS2mrJiCkM8VXkH4c_c8Bep3GJ7_ehaAPxXiFdG8Y2TKwtCoq5t7bkIqJqOVne5QiaLnCE7mO9v_Xs1-SUMGpXEJQtIqvDXinueUEdgEV0acujz6SZco3SIj38h90ltrcSxxrY8xcqhtTE4HgdaVEPHFUPSsp5FrfNJyjfyycdnwfMT0H1s9zcEGPPwHJNjt_A)](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqVVEtv4jAQ_iuWz0W9c1stbbWHbhFw5DLEEzJax07HDqss4b_vOCQQXlLLBSX6XuP5nL3OvEE91cgzgi1DuXZKfh-8BUf_IJJ36tBOJn6vlsg7ynCq1roEB1sMa_0ltK-QIT6C-w7-FvMwgwgBY6Jk3oW6_BalYm_q7HuUemMpFEfOY9b4XaJRUBUwuqj8GO3zu9btQxEwJTkKEZnc9sta7a3W-_zjebGa_1C5ZxULVJIO0sN3RCRQolYWnEt5oI6FZ4rNQ9VHw7bqp69dbN7QS6OqounlCwTzWQPLwGgUuUHnCq3atgs4t5DhhYbBkDFtMKiso0ws7tCKkoQqjwFG8V4l77KR4y0HxeuRoaosiVr05wL0vQ3D8kc9Pu4dIoMLFAer3qx2Rk5tzilu2V2C9oKcC-DUzQUZ5AV-1sRYpiLdml1mS6QXS1uSwsbm5HLDsvLvgtCwA5Pt93czX3ck_Y3oX6WLkTQYc4tZlBVtmsF7dHGG2e4wKS2mrJiCkM8VXkH4c_c8Bep3GJ7_ehaAPxXiFdG8Y2TKwtCoq5t7bkIqJqOVne5QiaLnCE7mO9v_Xs1-SUMGpXEJQtIqvDXinueUEdgEV0acujz6SZco3SIj38h90ltrcSxxrY8xcqhtTE4HgdaVEPHFUPSsp5FrfNJyjfyycdnwfMT0H1s9zcEGPPwHJNjt_A)

[editable source](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqVVEtv4jAQ_iuWz0W9c1stbbWHbhFw5DLEEzJax07HDqss4b_vOCQQXlLLBSX6XuP5nL3OvEE91cgzgi1DuXZKfh-8BUf_IJJ36tBOJn6vlsg7ynCq1roEB1sMa_0ltK-QIT6C-w7-FvMwgwgBY6Jk3oW6_BalYm_q7HuUemMpFEfOY9b4XaJRUBUwuqj8GO3zu9btQxEwJTkKEZnc9sta7a3W-_zjebGa_1C5ZxULVJIO0sN3RCRQolYWnEt5oI6FZ4rNQ9VHw7bqp69dbN7QS6OqounlCwTzWQPLwGgUuUHnCq3atgs4t5DhhYbBkDFtMKiso0ws7tCKkoQqjwFG8V4l77KR4y0HxeuRoaosiVr05wL0vQ3D8kc9Pu4dIoMLFAer3qx2Rk5tzilu2V2C9oKcC-DUzQUZ5AV-1sRYpiLdml1mS6QXS1uSwsbm5HLDsvLvgtCwA5Pt93czX3ck_Y3oX6WLkTQYc4tZlBVtmsF7dHGG2e4wKS2mrJiCkM8VXkH4c_c8Bep3GJ7_ehaAPxXiFdG8Y2TKwtCoq5t7bkIqJqOVne5QiaLnCE7mO9v_Xs1-SUMGpXEJQtIqvDXinueUEdgEV0acujz6SZco3SIj38h90ltrcSxxrY8xcqhtTE4HgdaVEPHFUPSsp5FrfNJyjfyycdnwfMT0H1s9zcEGPPwHJNjt_A)

(transit-technology-stacks)=

### Transit Stacks

[![](https://mermaid.ink/img/pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)

[editable source](https://mermaid-js.github.io/mermaid-live-editor/edit/#pako:eNqdk7tuwzAMRX9F0JzH7jXp0ClF09ELITGyAFs0KClFG-ffS7_6SNu0iEbp3MtLSjppQxZ1oZG3HhxDUwYla8cOgn-F5ClEde6WSzqpPfLRGyxUqRsI4DCW-kecBnxDITGY1PP0HP4PV1Tb6_QDk80jHLGuB3jEp4wbaloKGJIoVqvuS3YfVY5oVSLV1hDWYy9rapEh4Vz3N6NPpcUoSlQF8bqoU-8bk0wa9cH9JbwYi-hapqO3ffaKKbvqo-9HrMcRVb79ZtZ1Q4rL_d50x975AEOcOJ4rMwNzulvNn4Adpht9pwlsIcHeVNhA73gfxSUENEmGkKOkLrVe6Aa5AW_lHZ9651InEchV9hKLB8j1UPMsaG6t3PKd9YlYFweoIy405ET7l2B0kTjjDE0_YqLOb7JEHuQ)

## DAGs Maintenance

You can find further information on how to maintain the DAGs for Transit Database data [on this page](dags-maintenance), which covers general Airflow maintenance and troubleshooting patterns.
6 changes: 5 additions & 1 deletion docs/warehouse/what_is_agency.md
@@ -6,8 +6,12 @@ Inconsistent use of the term `agency` can be confusing, so this section of the d

| <span style="white-space: nowrap;">Area of Focus</span> | How to Identify an `agency` |
| ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Organizations** | Within the [California Transit Base](california-transit), organizations can represent transit agencies or other organizations that don't provide transit service. Some of these entities may "manage" several transit services. Some of these organizations have a built-in crosswalk to NTD data via the `ntd_id_2022` column. |
Contributor

Can you give an example of an entity that manages several transit services and an example of an organization that doesn't provide transit service?

| **Services** | Within the [California Transit Base](california-transit), services represent a type of transit "product" that people can use to travel. Services are differentiated if there is a difference in any of the following characteristics: <ol><li>Provider of the service</li><li>"Super-mode" of the service</li><li>Whether the service is a "fixed-route" service</li><li>Whether there are different "rider requirements" that passengers must qualify for in order to ride</li></ol> Often, organizations (especially large transit agencies) will "manage" multiple services. <br/><br/>**Examples of `services` for OCTA:**<br/><ul><li>`Orange County Transportation Authority`: fixed-route bus service</li><li>`OC Streetcar`: fixed-route streetcar service</li><li>`OC ACCESS`: paratransit demand-response service</li><li>`OC Flex`: a demand-response service open to the public</li></ul>|
Contributor

Can you define what super mode means in this context?

| **NTD Agencies** | Within the NTD data in the warehouse, there are various organizations that have provided transit data on an agency-wide basis. These organizations are all associated with each other via an "NTD ID". These NTD Agencies can be linked to California Transit organizations by joining the `NTD ID` with `organizations.ntd_id_2022`. |
| **GTFS Datasets** | For both GTFS Static and GTFS Real-Time data, when trying to analyze GTFS datasets it is easiest to think of `agency` as **"unique feed publisher"**, with the exception of the combined regional feed in the Bay Area, as it is a regional reporter that publishes duplicates of other feeds that we also consume.<br/><br/>**To identify "unique feed publishers":**<ul><li>Decide whether customer-facing feeds or agency feeds make sense for the analysis. For data quality analyses, customer-facing is crucial; for transit planning analyses, agency subfeeds are more relevant. </li><li>Deduplicate feeds</li></ul> |
| **GTFS-Provider-Service Relationships** | In the warehouse, this is the relationship between `organizations` and the `services` they manage. An agency can be interpreted as both depending on the use case. <br/><br/>This is not an exhaustive list of all services managed by providers, only those that we are targeting to get into GTFS reporting.<br/><br/>Each record defines an organization and one of its services. For the most part, each service is managed by a single organization with a small number of exceptions (e.g. *Solano Express*, which is jointly managed by Solano and Napa). In all cases, it is best to define how you are using `agency` within your analyses.<br/><br/>**Reference table**: Use this table to identify provider-service relationships<br/> `cal-itp-data-infra.mart_transit_database.dim_provider_gtfs_data`<ul><li>Column: `organization_name`</li><li>Column: `mobility_service`</li></ul> |
| **Non-GTFS Datasets** | Depending on the data you are using, defining an agency can change. In most cases, an `agency` refers to a public entity. For analyses that include non-public entities, `organization` can be used as a catch-all term to include local government agencies and other entities that may not fall under this definition of `agency`.<br/><br/>**Examples of `agency` definitions:**<br/><ul><li>[DLA Local Public Agency](https://dot.ca.gov/-/media/dot-media/programs/local-assistance/documents/guide/dla-glossary052022.pdf): "A California City, county, tribal government or other local public agency. In many instances this term is used loosely to include nonprofit organizations."</li></ul> |
| **AgencyID Dataset** | The Transit Data Quality Team produced a dataset crosswalk in Airtable between 5 tables representing "organizations" in different internal Caltrans systems. Some of these systems had multiple records for a single entity. This dataset is not updated on a regular basis. |
| **Other Datasets** | Depending on the data you are using, defining an agency can change. In most cases, an `agency` refers to a public entity. For analyses that include non-public entities, `organization` can be used as a catch-all term to include local government agencies and other entities that may not fall under this definition of `agency`.<br/><br/>**Examples of `agency` definitions:**<br/><ul><li>[DLA Local Public Agency](https://dot.ca.gov/-/media/dot-media/programs/local-assistance/documents/guide/dla-glossary052022.pdf): "A California City, county, tribal government or other local public agency. In many instances this term is used loosely to include nonprofit organizations."</li></ul> |

**Note**: Defining your unit of analysis within your analyses — whether it be `agency` or `organization` or another term — can help clarify how you are using the term.
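
The NTD crosswalk mentioned in the table above, joining an NTD ID to `organizations.ntd_id_2022`, can be sketched as a simple left join. This is a minimal illustration in plain Python rather than warehouse SQL; apart from `ntd_id_2022`, the field names (`ntd_id`, `name`) and the sample values are assumptions made for the example.

```python
def join_ntd_to_organizations(ntd_agencies, organizations):
    """Left-join NTD agency rows to organizations on the NTD ID."""
    # Index organizations by their crosswalk column, skipping empty values.
    orgs_by_ntd_id = {
        org["ntd_id_2022"]: org
        for org in organizations
        if org.get("ntd_id_2022")
    }
    joined = []
    for agency in ntd_agencies:
        match = orgs_by_ntd_id.get(agency["ntd_id"])
        joined.append({
            **agency,
            # None marks NTD agencies with no matching organization.
            "organization_name": match["name"] if match else None,
        })
    return joined


# Illustrative rows only; these IDs and names are made up.
organizations = [
    {"name": "Example Transit", "ntd_id_2022": "00001"},
    {"name": "Example Vendor", "ntd_id_2022": None},
]
ntd_agencies = [{"ntd_id": "00001"}, {"ntd_id": "00002"}]
joined = join_ntd_to_organizations(ntd_agencies, organizations)
```

Keeping unmatched NTD agencies in the output, rather than dropping them, makes gaps in the crosswalk visible, which matters because only some organizations carry an `ntd_id_2022` value.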