Skip to content

Latest commit

 

History

History
218 lines (173 loc) · 18.2 KB

ARCHITECTURE.md

File metadata and controls

218 lines (173 loc) · 18.2 KB

Architecture description

gtfs-validator is composed of a number of modules, as shown in the following dependency diagram:

graph BT;
  model
    core-->model;
    processor-->core;
    main-->model;
    main-->core;
    main-->processor;
    cli-->main;
    app:gui-->main;
    app:pkg-->app:gui;
Loading

Main

Depends on: processor, core, and model

If you're looking to add new GTFS fields or rules, you'll want to look at this module and read the section on adding new tables and fields.

Contains:

  • GTFS table schemas - Defines how GTFS files (e.g., trips.txt) and the fields contained within that file (e.g., trip_id) are represented in the validator. You can add new GTFS files and fields here.
  • Business logic validation rules - Code that validates GTFS field values. You can add new validation rules here.
  • Error notices - Containers for information about errors discovered during validation. You can add a new notice as a nested static class annotated with @GtfsValidationNotice in the corresponding validation rule class.
  • Report generators - Create reports and summaries of the notices generated by the rule validators.

Generated Classes

The architecture leverages AutoValue and annotations in Gtfs[File]Schema.java classes (such as GtfsTripSchema.java) in main to auto-generate the following classes used for loading and validating each File. A class is only generated if the Schema class is annotated to generate that type of class.

  • Gtfs[File].java
    • Internally represents an entity in a GTFS File
    • Example: GtfsStopTime.java
  • Gtfs[Field].java
    • Enumerates the possible values for an enumerated Field
    • Example: GtfsFrequencyExactTimes.java
  • Gtfs[Field]Enum.java
    • An interface for each Field with enumerated values, implemented by each Gtfs[Field].java
    • Example: GtfsFrequencyExactTimesEnum.java
  • Gtfs[File]TableContainer.java
    • Represents an instance of a GTFS File as a table, including its entities
    • Example: GtfsRouteTableContainer.java
  • Gtfs[File]TableDescriptor.java
    • Describes the specification of the table representation of a GTFS File
    • Example: GtfsStopTableDescriptor.java
  • Gtfs[File][Annotation]Validator.java
    • Validates all fields with a particular validating Annotation in a Gtfs File
    • Example: GtfsAgencyMixedCaseValidator.java
  • Gtfs[File][Field][Annotation]Validator.java
    • Validates a particular Field with a particular validating Annotation in a Gtfs File
    • Example: GtfsAttributionAgencyIdForeignKeyValidator.java
  • AutoValue_Gtfs[File]TableContainer_CompositeKey.java
    • Builder for composite keys for entities in a File where multiple fields comprise the primary key
    • Example: AutoValue_GtfsTransferTableContainer_CompositeKey.java

To generate these classes with AutoValue, build the project. When annotations are modified, the project must be rebuilt for the changes to propagate.

Processor

Depends on: core

Contains:

  • A file analyser to analyse annotations on Java interfaces that define GTFS schema and translate them to descriptors
  • Descriptors of annotations fields (ForeignKey, GtfsEnum, GtfsField, GtfsFile)
  • A processor to auto-generate data classes, loaders and validators based on annotations on GTFS schema interfaces
  • GTFS entity classes to generate class names for a given GTFS table
  • Code generators to generate code from annotations found by file analyser (e.g. EnumGenerator)

Core

Depends on: model

Contains:

  • Code to read zipped and unzipped file input
  • CSV file and row parsers
  • Notice to be generated when checking data type validation rules such as EmptyFileNotice
  • A notice container (NoticeContainer)
  • GTFS data type definitions such as GtfsTime, GtfsDate, or GtfsColor
  • GtfsFeedLoader to load for a whole GTFS feed with all its CSV files
  • GTFS feed's name

Model

Depends on: nothing

Contains:

  • root interfaces and annotations for modeling a GTFS schema table

Business logic should generally not be added to this module.

CLI

Depends on: main

A command-line-based application for running the validator.

App:Gui

Depends on: main

A GUI-based application for running the validator as a desktop application.

App:Pkg

Depends on: app:gui

A minimal wrapper around app:gui designed to facilitate packaging the GUI application as a Java Module and producing standalone executables and installers for various platforms.

Data pipeline 📥➡️♨➡️📤

1️⃣ Inputs

  • A local GTFS archive (zip file) or fully qualified URL from which to download a GTFS archive
  • Command line arguments

2️⃣ Validator loading

  • Locate all validators annotated with @GtfsValidator and load them

3️⃣ Feed loading

  • Read GTFS files
  • Create GtfsTableContainer from data
  • Invoke and execute all SingleEntityValidators to validate data types, etc.

4️⃣ Validators execution

  • Invoke and execute all FileValidators in parallel to validate GTFS semantic rules

5️⃣ Notice export

  1. Creates path to export notices as specified by command line input --output (or -o).
  2. Export notices from NoticeContainer to two JSON files in the specified directory - report.json for validator results and system_errors.json for any software errors that occurred during validation. Notices are alphabetically sorted in the .json files.

Adding new tables and fields

Let's say that you are an agency which for some reason uses other_file.txt as an additional table to represent GTFS information, and your goal is to implement validation rule related to this new table. To do so, you would have to:

  1. add the new table to the validator as an annotated Gtfs[OtherFile]Schema.java class in main;
  2. implement the new semantic validation rules and notices in main.

This section details how existing table are defined and gives information on annotation usage. One can then transpose these explanations to add a new table or field. Let's take a look at GtfsCalendarSchema:

package org.mobilitydata.gtfsvalidator.table;

import org.mobilitydata.gtfsvalidator.annotation.ConditionallyRequired;
import org.mobilitydata.gtfsvalidator.annotation.EndRange;
import org.mobilitydata.gtfsvalidator.annotation.FieldType;
import org.mobilitydata.gtfsvalidator.annotation.FieldTypeEnum;
import org.mobilitydata.gtfsvalidator.annotation.GtfsTable;
import org.mobilitydata.gtfsvalidator.annotation.PrimaryKey;
import org.mobilitydata.gtfsvalidator.annotation.Required;
import org.mobilitydata.gtfsvalidator.type.GtfsDate;

@GtfsTable("calendar.txt")
@ConditionallyRequired
public interface GtfsCalendarSchema extends GtfsEntity {
  @FieldType(FieldTypeEnum.ID)
  @PrimaryKey
  @Required
  String serviceId();

  @Required
  GtfsCalendarService monday();

  @Required
  GtfsCalendarService tuesday();

  @Required
  GtfsCalendarService wednesday();

  @Required
  GtfsCalendarService thursday();

  @Required
  GtfsCalendarService friday();

  @Required
  GtfsCalendarService saturday();

  @Required
  GtfsCalendarService sunday();

  @Required
  @EndRange(field = "end_date", allowEqual = true)
  GtfsDate startDate();

  @Required
  GtfsDate endDate();
}

By order of appearance in the interface definition:

  • @GtfsTable: annotates the interface that defines schema for calendar.txt - The processor will generates data classes, loaders and validators based on annotations on this GTFS schema interface.
  • @ConditionallyRequired: hints that this file is conditionally required.
  • @FieldType: specifies calendar_service_id is defined as an ID by the GTFS specification.
  • @PrimaryKey: specifies calendar_service_id is the primary key of this table.
  • @Required: specifies a value for calendar_service_id is required - A notice will be issued at the parsing stage.
  • @EndRange: specifies endDate is the end point for the date range defined by calendar.start_date and calendar.end_time - A validator will be generated and check if calendar.start_date is before or equal to calendar.end_date.

Annotations definitions

Annotation Definition
@CachedField Enables caching of values for a given field to optimize memory usage.
@ConditionallyRequired A hint that a field or a file is conditionally required and custom validators should validate the requirement.
@CurrencyAmount Specifies a field represents a currency and should be interpreted according to currencyField.
@DefaultValue Specifies a default value for a particular GTFS field.
@EndRange Specifies a field for the end point of a date or time range.
@FieldType Specifies type of a GTFS field, e.g., COLOR or LATITUDE.
@ForeignKey Specifies a reference to a foreign key.
@Generated Marker for all classes generated by annotation processor.
@GtfsEnumValue Specifies a value for a GTFS enum.
@GtfsEnumValues It is necessary for making GtfsEnumValue annotation repeatable.
@GtfsTable Annotates an interface that defines schema for a single GTFS table, such as stops.txt.
@GtfsValidationNotice Identifies, configures, and documents a validation notice.
@GtfsValidator Annotates both custom and automatically generated validators to make them discoverable on the fly.
@Index Asks annotation processor to create an index for quick search on a given field. The field does not need to have unique values.
@MixedCase Specifies a string field should have a mixed-case value.
@NonNegative Generates a validation that an integer or a double (float) field is not negative.
@NonZero Generates a validation that an integer or a double (float) field is not zero.
@Positive Generates a validation that an integer or a double (float) field is positive.
@PrimaryKey Specifies a field is the or part of the primary key in a GTFS table. Adds validation that each entry's set of primary key fields is unique in the table.
@Recommended For a file, specifies the file is recommended. For a field, specifies the column and all non-empty values are recommended.
@RecommendedColumn For a field, specifies the column is recommended, but empty values in that field do not violate the recommendation.
@Required For a file, specifies the file is recommended. For a field, specifies the column and all non-empty values are required.
@RequiredColumn For a field, specifies the column is required, but empty values in that field do not violate the requirement.