Forecast Checks

Matthew Cornell edited this page Apr 12, 2021 · 18 revisions
  • header must include only these columns: location, target, type, quantile, value (the five required for zoltpy), plus forecast_date and target_end_date

  • each row must have the same number of columns as header
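The header and row-length checks can be sketched in Python as follows. This is an illustrative sketch, not the hub's actual validation code: the function name `check_header_and_rows` is hypothetical, and treating all seven columns as required (rather than merely allowed) is an assumption.

```python
import csv
import io

# Column names taken from the checklist above.
EXPECTED_COLUMNS = {
    "location", "target", "type", "quantile", "value",
    "forecast_date", "target_end_date",
}

def check_header_and_rows(csv_text):
    """Return a list of error messages for header and row-length problems."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    errors = []
    header = rows[0]
    extra = set(header) - EXPECTED_COLUMNS
    missing = EXPECTED_COLUMNS - set(header)  # assumption: all seven are required
    if extra:
        errors.append(f"unexpected columns: {sorted(extra)}")
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    # Line numbers are 1-based and count the header as line 1.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
    return errors
```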

  • location must be in "locations" column of locations.csv

  • target must be in

    paste(1:20,  "wk ahead inc death")
    paste(1:20,  "wk ahead cum death")
    paste(0:130, "day ahead inc hosp")
    paste(1:8, "wk ahead inc case")

    county locations should have only "case" targets
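For readers working outside R, the target list above can be mirrored in Python. R's `paste` joins its arguments with a single space, so `paste(1:20, "wk ahead inc death")` yields strings such as "1 wk ahead inc death". The helper names below are hypothetical, and identifying counties as 5-digit numeric codes is an assumption about how locations.csv encodes them.

```python
# Python equivalent of the four R paste() expressions above.
VALID_TARGETS = (
    {f"{n} wk ahead inc death" for n in range(1, 21)}
    | {f"{n} wk ahead cum death" for n in range(1, 21)}
    | {f"{n} day ahead inc hosp" for n in range(0, 131)}
    | {f"{n} wk ahead inc case" for n in range(1, 9)}
)

def is_county(location):
    # Assumption: county locations are 5-digit numeric codes (e.g. FIPS).
    return len(location) == 5 and location.isdigit()

def target_is_valid(location, target):
    """True if the target is in the allowed list and permitted for the location."""
    if target not in VALID_TARGETS:
        return False
    if is_county(location) and not target.endswith("inc case"):
        return False  # counties may only carry "case" targets
    return True
```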

  • forecast_date and target_end_date must be in YYYY-MM-DD format. Additionally, forecast_date should be within ±1 day of the date in the forecast filename. For example, a file at data-processed/model/2021-04-12-model.csv should have a forecast_date between 2021-04-11 and 2021-04-13, inclusive.
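A minimal Python sketch of the date rule, assuming the filename always begins with a YYYY-MM-DD date as in the example path. The function names are hypothetical and this is not the hub's own implementation.

```python
import os
import re
from datetime import date, timedelta

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def parse_iso_date(text):
    """Return a date for a strict YYYY-MM-DD string, or None if it is invalid."""
    if not DATE_RE.fullmatch(text):
        return None
    try:
        return date.fromisoformat(text)
    except ValueError:  # e.g. 2021-02-30
        return None

def forecast_date_matches_filename(path, forecast_date):
    """True if forecast_date is within one day of the date in the filename."""
    # Assumption: the first 10 characters of the basename are the date.
    name_date = parse_iso_date(os.path.basename(path)[:10])
    row_date = parse_iso_date(forecast_date)
    if name_date is None or row_date is None:
        return False
    return abs(row_date - name_date) <= timedelta(days=1)
```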

  • quantile must be in

    c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
  • quantile for "case" targets must be in

    c(0.025, 0.100, 0.250, 0.500, 0.750, 0.900, 0.975)
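The two quantile grids above expand to 23 and 7 values respectively. The Python sketch below mirrors the R expressions. Comparing with a small tolerance is an assumption made here to sidestep floating-point noise, and selecting the grid by the "inc case" suffix of the target is likewise an illustrative choice.

```python
# Equivalent of c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99).
STANDARD_QUANTILES = (
    [0.01, 0.025]
    + [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    + [0.975, 0.99]
)

# Reduced grid for "case" targets.
CASE_QUANTILES = [0.025, 0.100, 0.250, 0.500, 0.750, 0.900, 0.975]

def quantile_allowed(target, quantile, tol=1e-9):
    """True if the quantile is on the grid that applies to the target."""
    allowed = CASE_QUANTILES if target.endswith("inc case") else STANDARD_QUANTILES
    return any(abs(quantile - q) <= tol for q in allowed)
```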
  • quantile must be an int or float in [0, 1]

  • value must be a non-negative int or float, except for retractions as detailed below

    • Forecast retractions: To retract existing forecast rows in a file, specify NULL (no quote marks) as the value. NA, None, or anything else is not accepted. More details are mentioned here.
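The value rule, including the retraction exception, might look like this in Python when applied to the raw text of a cell. The function name is hypothetical, and rejecting NaN and infinity is an assumption that goes slightly beyond the wording above.

```python
import math

def value_is_valid(raw):
    """Check the raw text of a value cell as read from the CSV file."""
    if raw == "NULL":
        return True  # retraction: the bare literal NULL, without quote marks
    try:
        number = float(raw)
    except ValueError:
        return False  # NA, None, quoted "NULL", empty strings, and so on
    # Assumption: NaN and infinite values are rejected as well.
    return math.isfinite(number) and number >= 0
```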
  • validates date alignment as documented in the issue "add additional validations"

  • validates quantiles and values (i.e., at the prediction level):

    • entries in value must be non-decreasing as quantiles increase
    • entries in quantile must be unique
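The two prediction-level checks can be sketched as below for a single location/target prediction. The function name and error strings are hypothetical. The sketch sorts by quantile first, so the input rows do not need to be ordered.

```python
def check_prediction(quantiles, values):
    """Return error messages for one prediction's quantile/value pairs."""
    errors = []
    if len(set(quantiles)) != len(quantiles):
        errors.append("quantiles are not unique")
    # Sort by quantile so the monotonicity check does not depend on row order.
    ordered = sorted(zip(quantiles, values))
    for (q_lo, v_lo), (q_hi, v_hi) in zip(ordered, ordered[1:]):
        if v_hi < v_lo:
            errors.append(
                f"value decreases from {v_lo} at quantile {q_lo} "
                f"to {v_hi} at quantile {q_hi}"
            )
    return errors
```

Equal values at neighbouring quantiles pass, since the rule is non-decreasing rather than strictly increasing.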
  • validates quantiles as a group:

    • there must be exactly one point prediction for each location/target pair
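A sketch of the point-prediction rule, assuming each row is a dict keyed by the column names above and that point rows carry the literal "point" in the type column (an assumption, since the page does not list the allowed type values). The function name is hypothetical.

```python
from collections import Counter

def pairs_without_single_point(rows):
    """Return location/target pairs that do not have exactly one point row."""
    pairs = {(r["location"], r["target"]) for r in rows}
    # Assumption: point predictions are marked with type == "point".
    points = Counter(
        (r["location"], r["target"]) for r in rows if r["type"] == "point"
    )
    return sorted(pair for pair in pairs if points[pair] != 1)
```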
  • validates that the predicted value for a location is less than that location's population

    • this check is run for all forecast submissions for all targets (inc/cum deaths/cases).
    • the population truth data is present in the locations.csv file.
    • To see which predictions violate this check, look at the logs in the GitHub Actions build of your PR; the invalid predictions should be printed there.
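The population check can be approximated as follows. The sketch takes a plain dict from location to population rather than reading locations.csv, because the page does not give that file's column names. Treating a value equal to the population as a violation is also an assumption.

```python
def population_violations(rows, population):
    """Return the rows whose value is not below the location's population.

    rows: dicts with "location" and "value" keys (value as raw text).
    population: hypothetical mapping of location -> population count.
    """
    violations = []
    for row in rows:
        if row["value"] == "NULL":
            continue  # retracted rows carry no value to compare
        limit = population.get(row["location"])
        if limit is not None and float(row["value"]) >= limit:
            violations.append(row)
    return violations
```

Running something like this locally before opening a PR may surface the same rows that the build logs would list.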