Overview of steps to complete the ARC model implementation #78

Closed
6 of 21 tasks
emmamendelsohn opened this issue Jan 19, 2024 · 3 comments

emmamendelsohn commented Jan 19, 2024

Training

  • Finish augmented dataset at 0.1 degree resolution:
    • Pull in all static variables.
    • Pull in immunity layer.
    • Create layer for whether there has been an outbreak in the region in the last year, as a first pass. (In a future iteration this could be something like distance to nearest outbreak in the last year.)
    • Create an outcome layer representing whether there has been an outbreak in the pixel in the month following the selected date.
      • For example, if the random date is January 19th 2023, the lag data covers the three months up to that day. We need to know whether an outbreak happened between January 20th and February 19th 2023.
      • Note: to save time, I implemented this on a per-polygon basis instead of per pixel (20240305).
  • Data steps for ARC model
    • Filter to South Africa and Eswatini
    • Aggregate to ADM level 2
      • Average NDVI, weather, forecasts
      • Sum taxa population
      • Most likely remove slope and aspect, since they become less relevant/interpretable at the ADM level
      • Average immunity? (Note, the immunity layer could potentially be regenerated at the ADM level and have the parameters tuned in the model)
      • Make area of the polygon one of the fields
    • We can remove forecast fields beyond 1 month ahead.
  • Model pipeline
    • Train/validation split randomly by day and ADM
    • Fix xgboost model
      • Use "base_margin" in xgboost to account for probability per unit area.
    • Validation report
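
As a rough illustration of two of the training steps above (the outcome window and the ADM aggregation), here is a minimal Python sketch. It is not the pipeline's actual code. The field names `adm2`, `ndvi`, `population`, and `area_km2` are hypothetical, and "one month" is assumed to mean the same calendar day in the following month, which matches the January 19th example.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def add_one_month(d: date) -> date:
    """Same day in the next month, clamped to that month's last day."""
    year = d.year + (d.month // 12)
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def outcome_window(selected: date) -> tuple[date, date]:
    """Inclusive window from the day after `selected` to one month after it."""
    return selected + timedelta(days=1), add_one_month(selected)


def had_outbreak(selected: date, outbreak_dates: list[date]) -> bool:
    """True if any outbreak date falls inside the outcome window."""
    start, end = outcome_window(selected)
    return any(start <= d <= end for d in outbreak_dates)


def aggregate_to_adm(pixels: list[dict]) -> dict[str, dict]:
    """Average NDVI and sum population per ADM region; carry polygon area.

    Assumes every pixel record already carries the area of its polygon
    (hypothetical `area_km2` field), so the first record's value is used.
    """
    groups = defaultdict(list)
    for p in pixels:
        groups[p["adm2"]].append(p)
    out = {}
    for adm, rows in groups.items():
        out[adm] = {
            "ndvi_mean": sum(r["ndvi"] for r in rows) / len(rows),
            "population_sum": sum(r["population"] for r in rows),
            "area_km2": rows[0]["area_km2"],
        }
    return out
```

Weather and forecast fields would be averaged the same way as NDVI here. Whether immunity should be averaged is left open, as in the list above.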

Prediction

Some of these steps can use/adapt existing functions from the training pipeline.

  • Download all data for past three months
  • Transform data, including steps to scale to 0.1 degrees and calculate lagged anomalies against stored historical values.
  • Augment and aggregate into ADM regions
  • Run predictions using stored model object
  • Return a shapefile
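
For the lagged anomaly step in the list above, a minimal Python sketch of one possible approach follows. Everything specific in it is an assumption for illustration: three lag windows of 30 days each (to cover roughly the past three months), an anomaly defined as the window mean minus a stored historical mean for that window, and the function and argument names themselves.

```python
from datetime import date, timedelta
from statistics import mean


def lagged_anomalies(daily, historical_means, as_of, n_lags=3, lag_days=30):
    """Return one anomaly per lag window, most recent window first.

    daily: dict mapping date -> observed value (e.g. NDVI or temperature)
    historical_means: stored long-term means, one per lag window (assumed)
    as_of: last day of the most recent lag window
    """
    anomalies = []
    for lag in range(n_lags):
        # Each window ends `lag * lag_days` days before `as_of`.
        end = as_of - timedelta(days=lag * lag_days)
        window = [end - timedelta(days=i) for i in range(lag_days)]
        values = [daily[d] for d in window if d in daily]
        if not values:
            # No observations downloaded for this window.
            anomalies.append(None)
            continue
        anomalies.append(mean(values) - historical_means[lag])
    return anomalies
```

The same function could in principle be shared between the training and prediction pipelines, so that both compute anomalies against the same stored values.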
@emmamendelsohn
Collaborator Author

Prediction comments also in this thread: #16

@emmamendelsohn
Collaborator Author

In case it affects your efforts on the static layers @n8layman, note we had said that we would most likely remove slope and aspect from the final model

@emmamendelsohn
Collaborator Author

closing to address in #91
