Additional data layers and feature engineering #94

Open
1 of 5 tasks
Tracked by #91
emmamendelsohn opened this issue Jun 28, 2024 · 4 comments

@emmamendelsohn
Collaborator

emmamendelsohn commented Jun 28, 2024

  • Pull in the immunity layer, which reflects outbreaks that occurred before the current season.
  • Separately, create a layer for whether there has been an outbreak in the region within the last year, as a first pass at capturing spread. (In a future iteration this could be something like the distance to the nearest outbreak in the last year.)
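A minimal sketch of the one-year outbreak flag, written in Python for illustration even though the project works in R. The function name, the (region, outbreak start date) input format, and the 365-day window are assumptions, not the project's code:

```python
from datetime import date, timedelta


def recent_outbreak_flag(outbreaks, regions, as_of, window_days=365):
    """Flag regions with at least one outbreak in the window_days before as_of.

    outbreaks: iterable of (region_id, outbreak_start_date) pairs (hypothetical format)
    regions:   all region ids that need a flag, including outbreak-free ones
    as_of:     reference date, e.g. the start of the period being modeled
    """
    cutoff = as_of - timedelta(days=window_days)
    flags = {region: False for region in regions}
    for region, start in outbreaks:
        # Count only outbreaks that started after the cutoff and on or before as_of,
        # so outbreaks after the reference date never leak into the feature.
        if region in flags and cutoff < start <= as_of:
            flags[region] = True
    return flags


if __name__ == "__main__":
    history = [("A", date(2023, 3, 1)), ("B", date(2021, 5, 1))]
    print(recent_outbreak_flag(history, ["A", "B", "C"], as_of=date(2023, 12, 31)))
    # {'A': True, 'B': False, 'C': False}
```

The later "distance to nearest outbreak" variant would replace the boolean with the minimum distance over the same filtered outbreaks.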

When averaging to the ADM level:

  • Sum taxa population
  • Most likely remove slope and aspect: they become less relevant and harder to interpret at the ADM level
  • Average immunity? (Note: the immunity layer could potentially be regenerated at the ADM level, with its parameters tuned in the model.)
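A minimal sketch of these ADM aggregation rules in Python, for illustration only. The record keys ("adm_id", "taxa_population", "immunity", "slope", "aspect") are hypothetical, and immunity is averaged without weights, which assumes an unweighted mean is meaningful for that layer:

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_adm(cells):
    """Aggregate cell-level records to ADM units.

    cells: iterable of dicts with hypothetical keys "adm_id", "taxa_population"
    and "immunity". Any other keys (e.g. "slope", "aspect") are dropped.
    """
    population = defaultdict(float)
    immunity = defaultdict(list)
    for cell in cells:
        adm = cell["adm_id"]
        # Population counts are additive, so they are summed.
        population[adm] += cell["taxa_population"]
        # Immunity is collected per ADM unit and averaged (unweighted here).
        immunity[adm].append(cell["immunity"])
    return {
        adm: {"taxa_population": population[adm], "immunity": mean(immunity[adm])}
        for adm in population
    }


if __name__ == "__main__":
    cells = [
        {"adm_id": "X", "taxa_population": 100, "immunity": 0.25, "slope": 5.0, "aspect": 90.0},
        {"adm_id": "X", "taxa_population": 50, "immunity": 0.75, "slope": 3.0, "aspect": 180.0},
        {"adm_id": "Y", "taxa_population": 10, "immunity": 0.0, "slope": 1.0, "aspect": 0.0},
    ]
    print(aggregate_to_adm(cells))
```

A population-weighted mean of immunity is an equally plausible choice; the issue leaves that open.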
@n8layman
Collaborator

n8layman commented Jul 22, 2024

@emmamendelsohn is this similar to the outbreak history layer @noam started, in that immunity is inferred from previous outbreak case data? Or is the immunity layer a separate seroprevalence dataset, as suggested in #79? If it is based on case counts, it will be included in PR #76, which contains daily outbreak histories for both the short and long term. In these histories, once an outbreak occurs its impact declines exponentially, both with distance and over time.
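A minimal sketch of an influence that decays exponentially in both space and time, in Python for illustration. This is not the PR #76 implementation: the multiplicative form exp(-d/s) * exp(-t/tau), the function names, and the default scale parameters are all assumptions. Under this sketch, short- and long-term histories could differ only in their temporal scale.

```python
import math


def outbreak_influence(distance_km, days_since, spatial_scale_km, temporal_scale_days):
    """Influence of one past outbreak at a location, decaying exponentially
    with distance and with elapsed time: exp(-d / s) * exp(-t / tau)."""
    if distance_km < 0 or days_since < 0:
        raise ValueError("distance and elapsed time must be non-negative")
    spatial = math.exp(-distance_km / spatial_scale_km)
    temporal = math.exp(-days_since / temporal_scale_days)
    return spatial * temporal


def outbreak_history(past_outbreaks, spatial_scale_km=50.0, temporal_scale_days=365.0):
    """Sum the influence of all past outbreaks, given as (distance_km, days_since) pairs.
    The default scales are hypothetical placeholders."""
    return sum(
        outbreak_influence(d, t, spatial_scale_km, temporal_scale_days)
        for d, t in past_outbreaks
    )


if __name__ == "__main__":
    # Two hypothetical past outbreaks: one nearby and recent, one distant and old.
    print(outbreak_history([(5.0, 30), (120.0, 700)]))
```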

As an aside, SpatRasters written with terra::writeRaster(..., gdal=c("COMPRESS=LZW")) were nearly half the size of the same data saved in long form as parquet with compression = "gzip", compression_level = 5. Both the .parquet and .tif versions are in the data/outbreak_history_dataset/ folder.

[Figure: recent outbreaks]

[Figure: old outbreaks]

@emmamendelsohn
Collaborator Author

Yes, the immunity layer is Noam's outbreak layer.

We chose parquet primarily because arrow lets us work with the data without loading it all into memory. Smaller rasters are good, but before making any switch, make sure you can do all the data processing on them.

@n8layman
Collaborator

I don't plan on switching, particularly for the full dataset tibble, but it was interesting to observe and worth considering for future projects. Would TIFFs avoid the parquet issue with AWS and targets?

@n8layman
Collaborator

@kevinolival @rostal
