Feature/data viz #80

Merged — 51 commits, Feb 2, 2024
Commits
4f91ff9
first pass at building augmented dataset. getting error that some (no…
emmamendelsohn Nov 21, 2023
bccfd6f
closes #68 but still throws error on join "Last error: IOError: Could…
emmamendelsohn Nov 27, 2023
6bdd257
regenerated forecast anomaly files with arrow 14.0.0.2 and seems to b…
emmamendelsohn Dec 5, 2023
c85a0fd
tar_prune
emmamendelsohn Dec 5, 2023
0304ded
snapshot renv
emmamendelsohn Dec 5, 2023
f7dd0cd
rerunning forecast anomalies to replace NAs in files
emmamendelsohn Dec 8, 2023
f252bc5
finish forecast anomaly regeneration
emmamendelsohn Dec 11, 2023
3bf2083
complete anomaly dataset
emmamendelsohn Dec 15, 2023
5ad5f45
copy .rprofile from repel
emmamendelsohn Dec 15, 2023
3e2118b
add .Renvignore
emmamendelsohn Dec 15, 2023
894097b
first working shiny version
emmamendelsohn Dec 18, 2023
cb88281
working side by side maps with ndvi and ndvi anomalies
emmamendelsohn Dec 18, 2023
cbb1ad5
working slider bar
emmamendelsohn Dec 18, 2023
c334a37
some aesthetic improvements
emmamendelsohn Dec 18, 2023
5286d92
remove raw data - using anomalies only
emmamendelsohn Dec 19, 2023
b06c4ef
three anomaly lags
emmamendelsohn Dec 19, 2023
d402686
add labels
emmamendelsohn Dec 19, 2023
8969f67
add dataset toggle
emmamendelsohn Dec 19, 2023
fc3b308
working temperature map
emmamendelsohn Dec 19, 2023
9a40eb9
working function to create maps
emmamendelsohn Dec 19, 2023
60773db
this works
emmamendelsohn Dec 19, 2023
636f674
works for temp too
emmamendelsohn Dec 19, 2023
83ca3e6
fix labels and color scale
emmamendelsohn Dec 19, 2023
c337178
add dynamic text
emmamendelsohn Dec 19, 2023
79c9f66
improve dynamic rendering
emmamendelsohn Dec 19, 2023
9e45018
more code improvements
emmamendelsohn Dec 19, 2023
4f91aa4
add other weather datasets
emmamendelsohn Dec 20, 2023
0821939
fix color scales
emmamendelsohn Dec 20, 2023
0342f9d
use augmented data instead of individual anomaly sets
emmamendelsohn Dec 20, 2023
49144fa
tweaks
emmamendelsohn Dec 20, 2023
f659095
add forecasts but the layout is incorrect
emmamendelsohn Dec 20, 2023
d0f786a
small tweak
emmamendelsohn Dec 20, 2023
e6c6275
fix typo
emmamendelsohn Dec 20, 2023
6d5e342
dynamic UI works
emmamendelsohn Dec 29, 2023
f3b941d
script to download augmented data from aws
emmamendelsohn Dec 29, 2023
187376f
this works to functionalize leaflet output but throws quick error bef…
emmamendelsohn Dec 29, 2023
005ec84
rename function
emmamendelsohn Dec 29, 2023
60b401c
separate tabs for forecasted and recorded data
emmamendelsohn Dec 29, 2023
e4467b0
add todo notes
emmamendelsohn Dec 29, 2023
7d6dee1
switch from tabs to radio buttons
emmamendelsohn Dec 30, 2023
87e9b1b
create target for comparison of validation/forecast data against reco…
emmamendelsohn Jan 3, 2024
8cf4903
run forecast anomaly validation
emmamendelsohn Jan 5, 2024
c6e6991
add select choice boxes - not yet connected
emmamendelsohn Jan 5, 2024
459a238
yay dynamic UI for maps no more error loading messages
emmamendelsohn Jan 5, 2024
7bc11d5
add dynamic tags
emmamendelsohn Jan 5, 2024
95687d2
cleanup, fix width
emmamendelsohn Jan 5, 2024
67af52d
get dynamic tags working
emmamendelsohn Jan 12, 2024
c0a7619
useful script for testing the code within the app
emmamendelsohn Jan 12, 2024
c954ac2
first pass at displaying comparisons
emmamendelsohn Jan 12, 2024
427e62d
working app with TODO list
emmamendelsohn Jan 12, 2024
bd1c6a3
quick code cleanup
emmamendelsohn Jan 12, 2024
17 changes: 17 additions & 0 deletions .Renvignore
@@ -0,0 +1,17 @@
*
!packages.R
!packages.r
!*.R
!*.r
!R/*.R
!R/*.r
!_targets*.R
!_targets*.r
!vignettes/*.r
!vignettes/*.R
!vignettes/*.Rmd
!vignettes/*.rmd
!reports/*.r
!reports/*.R
!reports/*.Rmd
!reports/*.rmd
95 changes: 35 additions & 60 deletions .Rprofile
@@ -1,81 +1,56 @@
local({
  # Load env vars from any file starting with `.env`. This allows user-specific
  # options to be set in `.env_user` (which is .gitignored), and to have both
  # encrypted and non-encrypted .env files
  load_env <- function() {
    for (env_file in list.files(all.files = TRUE, pattern = "^\\.env.*")) {
      try(readRenviron(env_file), silent = TRUE)
    }
  }
  load_env()

  # If there is a bucket, cache targets remotely. Otherwise, do so locally.
  if (!nzchar(Sys.getenv("TAR_PROJECT"))) {
    if (nzchar(Sys.getenv("AWS_BUCKET_ID"))) {
      Sys.setenv(TAR_PROJECT = "s3")
    } else {
      Sys.setenv(TAR_PROJECT = "main")
    }
  }
  if (interactive()) {
    message(paste("targets project is", Sys.getenv("TAR_PROJECT")))
  }

  # Set options for renv convenience
  options(
    repos = c(CRAN = "https://cloud.r-project.org",
              MILESMCBAIN = "https://milesmcbain.r-universe.dev",
              ROPENSCI = "https://ropensci.r-universe.dev"),
    renv.config.auto.snapshot = FALSE,     ## Whether to keep renv.lock updated automatically
    renv.config.rspm.enabled = TRUE,       ## Use RStudio Package Manager for pre-built package binaries on Linux
    renv.config.install.shortcuts = FALSE, ## Whether to fetch copies of packages from the existing local library
    renv.config.cache.enabled = TRUE       ## Use the renv build cache to speed up install times
  )

  # Put the project library *outside* the project
  #Sys.setenv(RENV_PATHS_LIBRARY_ROOT = file.path(normalizePath("~/.renv-project-libraries", mustWork = FALSE)))

  # Set options for internet timeout
  options(timeout = max(300, getOption("timeout")))

  # Use capsule if specified by the user
  if (Sys.getenv("USE_CAPSULE") %in% c("1", "TRUE", "true")) {
    if (interactive() && file.exists("renv.lock")) {
      message("renv library not loaded (found env var USE_CAPSULE=", Sys.getenv("USE_CAPSULE"), "). Use `capsule` functions (see https://github.com/MilesMcBain/capsule)")
      if (require(capsule, quietly = TRUE)) {
        capsule::whinge()
      } else {
        message('Install {capsule} with install.packages("capsule", repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos")))')
      }
    }
  } else {
    source("renv/activate.R")
    load_env() # reload project .env files, after renv/activate.R runs renv::load() which reads the user's .Renviron
  }

  # Use the local user's .Rprofile when interactive.
  # Good for keeping local preferences, but not always reproducible.
  user_rprof <- Sys.getenv("R_PROFILE_USER", normalizePath("~/.Rprofile", mustWork = FALSE))
  if (interactive() && file.exists(user_rprof)) {
    source(user_rprof)
  }
})

# If project packages have conflicts, define them here so as
# to manage them across all sessions when building targets
if (requireNamespace("conflicted", quietly = TRUE)) {
  conflicted::conflict_prefer("filter", "dplyr", quiet = TRUE)
  conflicted::conflict_prefer("count", "dplyr", quiet = TRUE)
  conflicted::conflict_prefer("select", "dplyr", quiet = TRUE)
  conflicted::conflict_prefer("geom_rug", "ggplot2", quiet = TRUE)
  conflicted::conflict_prefer("set_names", "magrittr", quiet = TRUE)
  conflicted::conflict_prefer("View", "utils", quiet = TRUE)
}

# Suppress summarize messages
options(dplyr.summarise.inform = FALSE)
46 changes: 46 additions & 0 deletions R/augment_data.R
@@ -0,0 +1,46 @@
#' @title Combine anomaly datasets into a single augmented dataset
#' @param weather_anomalies path to the weather anomalies dataset
#' @param forecasts_anomalies path to the forecast anomalies dataset
#' @param ndvi_anomalies path to the NDVI anomalies dataset
#' @param augmented_data_directory directory in which to save the joined data as hive-partitioned parquet files
#' @return path to the augmented data directory
#' @author Emma Mendelsohn
#' @export
augment_data <- function(weather_anomalies, forecasts_anomalies,
ndvi_anomalies, augmented_data_directory) {


message("Load datasets into memory")
weather <- arrow::open_dataset(weather_anomalies) |> dplyr::collect()
forecasts <- arrow::open_dataset(forecasts_anomalies) |> dplyr::collect()
ndvi <- arrow::open_dataset(ndvi_anomalies) |> dplyr::collect()

message("NA checks")
## Weather and forecasts
### NAs are in scaled precip data, due to days with 0 precip
weather_check <- purrr::map_lgl(weather, ~any(is.na(.)))
assertthat::assert_that(all(str_detect(names(weather_check[weather_check]), "scaled")))

forecasts_check <- purrr::map_lgl(forecasts, ~any(is.na(.)))
assertthat::assert_that(all(str_detect(names(forecasts_check[forecasts_check]), "scaled")))

## NDVI
### Prior to 2018: NAs are due to region missing from Eastern Africa in modis data
### After 2018: NAs are due to smaller pockets of missing data on a per-cycle basis
### okay to remove when developing RSA model (issue #72)
ndvi_check <- purrr::map_lgl(ndvi, ~any(is.na(.)))
assertthat::assert_that(!any(ndvi_check[c("date", "x", "y")]))
ndvi <- drop_na(ndvi)

message("Join into a single object")
augmented_data <- left_join(weather, forecasts, by = join_by(date, x, y)) |>
left_join(ndvi, by = join_by(date, x, y))

message("Save as parquets using hive partitioning by date")
augmented_data |>
group_by(date) |>
write_dataset(augmented_data_directory)

return(augmented_data_directory)

}
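Because `augment_data()` writes one parquet partition per date, downstream consumers (such as the Shiny app added in this PR) can open the directory lazily and let arrow push `date` filters down to individual partitions. A minimal sketch, assuming the {arrow} and {dplyr} packages and a hypothetical local path:

```r
library(arrow)
library(dplyr)

# Open the partitioned directory without reading it into memory
# ("data/augmented_data" is a hypothetical path)
augmented <- open_dataset("data/augmented_data")

# Filtering on the partition key reads only the matching date partition
augmented |>
  filter(date == as.Date("2019-06-15")) |>
  select(date, x, y, starts_with("anomaly")) |>
  collect()
```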
5 changes: 4 additions & 1 deletion R/utils.R
@@ -26,4 +26,7 @@ all_targets <- function(env = parent.env(environment()), type = "tar_target") {
#
# tar_load_s3 <- function(target_name, ...) {
#
# }
# }

#' Check each column of a data frame for NAs
col_na <- function(df) purrr::map_lgl(df, ~any(is.na(.)))
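For reference, a toy illustration of the new `col_na()` helper (the data frame and column names here are illustrative):

```r
library(purrr)

# Returns a named logical vector: TRUE for each column containing any NA
col_na <- function(df) purrr::map_lgl(df, ~any(is.na(.)))

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"))
col_na(df)
# returns c(a = TRUE, b = FALSE)
```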
115 changes: 115 additions & 0 deletions R/validate_forecasts_anomalies.R
@@ -0,0 +1,115 @@
#' @title Validate forecast anomalies against recorded weather
#' @description Compares forecast anomalies for a selected model date with
#' anomalies computed from recorded weather over the same lead intervals, and
#' saves the forecast-minus-recorded differences as a parquet file.
#' @param forecasts_validate_directory directory in which to save the validation output
#' @param forecasts_anomalies path to the forecast anomalies dataset
#' @param nasa_weather_transformed path to the transformed NASA weather dataset
#' @param weather_historical_means paths to the historical weather means files
#' @param model_dates_selected the model date being validated
#' @param lead_intervals upper bounds (in days) of the forecast lead intervals
#' @param overwrite whether to regenerate output files that already exist
#' @return path to the saved validation parquet file
#' @author Emma Mendelsohn
#' @export
validate_forecasts_anomalies <- function(forecasts_validate_directory,
forecasts_anomalies,
nasa_weather_transformed,
weather_historical_means,
model_dates_selected, lead_intervals,
overwrite = FALSE) {

# Set filename
date_selected <- model_dates_selected
save_filename <- glue::glue("forecast_validate_{date_selected}.gz.parquet")
message(paste0("Validating forecast for ", date_selected))

# Check if file already exists
existing_files <- list.files(forecasts_validate_directory)
if(save_filename %in% existing_files & !overwrite) {
message("file already exists, skipping validation")
return(file.path(forecasts_validate_directory, save_filename))
}

# Open dataset to forecast anomalies and weather data
forecasts_anomalies <- open_dataset(forecasts_anomalies) |> filter(date == date_selected)
nasa_weather_transformed <- open_dataset(nasa_weather_transformed)

# Calculate weather anomalies for selected forecast dates, mapping over the lead intervals
lead_intervals_start <- c(0, lead_intervals[-length(lead_intervals)]) # start at 0 to include the current day in the forecast
lead_intervals_end <- lead_intervals - 1 # subtract 1 so each interval spans 30 days including its start day

forecasts_validation <- map(seq_along(lead_intervals_start), function(i){

# subset to start and end day of interval
start <- lead_intervals_start[i]
end <- lead_intervals_end[i]

lead_start_date <- date_selected + start
lead_end_date <- date_selected + end

# get historical means for lead period, removing doy 366
lead_dates <- seq(lead_start_date, lead_end_date, by = "day")
lead_doys <- yday(lead_dates)

if(366 %in% lead_doys) {
if(tail(lead_doys, 1) == 366) {
lead_doys <- lead_doys[lead_doys != 366]
lead_doys <- c(lead_doys, 1)
} else {
lead_doys <- lead_doys[lead_doys != 366]
lead_doys <- c(lead_doys, tail(lead_doys, 1) + 1)
}
}

doy_start <- head(lead_doys, 1)
doy_end <- tail(lead_doys, 1)
doy_start_frmt <- str_pad(doy_start, width = 3, side = "left", pad = "0")
doy_end_frmt <- str_pad(doy_end, width = 3, side = "left", pad = "0")
doy_range <- glue::glue("{doy_start_frmt}_to_{doy_end_frmt}")

historical_means <- read_parquet(weather_historical_means[str_detect(weather_historical_means, doy_range)])

# get average for weather data over this period
weather_means <- nasa_weather_transformed |>
filter(date %in% lead_dates) |>
group_by(x, y) |>
summarize(lead_relative_humidity_mean = mean(relative_humidity),
lead_temperature_mean = mean(temperature),
lead_precipitation_mean = mean(precipitation)) |>
ungroup()

# Join in historical means to calculate anomalies raw and scaled
weather_anomalies <- full_join(weather_means, historical_means, by = c("x", "y")) |>
mutate(!!paste0("anomaly_relative_humidity_recorded_", end) := lead_relative_humidity_mean - historical_relative_humidity_mean,
!!paste0("anomaly_temperature_recorded_", end) := lead_temperature_mean - historical_temperature_mean,
!!paste0("anomaly_precipitation_recorded_", end) := lead_precipitation_mean - historical_precipitation_mean,
!!paste0("anomaly_relative_humidity_scaled_recorded_", end) := (lead_relative_humidity_mean - historical_relative_humidity_mean)/historical_relative_humidity_sd,
!!paste0("anomaly_temperature_scaled_recorded_", end) := (lead_temperature_mean - historical_temperature_mean)/historical_temperature_sd,
!!paste0("anomaly_precipitation_scaled_recorded_", end) := (lead_precipitation_mean - historical_precipitation_mean)/historical_precipitation_sd) |>
select(-starts_with("lead"), -starts_with("historical"))

# Calculate the difference between forecast and recorded anomalies
forecasts_anomalies |>
select(x, y, ends_with(as.character(end))) |>
left_join(weather_anomalies, by = c("x", "y")) |>
mutate(!!paste0("anomaly_relative_humidity_difference_", end) := !!sym(paste0("anomaly_relative_humidity_forecast_", end)) - !!sym(paste0("anomaly_relative_humidity_recorded_", end)),
!!paste0("anomaly_temperature_difference_", end) := !!sym(paste0("anomaly_temperature_forecast_", end)) - !!sym(paste0("anomaly_temperature_recorded_", end)),
!!paste0("anomaly_precipitation_difference_", end) := !!sym(paste0("anomaly_precipitation_forecast_", end)) - !!sym(paste0("anomaly_precipitation_recorded_", end)),
!!paste0("anomaly_relative_humidity_scaled_difference_", end) := !!sym(paste0("anomaly_relative_humidity_scaled_forecast_", end)) - !!sym(paste0("anomaly_relative_humidity_scaled_recorded_", end)),
!!paste0("anomaly_temperature_scaled_difference_", end) := !!sym(paste0("anomaly_temperature_scaled_forecast_", end)) - !!sym(paste0("anomaly_temperature_scaled_recorded_", end)),
!!paste0("anomaly_precipitation_scaled_difference_", end) := !!sym(paste0("anomaly_precipitation_scaled_forecast_", end)) - !!sym(paste0("anomaly_precipitation_scaled_recorded_", end)))
}) |>
reduce(left_join, by = c("x", "y")) |>
mutate(date = date_selected) |>
relocate(date)

# Save as parquet
write_parquet(forecasts_validation, here::here(forecasts_validate_directory, save_filename), compression = "gzip", compression_level = 5)

return(file.path(forecasts_validate_directory, save_filename))


}
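The dynamic column naming above relies on rlang's injection operators: `!!paste0(...) :=` builds the new column's name at run time, and `!!sym(...)` turns a constructed name back into a symbol usable inside the expression. A minimal, self-contained sketch of the same pattern (column names and values are hypothetical):

```r
library(dplyr)
library(rlang)

end <- 30  # hypothetical lead-interval endpoint

df <- tibble(anomaly_temperature_forecast_30 = c(1.2, 0.4),
             anomaly_temperature_recorded_30 = c(1.0, 0.9))

# Inject a run-time column name on the left of `:=`, and refer to
# run-time column names on the right via sym()
df |>
  mutate(!!paste0("anomaly_temperature_difference_", end) :=
           !!sym(paste0("anomaly_temperature_forecast_", end)) -
           !!sym(paste0("anomaly_temperature_recorded_", end)))
# adds a column anomaly_temperature_difference_30 = c(0.2, -0.5)
```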