Merged
42 changes: 41 additions & 1 deletion R/htr_calc_anomalies.R
@@ -1,11 +1,51 @@
#' Calculate anomalies relative to the baseline mean
#'
#' This function calculates climate anomalies by subtracting baseline mean values
#' from projection data using CDO (Climate Data Operators). It processes multiple
#' climate model files in parallel, matching variables, frequencies, and models
#' between the projection data and baseline means.
#'
#' @details
#' The function uses the CDO `sub` operator to subtract baseline means from
#' projection files. It automatically matches files based on variable, frequency,
#' and model metadata extracted from CMIP6-formatted filenames. The process runs
#' in parallel using multiple CPU cores for efficient processing of large datasets.
#'
#' The workflow involves:
#' 1. Extracting metadata from baseline mean files
#' 2. Finding corresponding projection files for each variable-frequency-model combination
#' 3. Subtracting the appropriate baseline mean from each projection file using CDO
#' 4. Saving results with "_anomalies_" in the filename
#'
#' @inheritParams htr_slice_period
#' @param mndir The directory where the baseline mean files are stored
#' @param mndir Character string. The directory where the baseline mean files are
#' stored. Files should follow CMIP6 naming conventions with variable, frequency,
#' and model information in the filename.
#'
#' @return
#' No return value. The function creates anomaly files in the specified output
#' directory with "_anomalies_" replacing "_merged_" in the original filenames.
#'
#' @note
#' - Requires CDO (Climate Data Operators) to be installed and accessible from the system PATH
#' - Input files must follow CMIP6 naming conventions for proper metadata extraction
#' - Baseline mean files and projection files must have matching variable, frequency, and model names
#' - Uses parallel processing with (number of CPU cores - 2) workers
#'
#' @references
#' CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
#' CDO sub operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=297
#'
#' @export
#'
#' @examples
#' \dontrun{
#' htr_calc_anomalies(
#' indir = file.path(base_dir, "data", "tos", "raw"),
#' mndir = file.path(base_dir, "data", "tos", "mean"),
#' outdir = file.path(base_dir, "data", "tos", "anomalies")
#' )
#' }
htr_calc_anomalies <- function(indir, # input directory of the projections
mndir, # directory of baseline mean
outdir # where anomalies will be saved
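
Review note: the matching and subtraction steps described in the roxygen text above can be sketched as follows. This is a minimal illustration, not the package's implementation. It assumes filenames of the form `<variable>_<frequency>_<model>_<scenario>_...`; the helpers `parse_cmip6_name()` and `build_anomaly_cmd()` and the example paths are hypothetical. The command is only built as a string, so CDO is not needed to run the sketch.

```r
# Hypothetical helper: split a CMIP6-style filename into its leading fields.
# Assumes names look like <variable>_<frequency>_<model>_<scenario>_...
parse_cmip6_name <- function(path) {
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  list(variable = parts[1], frequency = parts[2], model = parts[3])
}

# Build (but do not run) the `cdo sub` call for one projection/baseline pair.
# The output name swaps "_merged_" for "_anomalies_", as the docs describe.
build_anomaly_cmd <- function(proj_file, mean_file, outdir) {
  out_name <- sub("_merged_", "_anomalies_", basename(proj_file), fixed = TRUE)
  paste("cdo sub", shQuote(proj_file), shQuote(mean_file),
        shQuote(file.path(outdir, out_name)))
}

# Made-up example paths, for illustration only.
proj <- "in/tos_Omon_CanESM5_ssp126_r1i1p1f1_merged_20150101-21001231.nc"
base <- "mean/tos_Omon_CanESM5_historical_r1i1p1f1_mean_19500101-20141231.nc"

p <- parse_cmip6_name(proj)
b <- parse_cmip6_name(base)

# A pair is only subtracted when variable, frequency, and model all agree.
same_source <- identical(p, b)
cmd <- build_anomaly_cmd(proj, base, "out")
```

Comparing the parsed fields before building the command keeps a baseline from one model from being subtracted from another model's projection.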
51 changes: 48 additions & 3 deletions R/htr_calc_mean.R
@@ -1,14 +1,59 @@
#' Calculate mean of specified time period.
#' Calculate temporal mean of specified time period
#'
#' Used to calculate baseline means.
#' This function calculates temporal means over a specified time period using CDO
#' (Climate Data Operators). It is primarily used to calculate baseline climatological
#' means from historical climate data, which can then be used for anomaly calculations.
#'
#' @details
#' The function uses the CDO `timmean` operator combined with `selyear` to calculate
#' temporal means over the specified year range. It processes files in parallel for
#' efficient computation of large climate datasets. The function automatically
#' generates output filenames with "_mean_" and the year range in the filename.
#'
#' The CDO command executed is:
#' `cdo -L -timmean -selyear,year_start/year_end input_file output_file`
#'
#' Where:
#' - `-L` locks I/O so that file access is sequential (needed for thread-safe netCDF4/HDF5 access when chaining operators)
#' - `timmean` calculates the temporal mean
#' - `selyear` selects the specified year range
#'
#' @author Dave Schoeman and Tin Buenafe
#'
#' @inheritParams htr_slice_period
#' @param scenario Character string. The CMIP6 scenario to process (e.g., "historical",
#' "ssp126", "ssp245"). Use "historical" for calculating baseline climatological means.
#' @param year_start Numeric. Starting year for calculating the temporal mean (inclusive).
#' @param year_end Numeric. Ending year for calculating the temporal mean (inclusive).
#'
#' @return
#' No return value. The function creates mean files in the specified output directory
#' with "_mean_YYYYMMDD-YYYYMMDD.nc" replacing "_merged_" in the original filenames,
#' where the dates represent the start and end of the averaging period.
#'
#' @note
#' - Requires CDO (Climate Data Operators) to be installed and accessible from the system PATH
#' - Input files must be merged time series files (typically created by [`htr_merge_files()`])
#' - Uses parallel processing with (number of CPU cores - 2) workers
#' - The `-L` flag locks I/O (sequential access), which avoids thread-safety problems when chaining CDO operators on netCDF4 files
#'
#' @references
#' CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
#' CDO timmean operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=180
#' CDO selyear operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=124
#'
#' @export
#'
#' @examples
#' \dontrun{
#' htr_calc_mean(
#' indir = file.path(base_dir, "data", "tos", "raw"),
#' outdir = file.path(base_dir, "data", "tos", "mean"),
#' scenario = "historical",
#' year_start = 1950,
#' year_end = 2014
#' )
#' }
htr_calc_mean <- function(indir, # where inputs are
outdir, # where outputs will be saved
scenario, # historical or ssp (use historical for calculating baseline means)
@@ -17,7 +62,7 @@ htr_calc_mean <- function(indir, # where inputs are
) {
. <- NULL # Stop devtools::check() complaints about NSE

w <- parallel::detectCores() - 2
w <- parallelly::availableCores(method = "system", omit = 2)

##############
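
Review note: the documented command `cdo -L -timmean -selyear,year_start/year_end input_file output_file` can be assembled roughly as below. This is a sketch under assumptions: the helper `build_mean_cmd()` is hypothetical, the date stamp is assumed to run from 1 January of the first year to 31 December of the last, and everything from `_merged_` onward in the input name is assumed to be replaced. The command is built as a string only and is not executed.

```r
# Hypothetical helper: build the temporal-mean command for one merged file.
build_mean_cmd <- function(in_file, outdir, year_start, year_end) {
  # Assumed stamp format: YYYY0101-YYYY1231 (not confirmed by the package source).
  stamp <- sprintf("%d0101-%d1231", as.integer(year_start), as.integer(year_end))

  # Replace "_merged_<dates>.nc" with "_mean_<stamp>.nc".
  out_name <- sub("_merged_.*$", paste0("_mean_", stamp, ".nc"), basename(in_file))

  paste0("cdo -L -timmean -selyear,", year_start, "/", year_end, " ",
         shQuote(in_file), " ", shQuote(file.path(outdir, out_name)))
}

# Made-up input path, for illustration only.
cmd <- build_mean_cmd(
  "raw/tos_Omon_CanESM5_historical_r1i1p1f1_merged_18500101-20141231.nc",
  outdir = "mean", year_start = 1950, year_end = 2014
)
```

Because CDO applies chained operators right to left, `-selyear` restricts the years first and `-timmean` then averages over what remains.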

50 changes: 44 additions & 6 deletions R/htr_change_freq.R
@@ -1,19 +1,57 @@
#' Change frequency
#' Change temporal frequency of climate data
#'
#' This function changes the temporal frequency of climate data from daily to either
#' monthly or yearly averages using CDO (Climate Data Operators). It supports both
#' HPC array job processing and parallel processing for efficient computation.
#'
#' @details
#' The function uses CDO temporal aggregation operators to change frequency:
#' - For yearly frequency: Uses `cdo -yearmean` to calculate annual means
#' - For monthly frequency: Uses `cdo -monmean` to calculate monthly means
#'
#' The function can operate in different modes:
#' - **Array mode** (`hpc = "array"`): Processes a single specified file (useful for HPC job arrays)
#' - **Parallel mode** (`hpc = "parallel"` or `hpc = NA`): Processes all files in the input directory using parallel workers
#'
#' Output files are renamed to reflect the new temporal frequency, replacing "_merged_"
#' with either "_annual_" or "_monthly_" in the filename.
#'
#' @author Tin Buenafe
#'
#' @inheritParams htr_slice_period
#' @param freq Character string. The target temporal frequency. Valid options are:
#' - `"yearly"` or `"annual"`: Calculate annual means using CDO yearmean
#' - `"monthly"`: Calculate monthly means using CDO monmean
#'
#' @return
#' No return value. The function creates frequency-converted files in the specified
#' output directory with "_annual_" or "_monthly_" replacing "_merged_" in the
#' original filenames.
#'
#' @note
#' - Requires CDO (Climate Data Operators) to be installed and accessible from the system PATH
#' - Input files should typically be daily frequency data for meaningful aggregation
#' - For HPC environments, set `hpc = "array"` and specify the `file` parameter
#' - Uses parallel processing when `hpc = NA` or `hpc = "parallel"`
#' - Worker count is automatically determined based on available CPU cores
#'
#' @references
#' CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
#' CDO yearmean operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=191
#' CDO monmean operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=186
#'
#' @export
#'
#' @examples
#' \dontrun{
#' htr_change_freq(
#' hpc = NA,
#' file = NA,
#' freq = "monthly",
#' indir = here("data", "proc", "sliced", variable),
#' outdir = here("data", "proc", "monthly", variable)
#' hpc = NA,
#' file = NA,
#' freq = "monthly",
#' indir = file.path(".", "data", "proc", "sliced", variable),
#' outdir = file.path(".", "data", "proc", "monthly", variable)
#' )
#' }
htr_change_freq <- function(hpc = NA, # if ran in the HPC, possible values are "array", "parallel"
file = NA, # hpc = "array", the input will be the file
freq, # possible values are "yearly" or "monthly"
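
Review note: the mapping from `freq` to a CDO operator and filename tag, as documented above, might look like the sketch below. The helpers `freq_operator()` and `build_freq_cmd()` are hypothetical and the example path is made up; the command is only built as a string, so nothing is executed.

```r
# Hypothetical helper: map the documented `freq` values to a CDO operator
# and to the tag that replaces "_merged_" in the output filename.
freq_operator <- function(freq) {
  switch(freq,
    yearly = ,   # falls through to "annual"
    annual = list(op = "-yearmean", tag = "_annual_"),
    monthly = list(op = "-monmean", tag = "_monthly_"),
    stop("Unsupported freq: ", freq)
  )
}

# Build (but do not run) the aggregation command for one file.
build_freq_cmd <- function(in_file, outdir, freq) {
  f <- freq_operator(freq)
  out_name <- sub("_merged_", f$tag, basename(in_file), fixed = TRUE)
  paste("cdo", f$op, shQuote(in_file), shQuote(file.path(outdir, out_name)))
}

in_file <- "sliced/tos_Oday_CanESM5_ssp126_r1i1p1f1_merged_20150101-21001231.nc"
cmd_monthly <- build_freq_cmd(in_file, "monthly", "monthly")
cmd_yearly <- build_freq_cmd(in_file, "yearly", "yearly")
```

Failing early on an unrecognized `freq` avoids silently writing output with the wrong tag.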
69 changes: 55 additions & 14 deletions R/htr_create_ensemble.R
@@ -1,26 +1,67 @@
#' Create an ensemble based on list of models
#' Create multi-model ensemble from climate model outputs
#'
#' This function creates multi-model ensembles by combining outputs from multiple
#' climate models using CDO (Climate Data Operators). It can calculate either the
#' ensemble mean or median across the specified models, with support for seasonal
#' and depth-resolved data filtering.
#'
#' @details
#' The function uses CDO ensemble operators to combine multiple model outputs:
#' - **Ensemble mean**: Uses `cdo -ensmean` to calculate the arithmetic mean across models
#' - **Ensemble median**: Uses `cdo -ensmedian` to calculate the median across models
#'
#' The function automatically:
#' 1. Filters files based on variable, frequency, scenario, and optionally season/domain
#' 2. Selects only files from the specified models in `model_list`
#' 3. Creates ensemble statistics using the appropriate CDO operator
#' 4. Saves output with "ensemble" replacing the model name in the filename
#'
#' Output files are compressed using zip compression (`-z zip`); the `-L` flag
#' locks I/O so that file access is sequential.
#'
#' @inheritParams htr_slice_period
#' @param model_list Character string of models to use for the ensemble
#' @param variable The variable to create the ensemble for
#' @param mean Use the mean (TRUE; default) or the median (FALSE) when creating the ensemble.
#' @param season If using seasonal frequency, input the season name to detect the files
#' @param domain If using depth-resolved models, input the domain name to detect the files
#' @param model_list Character vector. Names of climate models to include in the
#' ensemble. Model names must match those in the input filenames (e.g.,
#' `c("ACCESS-ESM1-5", "CanESM5", "GFDL-ESM4")`).
#' @param variable Character string. The climate variable to create the ensemble for
#' (e.g., "tos" for sea surface temperature, "pr" for precipitation). Default is "tos".
#' @param mean Logical. If `TRUE` (default), calculates ensemble mean using CDO ensmean.
#' If `FALSE`, calculates ensemble median using CDO ensmedian.
#' @param season Character string. Optional season name to filter files (e.g., "DJF",
#' "JJA"). Only files containing this string will be included. Default is empty string (no filtering).
#' @param domain Character string. Optional domain name for depth-resolved models
#' (e.g., "surface", "0-100m"). Only files containing this string will be included.
#' Default is empty string (no filtering).
#'
#' @return
#' No return value. The function creates an ensemble file in the specified output
#' directory with "ensemble" replacing the model name in the original filename.
#'
#' @note
#' - Requires CDO (Climate Data Operators) to be installed and accessible from the system PATH
#' - All input files must be on the same spatial grid (use [`htr_regrid_esm()`] first if needed)
#' - All input files must have the same temporal resolution and time periods
#' - Model names in `model_list` must exactly match those in the input filenames
#' - Uses zip compression for efficient file storage
#'
#' @references
#' CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
#' CDO ensmean operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=78
#' CDO ensmedian operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=79
#'
#' @export
#'
#' @examples
#' \dontrun{
#' htr_create_ensemble(
#' hpc = NA,
#' indir = file.path(base_dir, "data", "proc", "regridded", "yearly", "tos"),
#' outdir = file.path(base_dir, "data", "proc", "ensemble", "mean", "tos"),
#' model_list = c("ACCESS-ESM1-5", "CanESM5"),
#' variable = "tos",
#' freq = "Omon",
#' scenario = "ssp126",
#' mean = TRUE
#' hpc = NA,
#' indir = file.path(base_dir, "data", "proc", "regridded", "yearly", "tos"),
#' outdir = file.path(base_dir, "data", "proc", "ensemble", "mean", "tos"),
#' model_list = c("ACCESS-ESM1-5", "CanESM5"),
#' variable = "tos",
#' freq = "Omon",
#' scenario = "ssp126",
#' mean = TRUE
#' )
#' }
htr_create_ensemble <- function(hpc = NA, # if ran in the HPC, possible values are "array", "parallel"
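
Review note: the filter-then-combine workflow documented above can be sketched as follows. This is an illustration under assumptions, not the package's code: filenames are assumed to follow `<variable>_<frequency>_<model>_<scenario>_...`, the helpers `select_model_files()` and `build_ensemble_cmd()` are hypothetical, the option order in the command is a guess, and the file paths are made up. The command is built as a string and not run.

```r
# Hypothetical helper: keep only files for the requested variable, scenario,
# and models, based on the underscore-separated fields of the filename.
select_model_files <- function(files, model_list, variable, scenario) {
  keep <- vapply(basename(files), function(f) {
    parts <- strsplit(f, "_", fixed = TRUE)[[1]]
    isTRUE(parts[1] == variable && parts[3] %in% model_list && parts[4] == scenario)
  }, logical(1))
  files[keep]
}

# Build (but do not run) the ensemble command. The output name is the first
# input name with the model field replaced by "ensemble".
build_ensemble_cmd <- function(files, outdir, mean = TRUE) {
  op <- if (mean) "-ensmean" else "-ensmedian"
  parts <- strsplit(basename(files[1]), "_", fixed = TRUE)[[1]]
  parts[3] <- "ensemble"
  out_file <- file.path(outdir, paste(parts, collapse = "_"))
  paste("cdo -L -z zip", op, paste(shQuote(files), collapse = " "), shQuote(out_file))
}

files <- c(
  "in/tos_Omon_ACCESS-ESM1-5_ssp126_r1i1p1f1_annual.nc",
  "in/tos_Omon_CanESM5_ssp126_r1i1p1f1_annual.nc",
  "in/tos_Omon_GFDL-ESM4_ssp126_r1i1p1f1_annual.nc",
  "in/tos_Omon_CanESM5_ssp585_r1i1p1f1_annual.nc"
)
picked <- select_model_files(files, c("ACCESS-ESM1-5", "CanESM5"), "tos", "ssp126")
cmd <- build_ensemble_cmd(picked, "ensemble", mean = TRUE)
```

Matching on whole filename fields rather than substrings avoids, for example, a model name that is a prefix of another model's name selecting both.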
48 changes: 44 additions & 4 deletions R/htr_download_ESM.R
@@ -1,16 +1,56 @@
#' Download ESM data
#' Download Earth System Model (ESM) data using wget scripts
#'
#' This function downloads climate model data from remote repositories using wget
#' scripts. It processes multiple wget scripts in parallel to efficiently download
#' large climate datasets, typically from CMIP6 data nodes or similar repositories.
#'
#' @details
#' The function executes bash wget scripts that contain download commands for climate
#' data files. It changes the working directory to the output directory before running
#' each wget script to ensure files are downloaded to the correct location.
#'
#' The process involves:
#' 1. Finding all wget script files in the input directory
#' 2. For each script, changing to the output directory
#' 3. Executing the wget script with the `-s` flag (in ESGF-generated wget scripts, this skips the security/credential step)
#' 4. Restoring the original working directory
#'
#' All wget scripts are processed in parallel using multiple workers for efficient
#' downloading of large datasets.
#'
#' @author Dave Schoeman and Tin Buenafe
#'
#' @inheritParams htr_slice_period
#' @param indir Character string. Directory containing wget script files. These are
#' typically bash scripts with wget commands for downloading climate data from
#' remote repositories (e.g., ESGF data nodes).
#' @param outdir Character string. Directory where the downloaded NetCDF files will
#' be saved. The function will change to this directory before executing wget scripts.
#'
#' @return
#' No return value. The function downloads NetCDF files to the specified output
#' directory as defined by the wget scripts.
#'
#' @note
#' - Requires `wget` to be installed and accessible from the system PATH
#' - Wget scripts should be properly formatted bash scripts with appropriate download commands
#' - The function temporarily changes working directory during execution
#' - Uses parallel processing with (number of CPU cores - 2) workers
#' - Ensure sufficient disk space is available for downloaded climate data
#' - Network connectivity and access permissions to data repositories are required
#'
#' @references
#' ESGF Data Portal: https://esgf-node.llnl.gov/projects/esgf-llnl/
#' CMIP6 Data Access: https://pcmdi.llnl.gov/CMIP6/
#'
#' @export
#'
#' @examples
#' \dontrun{
#' htr_download_ESM(
#' hpc = NA,
#' indir = file.path(base_dir, "data", "raw", "wget"), # input directory
#' outdir = file.path(base_dir, "data", "raw", "tos") # output directory
#' hpc = NA,
#' indir = file.path(base_dir, "data", "raw", "wget"), # input directory
#' outdir = file.path(base_dir, "data", "raw", "tos") # output directory
#' )
#' }
htr_download_ESM <- function(hpc = NA, # if ran in the HPC, possible values are "array", "parallel"
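
Review note: the "change directory, run script, restore directory" sequence documented above can be made robust with `on.exit()`. The sketch below is an assumption about how this could be done, not the package's implementation. The helper `run_in_dir()` and its `runner` argument are hypothetical; the default runner calls `bash <script> -s` as the docs describe, while the demonstration swaps in a stub so that neither `wget` nor a real script is needed.

```r
# Hypothetical helper: run one download script from inside `outdir` and
# always restore the caller's working directory, even if the script fails.
run_in_dir <- function(script, outdir,
                       runner = function(s) system2("bash", c(s, "-s"))) {
  # Resolve the script path first: a relative path would stop being valid
  # once the working directory changes.
  script <- normalizePath(script, mustWork = FALSE)

  old_wd <- setwd(outdir)
  on.exit(setwd(old_wd), add = TRUE)

  runner(script)
}

# Demonstration with a stub runner that only reports where it was called from.
before <- getwd()
out <- tempfile("downloads_")
dir.create(out)
seen <- run_in_dir("wget_tos.sh", out, runner = function(s) getwd())
after <- getwd()
```

Registering the restore with `on.exit()` matters most in parallel workers, where a failed download would otherwise leave a worker in the wrong directory for its next task.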
43 changes: 40 additions & 3 deletions R/htr_fix_calendar.R
@@ -1,19 +1,56 @@
#' Fix calendars (leap years)
#' Fix calendar systems by standardizing to 365-day calendar
#'
#' This function standardizes climate model data to use a consistent 365-day calendar
#' system by removing leap days (February 29th) and setting the calendar attribute.
#' This is essential for consistent temporal analysis across different climate models
#' that may use different calendar systems.
#'
#' @details
#' Climate models use various calendar systems (Gregorian, 365-day, 360-day, etc.),
#' which can cause issues when comparing or combining data from different models.
#' This function standardizes all data to a 365-day calendar using CDO operations.
#'
#' The function:
#' 1. Checks whether files contain leap days by testing whether the number of time steps is divisible by 365
#' 2. For daily data with leap days: Uses `cdo -setcalendar,365_day -delete,month=2,day=29` to remove February 29th
#' 3. For data without leap days: Uses `cdo setcalendar,365_day` to set the calendar attribute
#' 4. Replaces original files with the calendar-corrected versions
#'
#' The process creates temporary files during processing to avoid data corruption.
#'
#' @author Dave Schoeman and Tin Buenafe
#'
#' @inheritParams htr_slice_period
#' @param indir Character string. Directory containing NetCDF files that need calendar
#' standardization. Files should be climate model outputs with time dimensions.
#'
#' @return
#' No return value. The function modifies files in-place, replacing original files
#' with calendar-standardized versions. Progress messages are printed to the console
#' indicating which files had leap days removed.
#'
#' @note
#' - Requires CDO (Climate Data Operators) to be installed and accessible from the system PATH
#' - **WARNING**: This function modifies files in-place. Ensure you have backups of original data
#' - Only processes daily frequency data for leap day removal (detected by "_day_" in frequency)
#' - Uses parallel processing when `hpc` is not set to "array"
#' - Creates temporary files during processing which are automatically cleaned up
#' - Prints informative messages about which files are being processed
#'
#' @references
#' CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
#' CDO setcalendar operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=142
#' CDO delete operator: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf#page=60
#' CMIP6 calendar conventions: https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html
#'
#' @export
#'
#' @examples
#' \dontrun{
#'
#' htr_fix_calendar(
#' hpc = NA,
#' file = NA,
#' indir = file.path(base_dir, "data", "merged"), # input directory
#' indir = file.path(base_dir, "data", "merged") # input directory
#' )
#' }
htr_fix_calendar <- function(hpc = NA, # if ran in the HPC, possible values are "array", "parallel"
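
Review note: the branch between the two documented CDO commands can be sketched as below. This assumes that a non-zero remainder of the time-step count modulo 365 is what signals leap days, and that only daily files are candidates; both are readings of the roxygen text, not confirmed behaviour. The helper `calendar_cmd()` and the file names are hypothetical, and the command is built as a string without being run.

```r
# Hypothetical helper: choose the calendar-fixing command for one file.
# `n_steps` is the number of time steps in the file; `is_daily` says whether
# the file holds daily data.
calendar_cmd <- function(in_file, tmp_file, n_steps, is_daily) {
  # Assumption: daily data whose length is not a multiple of 365 has leap days.
  has_leap_days <- is_daily && (n_steps %% 365 != 0)

  if (has_leap_days) {
    paste("cdo -setcalendar,365_day -delete,month=2,day=29",
          shQuote(in_file), shQuote(tmp_file))
  } else {
    paste("cdo setcalendar,365_day", shQuote(in_file), shQuote(tmp_file))
  }
}

# 65 years of daily data: 65 * 365 steps without leap days, 16 more with them.
no_leap <- calendar_cmd("a.nc", "a_tmp.nc", n_steps = 65 * 365, is_daily = TRUE)
with_leap <- calendar_cmd("b.nc", "b_tmp.nc", n_steps = 65 * 365 + 16, is_daily = TRUE)
monthly <- calendar_cmd("c.nc", "c_tmp.nc", n_steps = 780, is_daily = FALSE)
```

The divisibility test is a heuristic: a file covering a partial year, or one on a 360-day calendar, would also leave a remainder, so it may be worth checking the calendar attribute as well.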