Forcings Ingestion
In V3 mode, forcing input starts by running build_forcing_sets from nhd_network_utilities_v02 to obtain the set of forcing files for each loop (run_sets). The build_forcing_sets routine constructs sets of forcing files from user-specified parameters. It first retrieves qlat_forcing_sets, qlat_input_folder, nts (number of time steps), max_loop_size, and dt (time step interval) from forcing_parameters. It then verifies that qlat_input_folder exists and raises an error if the folder is not specified or does not exist.
If run_sets are provided, the routine loops through them and appends a final_timestamp to each set, extracted as the model_output_valid_time of the last file in the set using the nhd_io.get_param_str() function.
If no run_sets are provided, the routine constructs them. It first determines the time interval between forcing files (dt_qlat) by comparing the timestamps of the first two files in the folder, retrieving model_output_valid_time from each with the get_param_str function. The interval is used to compute qts_subdivisions, the number of model time steps (dt) that fit into one forcing interval; the routine checks that dt_qlat is divisible by dt. Next, the total number of files required (nfiles) is calculated from the number of time steps (nts). The function generates a list of timestamp strings (datetime_list_str) corresponding to the file times and constructs file names with the pattern YYYYMMDDHHMM.CHRTOUT_DOMAIN1. It then checks that each forcing file exists in the folder and raises an error if any file is missing.
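The interval and file-name arithmetic described above can be sketched as follows. This is a minimal illustration, not the actual routine: the helper name plan_forcing_files is hypothetical, the two timestamps are passed in directly instead of being read from files with get_param_str, and rounding nfiles up with a ceiling is an assumption.

```python
import math
from datetime import datetime, timedelta


def plan_forcing_files(first_time, second_time, dt, nts):
    """Hypothetical helper: derive qts_subdivisions, nfiles and file names.

    first_time, second_time: valid times of the first two forcing files.
    dt: model time step in seconds; nts: total number of model time steps.
    """
    # Interval between consecutive forcing files, in seconds.
    dt_qlat = int((second_time - first_time).total_seconds())
    if dt_qlat % dt != 0:
        raise ValueError("forcing interval must be divisible by dt")

    # Number of model time steps covered by one forcing file.
    qts_subdivisions = dt_qlat // dt

    # Assumption: enough files to cover all nts steps, rounded up.
    nfiles = math.ceil(nts / qts_subdivisions)

    # Timestamps formatted as YYYYMMDDHHMM, one per forcing file.
    datetime_list_str = [
        (first_time + timedelta(seconds=dt_qlat * n)).strftime("%Y%m%d%H%M")
        for n in range(nfiles)
    ]
    file_names = [s + ".CHRTOUT_DOMAIN1" for s in datetime_list_str]
    return qts_subdivisions, nfiles, file_names


qts, nfiles, names = plan_forcing_files(
    datetime(2021, 8, 23, 1, 0), datetime(2021, 8, 23, 2, 0), dt=300, nts=36
)
print(qts, nfiles, names)
```

With hourly forcing files and a 300-second model step, each file covers 12 model steps, so 36 steps need three files.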
Still in the case where no run_sets were provided, run_sets is then built by grouping the forcing files into sets of up to max_loop_size files. The routine calculates the number of time steps for each set (nts) and accumulates them until the total number of time steps is reached. For each set, it extracts the final_timestamp from the last file in the group. It then returns the list of constructed run_sets.
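The grouping step can be illustrated with a short sketch. The function name group_into_run_sets and the dictionary keys are assumptions made for this example, and final_timestamp is left out because the real routine reads it from the last file in each group.

```python
def group_into_run_sets(forcing_files, nts, qts_subdivisions, max_loop_size):
    """Hypothetical helper: split forcing files into sets of bounded size."""
    run_sets = []
    nts_accum = 0  # model time steps already assigned to earlier sets
    for start in range(0, len(forcing_files), max_loop_size):
        if nts_accum >= nts:
            break
        files = forcing_files[start:start + max_loop_size]
        # Each file covers qts_subdivisions model steps; the last set is
        # capped so that the total never exceeds nts.
        nts_set = min(len(files) * qts_subdivisions, nts - nts_accum)
        run_sets.append({"qlat_files": files, "nts": nts_set})
        nts_accum += nts_set
    return run_sets


files = [f"20210823{h:02d}00.CHRTOUT_DOMAIN1" for h in range(1, 6)]
run_sets = group_into_run_sets(files, nts=50, qts_subdivisions=12, max_loop_size=2)
for rs in run_sets:
    print(len(rs["qlat_files"]), rs["nts"])
```

Five files with max_loop_size=2 yield three sets; the last set carries only the time steps still missing from the requested total.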
The run_sets from the previous step are further processed by build_qlateral_array from nhd_network_utilities_v02, which is called from nwm_forcing_preprocess. In addition to forcing_parameters, the other key input is segment_index, the set of segment IDs used to filter the resulting dataframe. The following parameters are extracted from forcing_parameters:
- qts_subdivisions: Number of subdivisions for time steps (defaults to 1).
- nts: Total number of time steps (defaults to 1).
- qlat_input_folder: Folder containing qlateral files.
- qlat_input_file: Direct input file for qlateral data.
The subsequent assembly of the forcings dataframe depends on whether forcing files are sourced from a Qlat input folder, a single Qlat input file, or a Qlat constant value:
- Qlat folder input:
The function checks for qlat_files and, if none are given, constructs a list of files matching the qlat_file_pattern_filter. It then reads additional file format information such as column names (qlat_file_index_col, qlat_file_value_col, gw_bucket_col, terrain_ro_col). The CHRTOUT files are read in parallel, with each qlat file read by one CPU using get_ql_from_chrtout from nhd_io, which imports the CHRTOUT files with netCDF4. For each file, get_ql_from_chrtout extracts the relevant qlateral data (q_lateral, gw_bucket, terrain_runoff columns), and the results are collected in a list, ql_list.
The dataframe is built by first extracting the feature index (idx) from the first CHRTOUT file, then stacking the lateral inflow data from all files into a 2D array and converting it into a pandas dataframe, qlat_df. The rows represent segments (indexed by idx), and the columns represent time steps (one per qlateral file). qlat_df is then filtered to include only the rows (segments) that are present in segment_index.
- Qlat file input:
In this case, the file must be in CSV format. It is read using get_ql_from_csv from nhd_io, which relies on the pandas CSV import function.
- Qlat constant value:
This option is the default if neither an input folder (qlat_input_folder) nor an input file (qlat_input_file) is provided. In that case, the function creates a constant qlateral dataframe (qlat_const), in which all lateral inflows are set to a constant value (default: 0). The dataframe is created with nts // qts_subdivisions time steps and the segment IDs in segment_index.
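The folder-based and constant-value cases can be sketched with pandas as shown below. The helper names qlat_from_file_arrays and qlat_constant are hypothetical, the per-file arrays stand in for what get_ql_from_chrtout would return, and placing segments on the rows of the constant dataframe is an assumption carried over from the folder case.

```python
import numpy as np
import pandas as pd


def qlat_from_file_arrays(idx, ql_list, segment_index):
    """Hypothetical helper: stack per-file arrays into a dataframe."""
    # Each entry of ql_list holds one value per segment for one file, so
    # stacking and transposing gives rows = segments, columns = files.
    qlat_df = pd.DataFrame(
        np.stack(ql_list).T, index=idx, columns=range(len(ql_list))
    )
    # Keep only the segments that are present in segment_index.
    return qlat_df[qlat_df.index.isin(segment_index)]


def qlat_constant(segment_index, nts, qts_subdivisions=1, qlat_const=0.0):
    """Hypothetical helper: fallback when no folder or file is given."""
    n_cols = nts // qts_subdivisions
    # Assumption: segments on rows, time steps on columns.
    return pd.DataFrame(
        float(qlat_const), index=segment_index, columns=range(n_cols)
    )


idx = np.array([101, 102, 103, 104])  # feature IDs from the first file
ql_list = [
    np.array([0.1, 0.2, 0.3, 0.4]),
    np.array([0.5, 0.6, 0.7, 0.8]),
    np.array([0.9, 1.0, 1.1, 1.2]),
]
segment_index = pd.Index([102, 104])

qlat_df = qlat_from_file_arrays(idx, ql_list, segment_index)
const_df = qlat_constant(segment_index, nts=24, qts_subdivisions=12)
print(qlat_df)
print(const_df)
```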
In V4 mode, forcing sets are built within the AbstractNetwork class, after its initialization through either HYFeaturesNetwork or NHDNetwork. The member function that builds run_sets, analogous to V3, is build_forcing_sets:
- Parameter Extraction: The function build_forcing_sets extracts the following parameters from the configuration dictionary forcing_parameters:
  - qlat_forcing_sets: A pre-built set of forcing runs, if provided.
  - qlat_input_folder: The folder containing the qlateral forcing files.
  - nts: The total number of time steps in the simulation.
  - max_loop_size: The maximum number of time steps or files that can be processed in one loop (default is 12).
  - dt: The time step interval for the model.
  The function then verifies that the qlat_input_folder exists. If the folder does not exist or is not specified, the function halts.
- Nexus File Conversion (if applicable): If the forcing files are of the type nex-* and a binary folder is specified, the function converts these files to Parquet files using the helper function nex_files_to_binary. It then updates forcing_parameters with the new folder containing the Parquet files and with the file pattern after conversion. The conversion from nex-csv files into binary Parquet files is carried out by another helper function, rewrite_to_parquet, which is based on pyarrow and is called from within nex_files_to_binary.
- Assembly of the run sets: Depending on the input forcing configuration, the run_sets are built in one of the following three ways:
  - forcing_glob_filter is nex-*: The function retrieves all files from the qlat_input_folder that match the nex-* pattern. It reads the timestamp from the last row of the first file to determine the final_timestamp for the run, and stores the list of all qlateral files together with the total number of time steps (nts) and the final_timestamp in a single run set.
  - Forcing sets predefined (run_sets): The run_sets are returned as is.
  - qlat_input_folder is provided (and no forcing sets): A sorted list of forcing files is extracted from the input folder based on the forcing_glob_filter (e.g., *.CHRTOUT_DOMAIN1 or *NEXOUT). The time interval between forcing files (dt_qlat) is then determined from the first two files. The number of subdivisions per time step (qts_subdivisions) is calculated as the ratio of the qlateral forcing time interval (dt_qlat) to the model time step (dt), and the number of files needed to cover the full duration of the simulation (nfiles) is determined from the total number of time steps (nts) and the subdivisions. The list of forcing files is built from the resulting datetime list, and the existence of all forcing files is verified. The run_sets are finally built in sets, each containing up to max_loop_size files. For each run set, the number of time steps (nts) is computed as the product of the number of files and qts_subdivisions, and the timestamp of the last file in the set (final_timestamp) is extracted from the file's metadata. The function loops through the forcing files in groups (up to max_loop_size at a time) and accumulates the total number of time steps processed, adding to the run_sets list until all the required files are processed.
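The folder check and the sorted file listing in the third case can be sketched with the standard library. The helper name list_forcing_files is hypothetical and the use of pathlib is an assumption for illustration; the example creates empty placeholder files in a temporary directory, not real forcing data.

```python
import tempfile
from pathlib import Path


def list_forcing_files(qlat_input_folder, forcing_glob_filter):
    """Hypothetical helper: return the sorted files matching the filter."""
    folder = Path(qlat_input_folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"forcing folder not found: {folder}")
    # Timestamped names (YYYYMMDDHHMM...) sort chronologically as strings.
    return sorted(folder.glob(forcing_glob_filter))


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files, created out of order on purpose.
    for name in [
        "202108230200.CHRTOUT_DOMAIN1",
        "202108230100.CHRTOUT_DOMAIN1",
        "notes.txt",
    ]:
        (Path(tmp) / name).touch()
    found = [p.name for p in list_forcing_files(tmp, "*.CHRTOUT_DOMAIN1")]

print(found)
```

Sorting matters here because the interval dt_qlat is derived from the first two files in the list.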