Merge pull request #257 from jmccreight/feat_docs_for_release

update docs for release
EC-USGS · Dec 15, 2023 · d2d286d
2 parents be79e76 + d736df5
commit d2d286d

Showing 52 changed files with 2,092 additions and 1,537 deletions.
diff --git a/.gitignore b/.gitignore
@@ -56,6 +56,9 @@ autotest/codecov
 autotest/.coverage
 autotest/temp
 
+# test_data
+test_data/.test_data_version*
+
 # sphinx
 generated/
 

diff --git a/README.md b/README.md
@@ -20,53 +20,33 @@
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
 **Table of Contents**
 
-- [Purpose](#purpose)
+- [About](#about)
 - [Installation](#installation)
-- [Contributing](#contributing)
-- [Example Notebooks](#example-notebooks)
-- [Overview of Repository Contents](#overview-of-repository-contents)
+- [Getting started / Example notebooks](#getting-started--example-notebooks)
+- [Community engagement](#community-engagement)
 - [Disclaimer](#disclaimer)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 
-## Purpose
+## About
 
-The purpose of this repository is to refactor and redesign the [PRMS modeling
-system](https://www.usgs.gov/software/precipitation-runoff-modeling-system-prms)
-while maintaining its functionality. Code modernization is a step towards
-unification with [MODFLOW 6 (MF6)](https://github.com/MODFLOW-USGS/modflow6).
+Welcome to the pywatershed repository!
 
-The following motivations are taken from our [AGU poster from December
-2022](https://agu2022fallmeeting-agu.ipostersessions.com/default.aspx?s=05-E1-C6-40-DF-0D-4D-C7-4E-DE-D2-61-02-05-8F-0A)
-which provides additional details on motivations, project status, and current
-directions of this project as of approximately January 2023.
+Pywatershed is a Python package for simulating hydrologic processes, motivated by
+the need to modernize important, legacy hydrologic models at the USGS,
+particularly the
+[Precipitation-Runoff Modeling System](https://www.usgs.gov/software/precipitation-runoff-modeling-system-prms)
+(PRMS, Markstrom et al., 2015) and its role in
+[GSFLOW](https://www.usgs.gov/software/gsflow-coupled-groundwater-and-surface-water-flow-model)
+(Markstrom et al., 2008).
+The goal of modernization is to make these legacy models more flexible as process
+representations, to support testing of alternative hydrologic process
+conceptualizations, and to facilitate the incorporation of cutting-edge
+modeling techniques and data sources. Pywatershed is a place for experimentation
+with software design, process representation, and data fusion in the context
+of well-established hydrologic process modeling.
 
-Goals of the USGS Enterprise Capacity (EC) project include:
-
-  * A sustainable integrated, hydrologic modeling framework for the U.S.
-    Geological Survey (USGS)
-  * Interoperable modeling across the USGS, partner agencies, and academia
-
-Goals for EC Watershed Modeling:
-
-  * Couple the Precipitation-Runoff Modeling System (PRMS, e.g. Regan et al,
-	2018)  with MODFLOW 6 (MF6, e.g. Langevin et al, 2017) in a sustainable
-	way
-  * Redesign PRMS to be more modern and flexible
-  * Prioritize process representations in the current National Hydrological
-    Model (NHM) based on PRMS 5.2.1
-
-Prototype an EC watershed model: "pywatershed"
-
-  * Redesign PRMS quickly in python
-  * Couple to MF6 via BMI/XMI interface (Hughes et al, 2021; Hutton et al, 2020)
-  * Establish a prototyping ground for EC codes that couples to the compiled
-	framework: low cost proof of concepts (at the price of potentially less
-    computational performance) * Enable process representation hypothesis testing
-  * Use cutting-edge techniques and technologies to improve models 
-  * Machine learning, automatic differentiation 
-  * Address challenges of modeling across space and time scales 
-  * Transition prototype watershed model to compiled EC code
+For more information on the goals and status of pywatershed, please see the [pywatershed docs](https://pywatershed.readthedocs.io/).
 
 
 ## Installation
@@ -81,7 +61,7 @@ all platforms.
 The `pywatershed` package is [available on
 conda-forge](https://anaconda.org/conda-forge/pywatershed). The installation
 is the quickest way to get up and running but provides only the minimal set of
-dependencies (not including jupyter nor all packages needed for running the
+dependencies (not including Jupyter nor all packages needed for running the
 example notebooks, also not suitable for development purposes). 
 
 We recommend the following installation procedures to get fully-functional
@@ -92,7 +72,7 @@ repository before installing `pywatershed` itself. Mamba will be much faster
 than Anaconda (but the conda command could also be used). 
 
 If you wish to use the stable release, you will use `main` in place of 
-`<branch>` in the following commands. If you want to follow developemnt, you'll
+`<branch>` in the following commands. If you want to follow development, you'll
 use `develop` instead.
 
 Without using `git` (directly), you may:
@@ -120,68 +100,55 @@ you will also need to activate this environment by name.)
 
 
 We install the `environment_w_jupyter.yml` to provide all known dependencies 
-including those for running the eample notebooks. (The `environment.yml` 
-does not contain jupyter or jupyterlab because this interferes with installation
-on WholeTale, see Example Notebooks seection below.)
+including those for running the example notebooks. (The `environment.yml` 
+does not contain Jupyter or JupyterLab because this interferes with installation
+on WholeTale; see the "Getting started / Example notebooks" section below.)
 
-## Contributing
 
-See the [developer documentation](./DEVELOPER.md) for instructions on setting up
-a development environment. See the [contribution guide](./CONTRIBUTING.md) to
-contribute to this project.
+## Getting started / Example notebooks
 
-## Example Notebooks
+Please note that you can browse the API reference, developer info, and index
+in the [pywatershed docs](https://pywatershed.readthedocs.io/). But
+*the best way to get started with pywatershed is to dive into the example
+notebooks*.
 
 For introductory example notebooks, look in the
 [`examples/`](https://github.com/EC-USGS/pywatershed/tree/main/examples)
 directory in the repository. Numbered starting at 00, these are meant to be
-completed in order. Non-numbered notebooks coveradditional topics. These
-notebooks are note yet covered by testing and so may be expected to have some
-issues until they are added to testing. In `examples/developer/` there are
-notebooks of interest to developers who may want to learn about running the
-software tests.
-
-Though no notebook outputs are saved in Github, these notebooks can easily
-navigated to and run in WholeTale containers (free but sign-up or log-in
-required). This is a very easy and quick way to get started without needing to
-install pywatershed requirements yourself. WholeTale is an NSF funded project
-and supports logins from many institutions, e.g. the USGS, and you may not need
-to register.
-
-There are containers for both the `main` and `develop` branches.
+completed in order. Notebook outputs are not saved in GitHub, but you can run
+these notebooks locally or in WholeTale (an NSF-funded project that supports
+logins from many institutions; free, but sign-up or log-in is required),
+where the pywatershed environment is ready to go:
 
 [![WholeTale](https://raw.githubusercontent.com/whole-tale/wt-design-docs/master/badges/wholetale-explore.svg)](https://dashboard.wholetale.org)
 
-  * [WholeTale container for latest release (main
-	branch)](https://dashboard.wholetale.org/run/64ae29e8a887f48b9f173678?tab=metadata)
-  * [WholeTale container for develop
-	branch](https://dashboard.wholetale.org/run/64ae25c3a887f48b9f1735c8?tab=metadata)
+  * [Run latest release in WholeTale](https://dashboard.wholetale.org/run/64ae29e8a887f48b9f173678?tab=metadata)
+  * [Run the develop branch in WholeTale](https://dashboard.wholetale.org/run/64ae25c3a887f48b9f1735c8?tab=metadata)
 
-WholeTale will give you a jupyter-lab running in the root of this
+WholeTale will give you a JupyterLab running in the root of this
 repository. You can navigate to `examples/` and then open and run the notebooks
 of your choice.  The develop container may require the user to update the
 repository (`git pull origin`) to stay current with development.
 
-## Overview of Repository Contents
+Non-numbered notebooks in `examples/` cover additional topics. These
+notebooks are not yet covered by testing and you may encounter some
+issues. In `examples/developer/` there are notebooks of interest to
+developers who may want to learn about running the software tests.
 
-The contents of directories at this level is described. Therein you may discover
-another README.md for more information.
 
-```
-.github/: Github actions, scripts and Python environments for continuous integration (CI) and releasing,
-asv_benchmarks/: preformance benchmarking by ASV
-autotest/: pywatershed package testing using pytest
-autotest_exs/: pywatershed example notebook testing using pytest
-bin/:PRMS executables distributed
-doc/:Package/code documentation source code
-evaluation/: tools for evaluation of pywatershed
-examples/:How to use the package, mostly jupyter notebooks
-prms_src/:PRMS source used for generating executables in bin/
-pywatershed/:Package source
-reference/:Ancillary materials for development
-resources/:Static stuff like images
-test_data/:Data used for automated testing
-```
+## Community engagement
+
+We value your feedback! Please use [discussions](https://github.com/EC-USGS/pywatershed/discussions)
+or [issues](https://github.com/EC-USGS/pywatershed/issues) on GitHub.
+For more in-depth contributions, please start by reading over
+the pywatershed
+[DEVELOPER.md](https://github.com/EC-USGS/pywatershed/blob/develop/DEVELOPER.md) and
+[CONTRIBUTING.md](https://github.com/EC-USGS/pywatershed/blob/develop/CONTRIBUTING.md)
+guidelines.
+
+Thank you for your interest.
+
 
 ## Disclaimer
 

diff --git a/autotest/test_control.py b/autotest/test_control.py
@@ -51,7 +51,7 @@ def test_control_simple(control_simple):
     assert control_simple.time_step == ts
     assert control_simple.start_time == time_dict["start_time"]
     assert control_simple.end_time == time_dict["end_time"]
-    assert control_simple.current_time is None
+    assert control_simple.current_time == control_simple.init_time
     assert control_simple.itime_step == -1
     prev_time = control_simple.current_time
     n_times = control_simple.n_times

diff --git a/autotest/test_netcdf_output.py b/autotest/test_netcdf_output.py
@@ -32,6 +32,7 @@ def control(domain):
     control.edit_n_time_steps(n_time_steps)
     control.options["budget_type"] = "error"
     del control.options["netcdf_output_var_names"]
+    del control.options["netcdf_output_dir"]
     return control
 
 
@@ -40,6 +41,15 @@ def control(domain):
 # optional variables to budgets
 
 check_vars = {
+    "PRMSSolarGeometry": [
+        "soltab_horad_potsw",
+        "soltab_potsw",
+    ],
+    "PRMSAtmosphere": [
+        "tminf",
+        "potet",
+        "swrad",
+    ],
     "PRMSCanopy": [
         "hru_intcpstor",
         "hru_intcpstor_change",
@@ -65,7 +75,24 @@ def control(domain):
 def test_process_budgets(domain, control, params, tmp_path, budget_sum_param):
     tmp_dir = pl.Path(tmp_path)
     # print(tmp_dir)
-    model_procs = [pywatershed.PRMSCanopy, pywatershed.PRMSChannel]
+    model_procs = [
+        pywatershed.PRMSSolarGeometry,
+        pywatershed.PRMSAtmosphere,
+        pywatershed.PRMSCanopy,
+        pywatershed.PRMSChannel,
+    ]
+
+    # set up input_dir with copies of the PRMS inputs and outputs
+    domain_output_dir = domain["prms_output_dir"]
+    input_dir = tmp_path / "input"
+    input_dir.mkdir()
+    control.options["input_dir"] = input_dir
+
+    # Could limit this to just the variables in model_procs
+    for ff in domain_output_dir.resolve().glob("*.nc"):
+        shutil.copy(ff, input_dir / ff.name)
+    for ff in domain_output_dir.parent.resolve().glob("*.nc"):
+        shutil.copy(ff, input_dir / ff.name)
 
     # Deal with parameter around what budget sum vars to write and check
     if budget_sum_param == "some":
@@ -80,9 +107,7 @@ def test_process_budgets(domain, control, params, tmp_path, budget_sum_param):
     else:
         raise ValueError("unexpected value")
 
-    # dont need any PRMS inputs for the model specified, so this is sufficient
-    input_dir = domain["prms_output_dir"]
-    control.options["input_dir"] = input_dir
+    control.options["netcdf_output_dir"] = tmp_dir
 
     # TODO: Eliminate potet and other variables from being used
     model = Model(
@@ -91,27 +116,26 @@ def test_process_budgets(domain, control, params, tmp_path, budget_sum_param):
         parameters=params,
     )
 
+    # we are going to harvest the data from memory and store here
     check_dict = {proc: {} for proc in check_vars.keys()}
 
     # test outputting specific vars by only using check_vars
     output_vars = [
         item for sublist in list(check_vars.values()) for item in sublist
     ]
-    output_vars = None
 
-    with pytest.warns(UserWarning):
+    with pytest.raises(ValueError):
         model.initialize_netcdf(
-            tmp_dir,
+            pl.Path("foo"),
             budget_args=budget_args,
             output_vars=output_vars,
         )
 
-    with pytest.raises(RuntimeError):
-        model.initialize_netcdf(
-            tmp_dir,
-            budget_args=budget_args,
-            output_vars=output_vars,
-        )
+    model.initialize_netcdf(
+        output_dir=tmp_dir,  # should allow a matching argument to control
+        budget_args=budget_args,
+        output_vars=output_vars,
+    )
 
     for tt in range(n_time_steps):
         model.advance()
@@ -122,11 +146,24 @@ def test_process_budgets(domain, control, params, tmp_path, budget_sum_param):
             for vv in pp_vars:
                 if tt == 0:
                     # use the output data to figure out the shape
-                    check_dict[pp][vv] = np.zeros(
-                        (n_time_steps, model.processes[pp][vv].shape[0])
-                    )
+                    if isinstance(
+                        model.processes[pp][vv], pywatershed.TimeseriesArray
+                    ):
+                        spatial_len = model.processes[pp][vv].data.shape[1]
+                    else:
+                        spatial_len = model.processes[pp][vv].shape[0]
 
-                check_dict[pp][vv][tt, :] = model.processes[pp][vv]
+                    check_dict[pp][vv] = np.zeros((n_time_steps, spatial_len))
+
+                if isinstance(
+                    model.processes[pp][vv], pywatershed.TimeseriesArray
+                ):
+                    check_dict[pp][vv][tt, :] = model.processes[pp][vv].current
+                else:
+                    check_dict[pp][vv][tt, :] = model.processes[pp][vv]
+
+            if pp in ["PRMSSolarGeometry", "PRMSAtmosphere"]:
+                continue
 
             for bb in check_budget_sum_vars:
                 if tt == 0:
@@ -148,7 +185,15 @@ def test_process_budgets(domain, control, params, tmp_path, budget_sum_param):
     for pp, pp_vars in check_vars.items():
         for vv in pp_vars:
             nc_data = xr.open_dataset(tmp_dir / f"{vv}.nc")[vv]
-            assert np.allclose(check_dict[pp][vv], nc_data)
+            if vv in pywatershed.PRMSSolarGeometry.get_variables():
+                assert np.allclose(
+                    check_dict[pp][vv], nc_data[0:n_time_steps, :]
+                )
+            else:
+                assert np.allclose(check_dict[pp][vv], nc_data)
+
+        if pp in ["PRMSSolarGeometry", "PRMSAtmosphere"]:
+            continue
 
         for bb in check_budget_sum_vars:
             nc_data = xr.open_dataset(tmp_dir / f"{pp}_budget.nc")[bb]
@@ -194,14 +239,12 @@ def test_separate_together_var_list(
     ]
 
     # setup input_dir with symlinked prms inputs and outputs
-    test_output_dir = tmp_dir / "test_results"
     domain_output_dir = domain["prms_output_dir"]
     input_dir = tmp_path / "input"
     input_dir.mkdir()
     control.options["input_dir"] = input_dir
     control.options["netcdf_output_var_names"] = output_vars
     control.options["netcdf_output_separate_files"] = separate
-    del control.options["netcdf_output_dir"]
 
     # Could limit this to just the variables in model_procs
     for ff in domain_output_dir.resolve().glob("*.nc"):
@@ -218,6 +261,7 @@ def test_separate_together_var_list(
         # passing no output_dir arg and none in opts throws an error
         model.initialize_netcdf()
 
+    test_output_dir = tmp_dir / "test_results"
     control.options["netcdf_output_dir"] = test_output_dir
     model = Model(
         model_procs,
@@ -257,7 +301,10 @@ def test_separate_together_var_list(
                 assert nc_file.exists()
 
                 ds = xr.open_dataset(nc_file, decode_timedelta=False)
-                proc_vars = set(proc.get_variables())
+                if output_vars is None:
+                    proc_vars = set(proc.get_variables())
+                else:
+                    proc_vars = set(check_vars[proc_key])
                 nc_vars = set(ds.data_vars)
                 assert proc_vars == nc_vars
                 for vv in proc.variables:
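
The changes to `test_process_budgets` above collect values in memory from two kinds of variables: time-series inputs (checked with `isinstance(..., pywatershed.TimeseriesArray)` and read through `.data` and `.current`) and plain arrays. The sketch below shows that dispatch pattern in isolation. It is only an illustration under stated assumptions: `FakeTimeseriesArray`, `spatial_length`, `current_values`, and the process names `ProcA`/`ProcB` are hypothetical stand-ins, and plain Python lists replace NumPy arrays so the example runs without pywatershed installed.

```python
class FakeTimeseriesArray:
    """Hypothetical stand-in for pywatershed.TimeseriesArray (an assumption)."""

    def __init__(self, data):
        # data is indexed as [time][space], mirroring the 2-D `.data` in the test
        self.data = [list(row) for row in data]
        self._itime = -1

    def advance(self):
        self._itime += 1

    @property
    def current(self):
        # values for the current time step only
        return self.data[self._itime]


def spatial_length(var):
    """Length of the spatial dimension for either kind of variable."""
    if isinstance(var, FakeTimeseriesArray):
        return len(var.data[0])
    return len(var)


def current_values(var):
    """Copy of the values valid at the current time step."""
    if isinstance(var, FakeTimeseriesArray):
        return list(var.current)
    return list(var)


# process name -> variables to check, as in the test's check_vars
check_vars = {"ProcA": ["forcing"], "ProcB": ["state"]}
# flatten to a single list of output variable names
output_vars = [item for sublist in check_vars.values() for item in sublist]

n_time_steps, n_space = 3, 4
forcing = FakeTimeseriesArray(
    [[float(t * n_space + s) for s in range(n_space)] for t in range(n_time_steps)]
)
state = [0.0] * n_space
processes = {"ProcA": {"forcing": forcing}, "ProcB": {"state": state}}

check_dict = {proc: {} for proc in check_vars}
for tt in range(n_time_steps):
    # stand-in for model.advance() / model.calculate()
    forcing.advance()
    state[:] = [float(tt + 1)] * n_space

    for pp, pp_vars in check_vars.items():
        for vv in pp_vars:
            var = processes[pp][vv]
            if tt == 0:
                # allocate (n_time_steps x spatial_len) storage on the first step
                check_dict[pp][vv] = [
                    [0.0] * spatial_length(var) for _ in range(n_time_steps)
                ]
            check_dict[pp][vv][tt] = current_values(var)
```

Keeping the type check in two small helpers avoids repeating the `isinstance` branch at both the allocation and the assignment steps; whether that refactor suits the real test is a judgment call for the authors.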