Merge pull request #253 from jmccreight/doc_test_data_autotest
update test_data and autotest READMEs
jmccreight authored Nov 17, 2023
2 parents a3ea9a1 + 4345f5f commit c0b256f
Showing 7 changed files with 485 additions and 130 deletions.
23 changes: 9 additions & 14 deletions .github/workflows/ci.yaml
@@ -188,11 +188,10 @@ jobs:
        pip list
    - name: hru_1 - generate and manage test data domain, run PRMS and convert csv output to NetCDF
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv -n=2 --durations=0 run_prms_domains.py --domain=hru_1
-       pytest -vv -n=auto --durations=0 convert_prms_output_to_nc.py --domain=hru_1
-       pytest -vv -n=auto --durations=0 remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=hru_1 --remove_prms_csvs --remove_prms_output_dirs
    - name: hru_1 - list netcdf input files
      working-directory: test_data
@@ -212,12 +211,10 @@


    - name: drb_2yr - generate and manage test data
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv remove_output_dirs.py --domain=hru_1
-       pytest -vv -n=2 run_prms_domains.py --domain=drb_2yr
-       pytest -vv -n=auto convert_prms_output_to_nc.py --domain=drb_2yr
-       pytest -vv -n=auto remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=drb_2yr --remove_prms_csvs --remove_prms_output_dirs
    - name: drb_2yr - list netcdf input files
      working-directory: test_data
@@ -236,12 +233,10 @@
--junitxml=pytest_drb_2yr.xml

    - name: ucb_2yr - generate and manage test data
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv remove_output_dirs.py --domain=drb_2yr
-       pytest -vv -n=2 run_prms_domains.py --domain=ucb_2yr
-       pytest -vv -n=auto convert_prms_output_to_nc.py --domain=ucb_2yr
-       pytest -vv -n=auto remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=ucb_2yr --remove_prms_csvs --remove_prms_output_dirs
    - name: ucb_2yr - list netcdf input files
      working-directory: test_data
26 changes: 16 additions & 10 deletions DEVELOPER.md
@@ -141,28 +141,34 @@ all formally encoded in `.github/workflows/ci.yaml` and

## Testing
Once the dependencies are available, we want to verify the software by running
- its test suite. The following testing procedures are also covered in the
- notebook `examples/01_automated_testing.ipynb`. To run the tests, we first need
- to generate the test data. This consists of running PRMS and then converting the
- output to netcdf. From `test_data/scripts`:
+ its test suite. However, we first need to generate the test data. This consists
+ of running binaries (PRMS) and then converting the output to netcdf files used
+ by autotest as the answers or reference results. In the `autotest/` directory,
+ test data is generated using the following command.

- pytest -v -n=4 test_run_domains.py
- pytest -v -n=8 test_nc_domains.py
- ```
+ ```shell
+ python generate_test_data.py -n=auto
+ ```

+ Additional options may be supplied to `generate_test_data.py`. For more details
+ on generating the test data, see [`test_data/README.md`](test_data/README.md).
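
For example, the CI workflow updated in this same commit invokes the script one
domain at a time (see `python generate_test_data.py --help` for the full option
list):

```shell
# As in .github/workflows/ci.yaml: one domain, cleaning up intermediate PRMS output.
python generate_test_data.py -n=auto --domain=hru_1 \
    --remove_prms_csvs --remove_prms_output_dirs
```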

Finally, the tests can be run from the `autotest` directory:

- ``` pytest -v -n=8 ```
+ ```shell
+ pytest -v -n=auto
+ ```

All tests should pass, XPASS, or XFAIL. XFAIL is an expected
- failure. Substitute `-n auto` to automatically use all available cores on your
+ failure. The flag `-n auto` automatically uses all available cores on your
machine.

+ For more details on the autotests, see [`autotest/README.md`](autotest/README.md).


## Linting
Automated linting procedures are performed in CI and enforced; these are:
- ```
+ ```shell
isort ./autotest ./pywatershed
black ./autotest ./pywatershed
flake8 --count --show-source --exit-zero ./pywatershed ./autotest
160 changes: 102 additions & 58 deletions autotest/README.md
@@ -1,82 +1,126 @@
- # Autotest
-
- ## Usage
-
- ```
- cd autotest
- pytest
- ```
- Pytest options can be explored via `pytest --help`.
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+ **Table of Contents**

+ - [Autotest](#autotest)
+ - [Test data](#test-data)
+ - [Usage](#usage)
+ - [domain_yaml details](#domain_yaml-details)
+ - [Path resolution](#path-resolution)
+ - [Answers for domain tests](#answers-for-domain-tests)

- ## Developer
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->

- This is how the pywatershed package tests itself.
+ # Autotest

- The test suite consists both
- * stand alone tests, and
- * domain tests
+ Autotest provides the test suite for pywatershed. The tests include
+ both
+ * stand alone tests, and
+ * domain tests.

Stand alone tests do not require any input files or output files besides what
is supplied in the testing framework. These tests generally run quickly
(e.g. testing basic logic of a class or type requirements or results).

Domain tests require input or output files for an NHM domain. These tests
scale with the size or number of HRUs in a domain. The test suite takes
- arguments related to domain tests:
+ arguments related to domain tests, which are described in the usage section below.

+ ## Test data
+ As of this writing, the majority of tests are domain tests, and the vast majority of
+ the domain test data must be generated by running binaries included in the repository
+ BEFORE running the autotests.

+ To generate the test data, run the following command from `autotest/`:

+ ```shell
+ python generate_test_data.py -n=auto
+ ```

+ This command should generally be sufficient, but more options are available, including
+ passing options through to pytest. Run `python generate_test_data.py --help` for more details.
+ For example, if you are in a hurry to run and test a single domain, you may restrict
+ (or expand) the domains processed with the `--domains` option. To get verbose output
+ from pytest, pass `-vv`.
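
A sketch of such a restricted, verbose run (flag spellings and defaults per
`python generate_test_data.py --help`; the domain flag is shown as described above):

```shell
# Hypothetical example: regenerate data for one domain with verbose pytest output.
python generate_test_data.py -n=auto --domains drb_2yr -vv
```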

+ Using the above command allows the autotests to check that the test data in
+ `test_data/` have been generated, and generated by the current version of
+ pywatershed. These checks are meant to avoid gross errors with autotesting but will
+ not cover every possible testing situation. When in doubt about tests, it is always
+ best practice to start over by re-generating the test data.

+ There are temporary situations where the errors raised by these test data checks are
+ unwarranted; in those cases, disabling the errors by editing `autotest/conftest.py`
+ is the temporary solution.

+ Please see [`test_data/README.md`](../test_data/README.md) for additional details on how
+ to generate the test data.

+ ## Usage

+ ```
+ cd autotest
+ pytest -n=auto -vv
+ ```
-   --domain_yaml=DOMAIN_YAML
-                       YAML file(s) for indiv domain tests. You can pass multiples of this argument. Default
-                       value (not shown here) is --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml.

+ Pytest options can be explored via `pytest --help`. Custom options for
+ pywatershed are buried in the output of that help call; they are:

+ ```
+ Custom options:
+   --domain_yaml=DOMAIN_YAML
+                       YAML file(s) for indiv domain tests. You can pass multiples of this
+                       argument. Default value (not shown here) is
+                       --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml
+   --print_ans         Print results and assert False for all domain tests
+   --all_domains       Run all test domains
+ ```

- Note that the `--domain_yaml` argument may be repeated within a single call to test multiple
- domains in a single pytest.
+ The default domain tested is `drb_2yr`. All domains present in `test_data/`
+ can be tested using `--all_domains`. Requesting a specific domain or multiple
+ domains is done by passing one or more `--domain_yaml` arguments.
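
For example (an illustrative sketch using the custom options above and the domain
yaml paths used elsewhere in this repository):

```shell
# Test two specific domains in a single pytest invocation.
pytest -n=auto -vv \
    --domain_yaml=../test_data/hru_1/hru_1.yaml \
    --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml

# Test all domains present in test_data/.
pytest -n=auto -vv --all_domains
```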

- The `domain_yaml` file is an *evolving* set of data that includes static domain
- inputs (e.g. CBH forcing files, parameter files), static or reference model
- output (from PRMS/NHM), and the answers to domain tests.
+ ## domain_yaml details

- Examples of `domain_yaml` files can be found in, for example, in
- `pywatershed/test_data/drb_2yr/drb_2yr.yaml`
- and
- `pywatershed/test_data/conus_2yr/conus_2yr.yaml`.
+ This section is about creating or working with the domain_yaml file when
+ writing tests.

+ The `domain_yaml` file provides information to the autotests. The domain_yaml
+ file for a domain directory `somewhere/` will be `somewhere/somewhere.yaml`.
+ This YAML file includes paths to static domain inputs (e.g. CBH forcing files,
+ parameter files), paths to static or reference model output (from PRMS/NHM),
+ and the answers to domain tests. Examples of `domain_yaml` files can be found
+ in each domain directory.

- ### Domain inputs
- These are specific files for a specific model domain (e.g. the Delaware River Basin,
- or the CONUS NHM, etc). The files are paths and are to be specified as either
- * relative paths: relative to the location of the yaml file
-   in which the path is appearing
- * absolute paths: use absolute paths (for some machine and potentially user)
- The examples listed above demonstrate use of both relative and absolute paths.
+ Some details on the contents of the YAML file are given below.

+ ### Path resolution
+ The test configuration for autotest (`autotest/conftest.py`) provides special
+ path resolution relative to the `domain_yaml` file for paths specified in a
+ list in that file:

+ ```
+ ["param_file", "control_file", "cbh_nc", "prms_run_dir", "prms_output_dir",]
+ ```

- Domain inputs are generally created by scripts run in `test_data/scripts`. The
- `drb_2yr` case shows the relationship between the prms binary, the input files,
- and the output files. These scripts help maintain sanity when generating new
- files for domain tests.
+ Additional fields can be added to this list to provide path resolution for new
+ fields in the YAML file.
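
Schematically, the path fields of a domain yaml might look like the following
(the file names here are hypothetical; the actual yaml files under `test_data/`
are the reference):

```yaml
# Hypothetical sketch of the path fields resolved by conftest.py.
param_file: myparam.param              # relative: resolved against this yaml's directory
control_file: control.test
cbh_nc: prcp.nc
prms_run_dir: .
prms_output_dir: /abs/path/to/output   # absolute: used as-is
```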


### Answers for domain tests
- Tests have objectively correct answers. The answer key is stored in the yaml in
- top-level attribute/key called `test_ans`. This can be seen in the examples
- listed above. This attibute has a sub-attributes for each file with domain
- tests in `autotest/test_*py` (the text `test_` and `.py` is dropped in the
- yaml reference). Below this, each file has named tests attributes which in turn
- potentially have named cases/keys when iterated with other fixtures (such as
- variables or types).

- The tests answers for a given test are output of some kind of summary statistic
- or other kind of reduction performed on data in memory. The answers vary with
- domain and are most easily collected by running the tests themselves while
- verifying accuracy manually when putting the answers into the domain yaml file.

- For convenience, a user can select to print the answers at run/test time using
- the `--print_ans` argument when running ptest. This prints the answer for
- domain tests and always asserts False after doing so for each test in a file.
- This option aids in conctruction of the tests_ans section of the domain
- yaml file for new domains or when extending tests on existing domains. Running
- `pytest -s ...` prints output in a format that can be copied to the yaml and
- edited quickly to get new test results enshrined in the answer key.
+ Certain tests have answers stored in the domain YAML. This "answer key" is
+ stored in the top-level key `test_ans`. This can be seen in the examples
+ listed above. Generally, for an autotest `test_x.py` the key "x" will be
+ provided below `test_ans`, and below it will be whatever data is used in
+ that test.
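
Schematically (the names and values here are hypothetical), for an autotest
`test_x.py` the layout is:

```yaml
test_ans:
  x:                      # answers consumed by autotest/test_x.py
    some_case: 3.14159    # e.g. a summary statistic for this domain
```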

+ The answers for a given test are typically a summary statistic
+ computed on model data in memory. The answers vary with domain and
+ are most easily collected by running the tests themselves while
+ verifying the accuracy of the test through expert judgement. The answers
+ enshrined in the domain YAML indicate when test results change. Because
+ test results change from time to time, a convenience utility function,
+ `assert_or_print`, is provided in `utils.py`. Using this function allows
+ the option `--print_ans -s` (note that `-s` prints output to the terminal)
+ to be passed at run time to print all the new values that should be
+ updated into the domain YAML.
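
For example, using the `test_x.py` placeholder from above:

```shell
# Print the would-be answer values for a domain so they can be copied into its yaml.
pytest -s --print_ans --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml test_x.py
```
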
66 changes: 49 additions & 17 deletions autotest/conftest.py
@@ -1,4 +1,4 @@
- import pathlib
+ import pathlib as pl

import yaml

@@ -34,6 +34,49 @@ def pytest_addoption(parser):


def pytest_generate_tests(metafunc):
+     # check domain test data regardless of what fixtures are being requested
+     if metafunc.config.getoption("all_domains"):
+         domain_file_list = [
+             "../test_data/hru_1/hru_1.yaml",
+             "../test_data/drb_2yr/drb_2yr.yaml",
+             "../test_data/ucb_2yr/ucb_2yr.yaml",
+         ]
+     else:
+         domain_file_list = metafunc.config.getoption("domain_yaml")
+         if len(domain_file_list) == 0:
+             domain_file_list = ["../test_data/drb_2yr/drb_2yr.yaml"]

+     # check that the test_data has been generated and is up-to-date
+     # for each domain
+     for dd in domain_file_list:
+         domain_name = pl.Path(dd).parent.name
+         test_data_version_file = pl.Path(
+             f"../test_data/.test_data_version_{domain_name}.txt"
+         )
+         if not test_data_version_file.exists():
+             msg = (
+                 f"Test data for domain {domain_name} do not appear to have\n"
+                 "been generated. Please see DEVELOPER.md for information on\n"
+                 "generating test data.\n"
+             )
+             raise ValueError(msg)

+         repo_version_file = pl.Path("../version.txt")

+         with open(test_data_version_file) as ff:
+             test_version = ff.read()
+         with open(repo_version_file) as ff:
+             repo_version = ff.read()

+         if test_version != repo_version:
+             msg = (
+                 f"Test data for domain {domain_name} do not appear to\n"
+                 "have been generated by the current version of the\n"
+                 "pywatershed repository. Please see DEVELOPER.md for\n"
+                 "information on generating test data.\n"
+             )
+             raise ValueError(msg)

if "domain" in metafunc.fixturenames:
# Put --print_ans in the domain fixture as it applies only to the
# domain tests. It is a run time attribute, not actually an attribute
@@ -42,22 +85,11 @@ def pytest_generate_tests(metafunc):
        # Not sure I love this, maybe have a domain_opts fixture later?
        print_ans = metafunc.config.getoption("print_ans")

-       if metafunc.config.getoption("all_domains"):
-           domain_file_list = [
-               "../test_data/hru_1/hru_1.yaml",
-               "../test_data/drb_2yr/drb_2yr.yaml",
-               "../test_data/ucb_2yr/ucb_2yr.yaml",
-           ]
-       else:
-           domain_file_list = metafunc.config.getoption("domain_yaml")
-           if len(domain_file_list) == 0:
-               domain_file_list = ["../test_data/drb_2yr/drb_2yr.yaml"]

        # open and read in the yaml and
-       domain_ids = [pathlib.Path(ff).stem for ff in domain_file_list]
+       domain_ids = [pl.Path(ff).stem for ff in domain_file_list]
        domain_list = []
        for dd in domain_file_list:
-           dd_file = pathlib.Path(dd)
+           dd_file = pl.Path(dd)
            with dd_file.open("r") as yaml_file:
                domain_dict = yaml.safe_load(yaml_file)

@@ -78,15 +110,15 @@
"prms_run_dir",
"prms_output_dir",
]:
domain_dict[ff] = pathlib.Path(domain_dict[ff])
domain_dict[ff] = pl.Path(domain_dict[ff])
if not domain_dict[ff].is_absolute():
domain_dict[ff] = domain_dict["dir"] / domain_dict[ff]

for fd_key in ["cbh_inputs"]:
domain_dict[fd_key] = {
key: (
pathlib.Path(val)
if pathlib.Path(val).is_absolute()
pl.Path(val)
if pl.Path(val).is_absolute()
else domain_dict["dir"] / val
)
for key, val in domain_dict[fd_key].items()
