Merge pull request #253 from jmccreight/doc_test_data_autotest
update test_data and autotest READMEs
jmccreight authored Nov 17, 2023
2 parents a3ea9a1 + 4345f5f commit c0b256f
Showing 7 changed files with 485 additions and 130 deletions.
23 changes: 9 additions & 14 deletions .github/workflows/ci.yaml
@@ -188,11 +188,10 @@ jobs:
        pip list
    - name: hru_1 - generate and manage test data domain, run PRMS and convert csv output to NetCDF
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv -n=2 --durations=0 run_prms_domains.py --domain=hru_1
-       pytest -vv -n=auto --durations=0 convert_prms_output_to_nc.py --domain=hru_1
-       pytest -vv -n=auto --durations=0 remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=hru_1 --remove_prms_csvs --remove_prms_output_dirs
    - name: hru_1 - list netcdf input files
      working-directory: test_data
@@ -212,12 +211,10 @@


    - name: drb_2yr - generate and manage test data
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv remove_output_dirs.py --domain=hru_1
-       pytest -vv -n=2 run_prms_domains.py --domain=drb_2yr
-       pytest -vv -n=auto convert_prms_output_to_nc.py --domain=drb_2yr
-       pytest -vv -n=auto remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=drb_2yr --remove_prms_csvs --remove_prms_output_dirs
    - name: drb_2yr - list netcdf input files
      working-directory: test_data
@@ -236,12 +233,10 @@
--junitxml=pytest_drb_2yr.xml

    - name: ucb_2yr - generate and manage test data
-     working-directory: test_data/generate
+     working-directory: autotest
      run: |
-       pytest -vv remove_output_dirs.py --domain=drb_2yr
-       pytest -vv -n=2 run_prms_domains.py --domain=ucb_2yr
-       pytest -vv -n=auto convert_prms_output_to_nc.py --domain=ucb_2yr
-       pytest -vv -n=auto remove_prms_csvs.py
+       python generate_test_data.py \
+         -n=auto --domain=ucb_2yr --remove_prms_csvs --remove_prms_output_dirs
    - name: ucb_2yr - list netcdf input files
      working-directory: test_data
26 changes: 16 additions & 10 deletions DEVELOPER.md
@@ -141,28 +141,34 @@ all formally encoded in `.github/workflows/ci.yaml` and

## Testing
Once the dependencies are available, we want to verify the software by running
- its test suite. The following testing procedures are also covered in the
- notebook `examples/01_automated_testing.ipynb`. To run the tests, we first need
- to generate the test data. This consists of running PRMS and then converting the
- output to netcdf. From `test_data/scripts`:
+ its test suite. However, we first need to generate the test data. This consists
+ of running binaries (PRMS) and then converting the output to netcdf files used
+ by autotest as the answers or reference results. In the `autotest/` directory,
+ test data is generated using the following command.

- pytest -v -n=4 test_run_domains.py
- pytest -v -n=8 test_nc_domains.py
- ```
+ ```shell
+ python generate_test_data.py -n=auto
+ ```

+ Additional options may be supplied to `generate_test_data.py`. For more details
+ on generating the test data, see [`test_data/README.md`](test_data/README.md).
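
For example, the CI workflow updated in this same commit invokes the script one
domain at a time (see `python generate_test_data.py --help` for the full option
list):

```shell
# As in .github/workflows/ci.yaml: one domain, cleaning up intermediate PRMS output.
python generate_test_data.py -n=auto --domain=hru_1 \
    --remove_prms_csvs --remove_prms_output_dirs
```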

Finally, the tests can be run from the `autotest` directory:

- ``` pytest -v -n=8 ```
+ ```shell
+ pytest -v -n=auto
+ ```

All tests should pass, XPASS, or XFAIL. XFAIL is an expected
- failure. Substitute `-n auto` to automatically use all available cores on your
+ failure. The flag `-n auto` automatically uses all available cores on your
machine.

+ For more details on the autotests, see [`autotest/README.md`](autotest/README.md).


## Linting
Automated linting procedures are performed in CI and enforced; these are:
- ```
+ ```shell
isort ./autotest ./pywatershed
black ./autotest ./pywatershed
flake8 --count --show-source --exit-zero ./pywatershed ./autotest
160 changes: 102 additions & 58 deletions autotest/README.md
@@ -1,82 +1,126 @@
- # Autotest
-
- ## Usage
-
- ```
- cd autotest
- pytest
- ```
- Pytest options can be explored via `pytest --help`.
+ <!-- START doctoc generated TOC please keep comment here to allow auto update -->
+ <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+ **Table of Contents**

+ - [Autotest](#autotest)
+ - [Test data](#test-data)
+ - [Usage](#usage)
+ - [domain_yaml details](#domain_yaml-details)
+ - [Path resolution](#path-resolution)
+ - [Answers for domain tests](#answers-for-domain-tests)

- ## Developer
+ <!-- END doctoc generated TOC please keep comment here to allow auto update -->

- This is how the pywatershed package tests itself.
+ # Autotest

- The test suite consists both
- * stand alone tests, and
- * domain tests
+ Autotest provides the test suite for pywatershed. The tests include
+ both
+ * stand alone tests, and
+ * domain tests.

Stand alone tests do not require any input files or output files besides what
is supplied in the testing framework. These tests generally run quickly
(e.g. testing basic logic of a class or type requirements or results).

Domain tests require input or output files for an NHM domain. These tests
scale with the size or number of HRUs in a domain. The test suite takes
- arguments related to domain tests:
+ arguments related to domain tests, which are described in the usage section below.

+ ## Test data
+ As of this writing, the majority of tests are domain tests, and the vast majority of
+ the domain test data must be generated by running binaries included in the repository
+ BEFORE running the autotests.

+ To generate the test data, run the following command from `autotest/`:

+ ```shell
+ python generate_test_data.py -n=auto
+ ```

+ This command should generally be sufficient, but more options are available, including
+ passing options through to pytest. Run `python generate_test_data.py --help` for more details.
+ For example, if you are in a hurry to run and test a single domain, you may restrict
+ (or expand) the domains processed with the `--domains` option. To get verbose output
+ from pytest, pass `-vv`.
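
A sketch of such a restricted, verbose run (flag spellings and defaults per
`python generate_test_data.py --help`; the domain flag is shown as described above):

```shell
# Hypothetical example: regenerate data for one domain with verbose pytest output.
python generate_test_data.py -n=auto --domains drb_2yr -vv
```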

+ Using the above command allows the autotests to check that the test data in
+ `test_data/` have been generated, and generated by the current version of
+ pywatershed. These checks are meant to avoid gross errors with autotesting but will
+ not cover every possible testing situation. When in doubt about tests, it is always
+ best practice to start over by re-generating the test data.

+ There are temporary situations where the errors raised by these test data checks are
+ unwarranted; in those cases, disabling the errors by editing `autotest/conftest.py`
+ is the temporary solution.

+ Please see [`test_data/README.md`](../test_data/README.md) for additional details on how
+ to generate the test data.

+ ## Usage

+ ```
+ cd autotest
+ pytest -n=auto -vv
+ ```
-   --domain_yaml=DOMAIN_YAML
-                       YAML file(s) for indiv domain tests. You can pass multiples of this argument. Default
-                       value (not shown here) is --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml.

+ Pytest options can be explored via `pytest --help`. Custom options for
+ pywatershed are buried in the output of that help call; they are:

+ ```
+ Custom options:
+   --domain_yaml=DOMAIN_YAML
+                       YAML file(s) for indiv domain tests. You can pass multiples of this
+                       argument. Default value (not shown here) is
+                       --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml
+   --print_ans         Print results and assert False for all domain tests
+   --all_domains       Run all test domains
+ ```

- Note that the `--domain_yaml` argument may be repeated within a single call to test multiple
- domains in a single pytest.
+ The default domain tested is `drb_2yr`. All domains present in `test_data/`
+ can be tested using `--all_domains`. Requesting a specific domain or multiple
+ domains is done by passing one or more `--domain_yaml` arguments.
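
For example (an illustrative sketch using the custom options above and the domain
yaml paths used elsewhere in this repository):

```shell
# Test two specific domains in a single pytest invocation.
pytest -n=auto -vv \
    --domain_yaml=../test_data/hru_1/hru_1.yaml \
    --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml

# Test all domains present in test_data/.
pytest -n=auto -vv --all_domains
```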

- The `domain_yaml` file is an *evolving* set of data that includes static domain
- inputs (e.g. CBH forcing files, parameter files), static or reference model
- output (from PRMS/NHM), and the answers to domain tests.
+ ## domain_yaml details

- Examples of `domain_yaml` files can be found in, for example, in
- `pywatershed/test_data/drb_2yr/drb_2yr.yaml`
- and
- `pywatershed/test_data/conus_2yr/conus_2yr.yaml`.
+ This section is about creating or working with the domain_yaml file when
+ writing tests.

+ The `domain_yaml` file provides information to the autotests. The domain_yaml
+ file for a domain directory `somewhere/` will be `somewhere/somewhere.yaml`.
+ This YAML file includes paths to static domain inputs (e.g. CBH forcing files,
+ parameter files), paths to static or reference model output (from PRMS/NHM),
+ and the answers to domain tests. Examples of `domain_yaml` files can be found
+ in each domain directory.

- ### Domain inputs
- These are specific files for a specific model domain (e.g. the Delaware River Basin,
- or the CONUS NHM, etc). The files are paths and are to be specified as either
- * relative paths: relative to the location of the yaml file
-   in which the path is appearing
- * absolute paths: use absolute paths (for some machine and potentially user)
- The examples listed above demonstrate use of both relative and absolute paths.
+ Some details on the contents of the YAML file are given below.

+ ### Path resolution
+ The test configuration for autotest (`autotest/conftest.py`) provides special
+ path resolution relative to the `domain_yaml` file for paths specified in a
+ list in that file:

+ ```
+ ["param_file", "control_file", "cbh_nc", "prms_run_dir", "prms_output_dir",]
+ ```

- Domain inputs are generally created by scripts run in `test_data/scripts`. The
- `drb_2yr` case shows the relationship between the prms binary, the input files,
- and the output files. These scripts help maintain sanity when generating new
- files for domain tests.
+ Additional fields can be added to this list to provide path resolution for new
+ fields in the YAML file.
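
Schematically, the path fields of a domain yaml might look like the following
(the file names here are hypothetical; the actual yaml files under `test_data/`
are the reference):

```yaml
# Hypothetical sketch of the path fields resolved by conftest.py.
param_file: myparam.param              # relative: resolved against this yaml's directory
control_file: control.test
cbh_nc: prcp.nc
prms_run_dir: .
prms_output_dir: /abs/path/to/output   # absolute: used as-is
```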


### Answers for domain tests
- Tests have objectively correct answers. The answer key is stored in the yaml in
- top-level attribute/key called `test_ans`. This can be seen in the examples
- listed above. This attibute has a sub-attributes for each file with domain
- tests in `autotest/test_*py` (the text `test_` and `.py` is dropped in the
- yaml reference). Below this, each file has named tests attributes which in turn
- potentially have named cases/keys when iterated with other fixtures (such as
- variables or types).

- The tests answers for a given test are output of some kind of summary statistic
- or other kind of reduction performed on data in memory. The answers vary with
- domain and are most easily collected by running the tests themselves while
- verifying accuracy manually when putting the answers into the domain yaml file.

- For convenience, a user can select to print the answers at run/test time using
- the `--print_ans` argument when running ptest. This prints the answer for
- domain tests and always asserts False after doing so for each test in a file.
- This option aids in conctruction of the tests_ans section of the domain
- yaml file for new domains or when extending tests on existing domains. Running
- `pytest -s ...` prints output in a format that can be copied to the yaml and
- edited quickly to get new test results enshrined in the answer key.
+ Certain tests have answers stored in the domain YAML. This "answer key" is
+ stored in the top-level key `test_ans`. This can be seen in the examples
+ listed above. Generally, for an autotest `test_x.py` the key "x" will be
+ provided below `test_ans`, and below it will be whatever data is used in
+ that test.
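
Schematically (the names and values here are hypothetical), for an autotest
`test_x.py` the layout is:

```yaml
test_ans:
  x:                      # answers consumed by autotest/test_x.py
    some_case: 3.14159    # e.g. a summary statistic for this domain
```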

+ The answers for a given test are typically a summary statistic
+ computed on model data in memory. The answers vary with domain and
+ are most easily collected by running the tests themselves while
+ verifying the accuracy of the test through expert judgement. The answers
+ enshrined in the domain YAML indicate when test results change. Because
+ test results change from time to time, a convenience utility function,
+ `assert_or_print`, is provided in `utils.py`. Using this function allows
+ the option `--print_ans -s` (note that `-s` prints output to the terminal)
+ to be passed at run time to print all the new values that should be
+ updated into the domain YAML.
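
For example, using the `test_x.py` placeholder from above:

```shell
# Print the would-be answer values for a domain so they can be copied into its yaml.
pytest -s --print_ans --domain_yaml=../test_data/drb_2yr/drb_2yr.yaml test_x.py
```
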
66 changes: 49 additions & 17 deletions autotest/conftest.py
@@ -1,4 +1,4 @@
- import pathlib
+ import pathlib as pl

import yaml

@@ -34,6 +34,49 @@ def pytest_addoption(parser):


def pytest_generate_tests(metafunc):
+     # check domain test data regardless of what fixtures are being requested
+     if metafunc.config.getoption("all_domains"):
+         domain_file_list = [
+             "../test_data/hru_1/hru_1.yaml",
+             "../test_data/drb_2yr/drb_2yr.yaml",
+             "../test_data/ucb_2yr/ucb_2yr.yaml",
+         ]
+     else:
+         domain_file_list = metafunc.config.getoption("domain_yaml")
+         if len(domain_file_list) == 0:
+             domain_file_list = ["../test_data/drb_2yr/drb_2yr.yaml"]

+     # check that the test_data has been generated and is up-to-date
+     # for each domain
+     for dd in domain_file_list:
+         domain_name = pl.Path(dd).parent.name
+         test_data_version_file = pl.Path(
+             f"../test_data/.test_data_version_{domain_name}.txt"
+         )
+         if not test_data_version_file.exists():
+             msg = (
+                 f"Test data for domain {domain_name} do not appear to have\n"
+                 "been generated. Please see DEVELOPER.md for information on\n"
+                 "generating test data.\n"
+             )
+             raise ValueError(msg)

+         repo_version_file = pl.Path("../version.txt")

+         with open(test_data_version_file) as ff:
+             test_version = ff.read()
+         with open(repo_version_file) as ff:
+             repo_version = ff.read()

+         if test_version != repo_version:
+             msg = (
+                 f"Test data for domain {domain_name} do not appear to\n"
+                 "have been generated by the current version of the\n"
+                 "pywatershed repository. Please see DEVELOPER.md for\n"
+                 "information on generating test data.\n"
+             )
+             raise ValueError(msg)

if "domain" in metafunc.fixturenames:
# Put --print_ans in the domain fixture as it applies only to the
# domain tests. It is a run time attribute, not actually an attribute
@@ -42,22 +85,11 @@ def pytest_generate_tests(metafunc):
        # Not sure I love this, maybe have a domain_opts fixture later?
        print_ans = metafunc.config.getoption("print_ans")

-       if metafunc.config.getoption("all_domains"):
-           domain_file_list = [
-               "../test_data/hru_1/hru_1.yaml",
-               "../test_data/drb_2yr/drb_2yr.yaml",
-               "../test_data/ucb_2yr/ucb_2yr.yaml",
-           ]
-       else:
-           domain_file_list = metafunc.config.getoption("domain_yaml")
-           if len(domain_file_list) == 0:
-               domain_file_list = ["../test_data/drb_2yr/drb_2yr.yaml"]

        # open and read in the yaml and
-       domain_ids = [pathlib.Path(ff).stem for ff in domain_file_list]
+       domain_ids = [pl.Path(ff).stem for ff in domain_file_list]
        domain_list = []
        for dd in domain_file_list:
-           dd_file = pathlib.Path(dd)
+           dd_file = pl.Path(dd)
            with dd_file.open("r") as yaml_file:
                domain_dict = yaml.safe_load(yaml_file)

@@ -78,15 +110,15 @@
"prms_run_dir",
"prms_output_dir",
]:
domain_dict[ff] = pathlib.Path(domain_dict[ff])
domain_dict[ff] = pl.Path(domain_dict[ff])
if not domain_dict[ff].is_absolute():
domain_dict[ff] = domain_dict["dir"] / domain_dict[ff]

for fd_key in ["cbh_inputs"]:
domain_dict[fd_key] = {
key: (
pathlib.Path(val)
if pathlib.Path(val).is_absolute()
pl.Path(val)
if pl.Path(val).is_absolute()
else domain_dict["dir"] / val
)
for key, val in domain_dict[fd_key].items()
