Improve testing coverage for hydrological functions with multi-backend support#173
Improve testing coverage for hydrological functions with multi-backend support#173
Conversation
Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
tests/distance/array/test_to_sink.py
Outdated
| @pytest.mark.parametrize( | ||
| "river_network, field, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| None, | ||
| distance_1_to_source_shortest, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_distance_to_source(river_network, field, expected): | ||
| """Test distance to source computation.""" | ||
| result = ekh.distance.array.to_source( | ||
| river_network, field=field, path="shortest", return_type="masked" | ||
| ) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
it's confusing that this test is in the file named to_sink. It should be split. Also, this is only testing numpy still - it should test other backends too.
| @pytest.mark.parametrize( | ||
| "river_network, field, locations, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1c, | ||
| catchment_query_field_1, | ||
| catchment_max_1c, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_catchments_max(river_network, field, locations, expected): | ||
| """Test catchment max aggregation.""" | ||
| result = ekh.catchments.array.max(river_network, field, locations=locations) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
should test other array backends other than numpy
| @pytest.mark.parametrize( | ||
| "river_network, field, locations, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1c, | ||
| catchment_query_field_1, | ||
| catchment_mean_1c, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_catchments_mean(river_network, field, locations, expected): | ||
| """Test catchment mean aggregation.""" | ||
| result = ekh.catchments.array.mean(river_network, field, locations=locations) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
should test other array backends other than numpy
| @pytest.mark.parametrize( | ||
| "river_network, field, locations, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1c, | ||
| catchment_query_field_1, | ||
| catchment_min_1c, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_catchments_min(river_network, field, locations, expected): | ||
| """Test catchment min aggregation.""" | ||
| result = ekh.catchments.array.min(river_network, field, locations=locations) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
should test other array backends other than numpy
| @pytest.mark.parametrize( | ||
| "river_network, field, locations, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1c, | ||
| catchment_query_field_1, | ||
| catchment_sum_1c, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_catchments_sum(river_network, field, locations, expected): | ||
| """Test catchment sum aggregation.""" | ||
| result = ekh.catchments.array.sum(river_network, field, locations=locations) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
should test other array backends other than numpy
| @pytest.mark.parametrize( | ||
| "river_network, field, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| None, | ||
| distance_1_to_sink_shortest, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_distance_to_sink(river_network, field, expected): | ||
| """Test distance to sink computation.""" | ||
| result = ekh.distance.array.to_sink( | ||
| river_network, field=field, path="shortest", return_type="masked" | ||
| ) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_array_equal(result, expected) |
There was a problem hiding this comment.
should test other array backends other than numpy
tests/downstream/array/test_max.py
Outdated
| @pytest.mark.parametrize( | ||
| "river_network, input_field, flow_downstream, mv", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1c, | ||
| downstream_metric_max_1c, | ||
| mv_1c, | ||
| ), | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| input_field_1e, | ||
| downstream_metric_max_1e, | ||
| mv_1e, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_downstream_metric_max(river_network, input_field, flow_downstream, mv): | ||
| output_field = ekh.downstream.array.max( | ||
| river_network, input_field, node_weights=None, return_type="masked" | ||
| ) | ||
| print(output_field) | ||
| print(flow_downstream) | ||
| assert output_field.dtype == flow_downstream.dtype | ||
| np.testing.assert_allclose(output_field, flow_downstream, equal_nan=True) |
There was a problem hiding this comment.
should test other array backends other than numpy
tests/downstream/array/test_mean.py
Outdated
| def test_downstream_metric_mean(river_network, input_field, flow_downstream, mv): | ||
| output_field = ekh.downstream.array.mean( | ||
| river_network, input_field, node_weights=None, return_type="masked" | ||
| ) | ||
| print(output_field) | ||
| print(flow_downstream) | ||
| assert output_field.dtype == flow_downstream.dtype | ||
| np.testing.assert_allclose(output_field, flow_downstream, equal_nan=True) |
There was a problem hiding this comment.
should test other array backends other than numpy
tests/downstream/array/test_min.py
Outdated
| def test_downstream_metric_min(river_network, input_field, flow_downstream, mv): | ||
| output_field = ekh.downstream.array.min( | ||
| river_network, input_field, node_weights=None, return_type="masked" | ||
| ) | ||
| print(output_field) | ||
| print(flow_downstream) | ||
| assert output_field.dtype == flow_downstream.dtype | ||
| np.testing.assert_allclose(output_field, flow_downstream, equal_nan=True) |
There was a problem hiding this comment.
should test other array backends other than numpy
| @pytest.mark.parametrize( | ||
| "river_network, field, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| weights_1, | ||
| length_1_to_sink_shortest, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_length_to_sink(river_network, field, expected): | ||
| """Test length to sink computation.""" | ||
| result = ekh.length.array.to_sink( | ||
| river_network, field=field, path="shortest", return_type="masked" | ||
| ) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize( | ||
| "river_network, field, expected", | ||
| [ | ||
| ( | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| weights_1, | ||
| length_1_to_source_shortest, | ||
| ), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_length_to_source(river_network, field, expected): | ||
| """Test length to source computation.""" | ||
| result = ekh.length.array.to_source( | ||
| river_network, field=field, path="shortest", return_type="masked" | ||
| ) | ||
| print("Result:", result) | ||
| print("Expected:", expected) | ||
| np.testing.assert_allclose(result, expected, rtol=1e-6) |
There was a problem hiding this comment.
should test other array backends other than numpy. Also, to_source tests don't belong in a file called to_sink
tests/subnetwork/test_subnetwork.py
Outdated
| @pytest.mark.parametrize( | ||
| "river_network", | ||
| [ | ||
| ("cama_nextxy", cama_nextxy_1), | ||
| ], | ||
| indirect=["river_network"], | ||
| ) | ||
| def test_crop(river_network): | ||
| """Test cropping a gridded network to minimum bounding box.""" | ||
| # Skip test if river network doesn't have coords (required for crop) | ||
| if river_network._storage.coords is None: | ||
| pytest.skip("River network does not have coordinates required for crop") | ||
|
|
||
| cropped = ekh.subnetwork.crop(river_network) | ||
|
|
||
| # Check that cropped network has the same or fewer gridcells | ||
| assert cropped.n_nodes == river_network.n_nodes | ||
| assert cropped.shape[0] <= river_network.shape[0] | ||
| assert cropped.shape[1] <= river_network.shape[1] | ||
|
|
||
| # Check that it's a different object | ||
| assert cropped is not river_network |
There was a problem hiding this comment.
We should test an example where it actually crops. <= is not good, it should be an example where we crop to make sure the functionality is working. This could just return a copy of the original network and in this example, it will in fact because there are not missing values around the place.
tests/upstream/array/test_std.py
Outdated
| def test_calculate_upstream_metric_std(river_network, input_field, flow_downstream, mv): | ||
| output_field = ekh.upstream.array.std( | ||
| river_network, input_field, node_weights=None, return_type="masked" | ||
| ) | ||
| print(output_field) | ||
| print(flow_downstream) | ||
| assert output_field.dtype == flow_downstream.dtype | ||
| np.testing.assert_allclose(output_field, flow_downstream, rtol=1e-6, equal_nan=True) |
There was a problem hiding this comment.
should test other array backends other than numpy
tests/upstream/array/test_var.py
Outdated
| def test_calculate_upstream_metric_var(river_network, input_field, flow_downstream, mv): | ||
| output_field = ekh.upstream.array.var( | ||
| river_network, input_field, node_weights=None, return_type="masked" | ||
| ) | ||
| print(output_field) | ||
| print(flow_downstream) | ||
| assert output_field.dtype == flow_downstream.dtype | ||
| np.testing.assert_allclose(output_field, flow_downstream, rtol=1e-6, equal_nan=True) |
There was a problem hiding this comment.
should test other array backends other than numpy
Oisin-M
left a comment
There was a problem hiding this comment.
This looks like it's going in the right direction, but the are a number of issues.
Make sure to read the documentation before adding tests.
the following needs to be fixed:
- add tests for different array backends, not just numpy
- make sure the naming of files and tests is logical
- make sure there are also tests for the xarray implementation and not just the array one
Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
…ision Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com>
The repository had limited test coverage (32 tests) with several modules completely untested or only partially tested. This PR adds comprehensive tests for all public array-level functions across the core hydrological modules with multi-backend support (numpy, torch, jax where applicable).
Changes
New Test Suites
from_mask()andcrop()functions with proper cropping verificationExtended Test Coverage
to_source()andto_sink()test files (numpy only - other backends not yet supported)to_source()andto_sink()test files with node weights (numpy only - other backends not yet supported)Test Infrastructure
Test Organization
Tests follow the established pattern with proper file separation:
All tests use parametrized fixtures, handle NaN values correctly, test both masked (1D) and gridded (2D) return types where applicable, and properly handle backend-specific limitations.
Backend Support Summary
Results
Original prompt
<issue_description>### What maintenance does this project need?
This project needs more extensive tests to improve the code coverage and catch any bugs.
The repository has the following modules:
river_network,streamorder,upstream,downstream,move,catchments,subnetwork,distanceandlength. All of the public methods for these functions should be extensively tested. Most functions have two implementations, one at the high-level which returns xarray and one at the low-level which returns arrays. The functions should be tested for vector inputs where applicable. For the array implementation, currently only numpy is being tested, but it would be good to also test the other supported array backends.Please read the documentation in detail before commencing. Organise the tests as follows:
Make the code simple to follow and lean, but make sure to prioritise code coverage. All tests must pass, and test the important behaviour.
Organisation
No response</issue_description>
Original prompt