Adding `check_parameter_positivity()` function to `seir.py` #428

emprzy · 2024-12-17T15:44:30Z

Describe your changes.

This pull request introduces a function called `check_parameter_positivity()` to `seir.py`. `check_parameter_positivity()` takes in an array of parsed parameters, as well as parameter names, subpopulation names, and dates, checks for negative parameter values, and raises a `ValueError` if any are found. When raising the error, `check_parameter_positivity()` reports only the earliest (by date) negative value for each subpopulation and parameter to avoid redundant feedback. Example output will resemble:

The earliest date negative for each subpop and unique parameter are:
subpop: 50000, parameter eta_X0toX3_highIE*1*1*nuage18to64HR: 2023-07-21
subpop: 32000, parameter eta_X1toX4_highIE*1*1*nuage0to17: 2024-02-15
subpop: 28000, parameter eta_X1toX4_highIE*1*1*nuage18to64LR: 2025-06-01
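
For readers who want the idea without opening the diff, here is a minimal sketch of the check described above. It is not the code from this PR: the function names are hypothetical, nested lists indexed as `[parameter][day][subpop]` stand in for the real parsed-parameter array, and plain `datetime.date` objects stand in for the model dates.

```python
from datetime import date, timedelta


def find_earliest_negatives(parsed_parameters, parameter_names, subpop_names, dates):
    """Hypothetical helper: map (subpop, parameter) to its first negative date.

    `parsed_parameters` is indexed as [parameter][day][subpop].
    """
    earliest = {}
    for p_idx, per_day in enumerate(parsed_parameters):
        for d_idx, per_subpop in enumerate(per_day):
            for s_idx, value in enumerate(per_subpop):
                key = (subpop_names[s_idx], parameter_names[p_idx])
                # Days are visited in order, so the first hit is the earliest.
                if value < 0 and key not in earliest:
                    earliest[key] = dates[d_idx]
    return earliest


def check_parameter_positivity_sketch(
    parsed_parameters, parameter_names, subpop_names, dates
):
    """Hypothetical stand-in: raise ValueError if any parsed value is negative."""
    earliest = find_earliest_negatives(
        parsed_parameters, parameter_names, subpop_names, dates
    )
    if earliest:
        lines = [
            f"subpop: {subpop}, parameter {parameter}: {first_date}"
            for (subpop, parameter), first_date in earliest.items()
        ]
        raise ValueError(
            "The earliest date negative for each subpop and unique parameter are:\n"
            + "\n".join(lines)
        )
```

Reporting one entry per (subpop, parameter) pair keeps the output bounded by the number of pairs rather than by the number of negative cells.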

A test function, `test_check_parameter_positivity()`, has also been added to `test_seir.py` to test `check_parameter_positivity()`.

Does this pull request make any user interface changes? If so please describe.

No changes to the user interface.

What does your pull request address? Tag relevant issues.

This pull request addresses GH #215.

Tag relevant team members.

@jcblemai

emprzy · 2024-12-17T16:03:45Z

Still need to add the test function (not ready yet for review)

emprzy · 2024-12-18T18:10:43Z

@jcblemai I have initial concerns that I did not properly instantiate ModelInfo in my test_seir.py::test_neg_params() function. Also, I'm hesitant about the parameters I passed into my neg_params() (line ~163 of seir.py) function call being correct extractions of the specific data that the function calls for. Haven't run checks yet but wanted to say that's what I will likely need help on.

jcblemai · 2024-12-19T14:28:42Z

I'm very, very bad at naming things (Carl and Tim are definitely very good at this), but neg_param() is not informative enough as a function name; something like check_parameter_positivity seems more suitable.

jcblemai · 2024-12-19T14:34:08Z

parsed_parameters has shape (n_parsed_parameters (unique_strings) × n_times × n_subpop). Is that what's causing the CI error?

EDIT: no, see below:

jcblemai · 2024-12-19T14:49:46Z

Oh, and you are right: this is not passing the right parameters. Basically, within flepimop the workflow is:

1. Parameters are drawn, giving a parameters array that is the same for all subpops and times (except if the parameter is a time series). The array has dimensions n_parameters, n_times, n_subpop (unsure of the order).
2. Modifiers are applied. This changes the values in the array (not its dimensions), so parameters vary in time and subpop according to their modifiers.
3. Then, the compartments module computes the way to aggregate parameters for the sampler. E.g., if one transition has a rate of beta * phi / gamma, we do NOT want to pass beta, phi, and gamma to the simulator because:
   - it's a jit-compiled function, so parsing this formula there is a big hassle and a waste of time;
   - we don't use these values separately anyway.

So instead compartments computes unique_strings (each one being a formula, a constant, or a parameter name), and then if you give a parameter array (as above, modified by NPI) to the compartments module it will pre-compute these unique_strings (which are different parameters of a sort) by parsing each formula. This array is called, bad name, parsed_parameters and has size n_unique_strings, n_days, n_subpop.

Your function checks parsed_parameters because each of these steps can introduce a negative value. You then want to test the fully parsed thing.

Hence, to test, once you have modinf you need to:

# draw some parameters
p_draw = modinf.parameters.parameters_quick_draw(n_days=modinf.n_days, nsubpops=modinf.nsubpops)

# p_draw > SEE SCREENSHOT 2!!! A  SINGLE LINE

# build a modifier object
npi_seir = seir.build_npi_SEIR(modinf=modinf, load_ID=False, sim_id2load=None, config=config)
# apply the NPI to the parameters
reduced_parameters = modinf.parameters.parameters_reduce(p_draw, npi_seir)

# reduced_parameters > SEE SCREENSHOT 2!!! A TIME SERIES

# parse the compartments:
unique_strings, transition_array, proportion_array, proportion_info = modinf.compartments.get_transition_array()

# parse the unique_string and compute the parsed parameter objects
parsed_parameters = modinf.compartments.parse_parameters(reduced_parameters, modinf.parameters.pnames, unique_strings)

# parsed_parameters > SEE SCREENSHOT 3!!! now it's a time series that is computed from a formula
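
To see why the check has to run on the fully parsed array, here is a toy version of the three steps above. Everything in it is an assumption for illustration: the helper names are hypothetical, nested lists replace the real arrays, and Python callables replace gempyor's formula parsing. It only shows that positive draws can become negative after a modifier and that the sign then carries into a parsed formula.

```python
def draw_parameters(values, n_days, n_subpops):
    """Step 1 (toy): one draw per parameter, constant over days and subpops."""
    return {
        name: [[value] * n_subpops for _ in range(n_days)]
        for name, value in values.items()
    }


def apply_reduction(series, reduction, start_day):
    """Step 2 (toy): scale a parameter by (1 - reduction) from start_day onward."""
    return [
        [v * (1.0 - reduction) if day >= start_day else v for v in row]
        for day, row in enumerate(series)
    ]


def parse_formulas(formulas, parameters, n_days, n_subpops):
    """Step 3 (toy): evaluate each unique string per cell -> [string][day][subpop]."""
    return [
        [
            [
                formula({name: s[day][sp] for name, s in parameters.items()})
                for sp in range(n_subpops)
            ]
            for day in range(n_days)
        ]
        for formula in formulas.values()
    ]


n_days, n_subpops = 4, 2
drawn = draw_parameters({"beta": 0.5, "phi": 2.0, "gamma": 0.25}, n_days, n_subpops)
# A reduction above 1 flips the sign of beta from day 2 onward.
reduced = dict(drawn, beta=apply_reduction(drawn["beta"], 1.5, start_day=2))
# Hypothetical stand-in for a unique string and its parsed formula.
formulas = {"beta * phi / gamma": lambda p: p["beta"] * p["phi"] / p["gamma"]}
parsed = parse_formulas(formulas, reduced, n_days, n_subpops)
```

Here every drawn value is positive, yet the parsed rate is negative from day 2 on, so a check on the draws alone would miss it.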

TimothyWillard

The documentation is great, thanks for doing that! The testing looks good too, I just have some brief comments on that front, should be quick to fix.

My biggest concern is that the error message will be too verbose, I think we want to convey to users quickly the most important thing. Multi-line error messages confuse users and can make it more difficult to diagnose.

TimothyWillard · 2025-01-10T21:30:56Z

flepimop/gempyor_pkg/src/gempyor/seir.py

+        error_message = (
+            "The earliest date negative for each subpop and unique parameter are:\n"
+        )
+        for param_idx, day_idx, sp_idx in non_redundant_negative_parameters:
+            error_message += f"subpop: {subpop_names[sp_idx]}, parameter {parameter_names[param_idx]}: {dates[day_idx].date()}\n"
+        raise ValueError(
+            f"There are negative parsed-parameters, which is likely to result in incorrect integration.\n{error_message}"
+        )


I'm concerned that this error message will be quite lengthy and I'm not a fan of multi-line error messages. Is there a way that we could condense this down to one line? Maybe just error on the first negative parameter?

Hm, I agree the message is lengthy, and it is sub-optimal for it to be multiple lines. But I'm curious what your thoughts are on the usefulness of an error message that only returns the first negative parameter, even though the function already knows where all of the negative parameters are. Is there not a lot of added value in telling the user all of the columns that have negative values, so they can more quickly address the issue? I'm happy to change it to only show the first negative parameter value, but since the function inherently finds the others, I thought it would be useful to include.

I agree that spewing out all the errors is too much noise - multiple errors often arise from single mistakes that need correcting.

I recommend that we limit the detailed part of the error message to the earliest time, for any parameter or subpop, at which there is a problem. However, I also think it's worthwhile to indicate the totality of the problem. Something like:

There are negative parameter error(s) for config FFFF: the first at date DD-MM-YY, subpopulation XX, parameter YY. Affected subpopulations include: {...}. Affected parameters include {...}. There are NNN total negative entries.

(possibly some of those elements can be curtailed if there are not multiple subpops, etc.)
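
A rough sketch of how such a summary could be assembled, assuming the negatives are already available as `(date, subpop, parameter)` triples. The function name and input shape are hypothetical, and the config name from the template above is left out because nothing here says how it would be obtained.

```python
from datetime import date


def summarize_negatives(negatives):
    """Hypothetical helper: one-line summary from (date, subpop, parameter) triples."""
    # Tuples compare element by element, so min() picks the earliest date first.
    first_date, first_subpop, first_param = min(negatives)
    subpops = sorted({subpop for _, subpop, _ in negatives})
    params = sorted({param for _, _, param in negatives})
    return (
        f"There are negative parameter error(s): the first at date {first_date}, "
        f"subpopulation {first_subpop}, parameter {first_param}. "
        f"Affected subpopulations include: {subpops}. "
        f"Affected parameters include: {params}. "
        f"There are {len(negatives)} total negative entries."
    )
```

The detailed part stays fixed in size while the totals still tell the user how widespread the problem is.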

Just chiming in that I asked @emprzy for this verbose error message (the first negative date for every negative parameter and subpop) as it helps to debug configs without retrying (which, e.g., for an RSV config takes 6/7 minutes); sorry, Emily. I totally understand that we want to keep it light, but I think a good diagnosis (e.g. a graph or something) is useful here, though perhaps not inside the simulate command.

I think the clearer the error message is, the more useful it would be in practice, though I understand it's long to print out everything for each subpop, for example. In particular, I think it would be useful to include as much information as possible about the specific parameters. Maybe a simplification/modification of this:

There are negative parameter errors in subpops {...}, starting from date XXXX: parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR....

@saraloo's suggestion seems like a reasonable compromise to me.

Minor: but maybe `starting from date YYYY-MM-DD, in parameters: ...` instead, notably with no newline. Newlines can be annoying to format in unit tests that match on the exception message; see this prior version of the Parameters unit tests as an example:

flepiMoP/flepimop/gempyor_pkg/tests/parameters/test_parameters_class.py

Lines 283 to 297 in 2ec9695

with pytest.raises(
    ValueError,
    match=(
        rf"^ERROR loading file {tmp_file} for parameter sigma\:\s+the \'date\' "
        rf"entries of the provided file do not include all the days specified "
        rf"to be modeled by\s+the config\. the provided file includes "
        rf"{(timeseries_end_date - timeseries_start_date).days + 1} days "
        rf"between {timeseries_start_date}( 00\:00\:00)? to "
        rf"{timeseries_end_date}( 00\:00\:00)?,\s+while there are "
        rf"{mock_inputs.number_of_days()} days in the config time span of "
        rf"{mock_inputs.ti}->{mock_inputs.tf}\. The file must contain entries "
        rf"for the\s+the exact start and end dates from the config\. $"
    ),
):
    mock_inputs.create_parameters_instance()

Re unit test matching, I'd reiterate that it's overkill to match exact messages. For example here, the test should only be matching the first bad date string (irrespective of what's around it), the subpop id (ibid), and the offending parameters (ibid).
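
One way to match only those key fragments is sketched below with the standard `re` module. The helper name is hypothetical and the sample message is the one-line format discussed in this thread. Parameter names contain `*`, which is a regex quantifier, so each fragment goes through `re.escape` first. `pytest.raises(..., match=...)` applies `re.search` to the exception message, so the same pattern string could be passed as `match`.

```python
import re

# Sample message in the one-line format discussed in this thread.
message = (
    "There are negative parameter errors in subpops ['56000', '44000', '30000'], "
    "starting from date 2023-03-19 in parameters "
    "['alpha*1*1*1', 'sigma_OMICRON*1*1*1', '3*gamma*1*1*1']."
)


def key_fragments_pattern(subpop, first_date, parameter):
    """Hypothetical helper: match the fragments in order, ignoring text between."""
    parts = (subpop, first_date, parameter)
    # re.escape keeps '*' and other metacharacters literal.
    return ".*".join(re.escape(part) for part in parts)


pattern = key_fragments_pattern("56000", "2023-03-19", "alpha*1*1*1")
matched = re.search(pattern, message) is not None
```

Because the pattern ignores everything between the fragments, rewording the rest of the message does not break the test.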

starting from date YYYY-MM-DD, in parameters: ...

@TimothyWillard, does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

@TimothyWillard, does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

My bad, no, I just was conveying an edit to a portion of @saraloo's suggestion. The full change with my edit would be:

There are negative parameter errors in subpops {...}, starting from date YYYY-MM-DD in parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR....

@TimothyWillard @saraloo @jcblemai
The error message now reads as follows:

ValueError: There are negative parameter errors in subpops ['56000', '44000', '30000'], starting from date 2023-03-19 in parameters ['alpha*1*1*1', 'sigma_OMICRON*1*1*1', '3*gamma*1*1*1'].

Is this what you had in mind? Happy to change it, just wanted to confirm before pushing.
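
For reference, a message of this shape could be assembled along the following lines. This is a sketch, not the code pushed in this PR: the function name is hypothetical, and it assumes the negatives are available as `(param_idx, day_idx, subpop_idx)` index triples, as in the loop from the earlier diff.

```python
from datetime import date


def negative_parameter_message(negative_indices, parameter_names, subpop_names, dates):
    """Hypothetical helper: one-line message from (param, day, subpop) index triples."""
    triples = list(negative_indices)
    first_day = min(day for _, day, _ in triples)
    # dict.fromkeys drops repeats while keeping first-seen order.
    subpops = list(dict.fromkeys(subpop_names[sp] for _, _, sp in triples))
    parameters = list(dict.fromkeys(parameter_names[p] for p, _, _ in triples))
    return (
        f"There are negative parameter errors in subpops {subpops}, "
        f"starting from date {dates[first_day]} in parameters {parameters}."
    )
```

Formatting the lists with their default `repr` gives the bracketed, quoted style shown in the message above.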

emprzy added 2 commits December 16, 2024 13:08

Update seir.py

80f98cb

Update seir.py

ba683da

TimothyWillard added labels Dec 18, 2024: enhancement (request for improvement or addition of new features), gempyor (concerns the Python core), quick issue (short or easy fix), next release (marks a PR as a target to include in the next release), low priority

TimothyWillard linked an issue Dec 18, 2024 that may be closed by this pull request

Detect negative rate and give a nice error when that happens #215

Open

Update test_seir.py

a5360ad

emprzy marked this pull request as ready for review December 18, 2024 18:02

Update test_seir.py

f60b62d

emprzy changed the title ~~Adding neg_params() function to seir.py~~ Adding check_parameter_positivity() function to seir.py Dec 20, 2024

emprzy added 4 commits December 20, 2024 13:07

Renaming function and adjusting unit test

98c719d

Date fix in test_seir.py

b6686f0

It turns out there's only 30 days in April...

IndexError fix in unit test

1b841c6

Update test_seir.py

34fdc1e

emprzy requested a review from jcblemai January 7, 2025 14:16

jcblemai approved these changes Jan 9, 2025

View reviewed changes

emprzy added 4 commits January 9, 2025 13:03

Update test_seir.py and seir.py

e3a364f

Small documentation changes and removing an unrelated but unnecessary loc from `test_seir.py`

linting with black

207ddde

Removing the print() statement from check_parameter_positivity

78ddbae

Incorporating the information within a `print()` statement into the `ValueError` output in `seir.py::check_parameter_positivity()`

final linting

133ef9e

emprzy requested review from pearsonca, TimothyWillard and saraloo January 10, 2025 16:45

TimothyWillard requested changes Jan 10, 2025

View reviewed changes

emprzy added 6 commits January 13, 2025 13:06

Addressing suggestions from Tim's review

c95bb04

Changing error message and adding more specific error message matching

9207dbb

Linting with black

6cc9696

oops forgot to do this before previous push

Update test_seir.py

148495a

Regex update

e191981

Regex update

39cdf6c

TimothyWillard mentioned this pull request Jan 17, 2025

Fix R CI Actions With Ubuntu 22 To 24 Update #471

Merged

emprzy added 6 commits January 17, 2025 13:27

Regex update

7951d11

Regex update

2685ee4

Regex update

61dd8dc

Regex update

2a99037

Regex update

cfd05f9

Fixing data type error

44f3fc2
