Adding check_parameter_positivity() function to seir.py #428

Open
wants to merge 24 commits into
base: dev

Conversation

emprzy
Collaborator

@emprzy emprzy commented Dec 17, 2024

Describe your changes.

This pull request introduces a function called check_parameter_positivity() to seir.py. It takes an array of parsed parameters along with the parameter names, subpopulation names, and dates, checks for negative parameter values, and raises a ValueError if any are found. To avoid redundant feedback, the error reports only the earliest date at which each subpopulation and parameter combination is negative. Example output will resemble:

The earliest date negative for each subpop and unique parameter are:
subpop: 50000, parameter eta_X0toX3_highIE*1*1*nuage18to64HR: 2023-07-21
subpop: 32000, parameter eta_X1toX4_highIE*1*1*nuage0to17: 2024-02-15
subpop: 28000, parameter eta_X1toX4_highIE*1*1*nuage18to64LR: 2025-06-01
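
For readers following along, the check described above can be sketched roughly as follows. This is a minimal illustration with NumPy, not the code in this PR: the function name `earliest_negatives` and the toy inputs are hypothetical, and the array is assumed to be ordered (parameter, day, subpop).

```python
import datetime

import numpy as np


def earliest_negatives(parsed_parameters, parameter_names, subpop_names, dates):
    """Map (subpop, parameter) to the first date with a negative value.

    Hypothetical helper; expects a (n_parameters, n_days, n_subpops) array.
    """
    negative = parsed_parameters < 0
    # Flags (parameter, subpop) pairs that have at least one negative day
    has_negative = negative.any(axis=1)
    # argmax over the day axis returns the index of the first True
    first_day = negative.argmax(axis=1)
    result = {}
    for p_idx, sp_idx in np.argwhere(has_negative):
        day_idx = first_day[p_idx, sp_idx]
        result[(subpop_names[sp_idx], parameter_names[p_idx])] = dates[day_idx]
    return result


# Toy data: 2 parameters, 4 days, 2 subpops (all values made up)
dates = [datetime.date(2023, 1, 1) + datetime.timedelta(days=d) for d in range(4)]
params = np.ones((2, 4, 2))
params[0, 2:, 1] = -0.5  # parameter "a" turns negative on day index 2 in subpop "B"
found = earliest_negatives(params, ["a", "b"], ["A", "B"], dates)
```

Only the first negative day per pair is kept, so later negative days for the same subpop and parameter do not add further entries.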

A test function, test_check_parameter_positivity(), has also been added to test_seir.py to test check_parameter_positivity().

Does this pull request make any user interface changes? If so please describe.

No changes to the user interface.

What does your pull request address? Tag relevant issues.

This pull request addresses GH #215.

Tag relevant team members.

@jcblemai

@emprzy
Collaborator Author

emprzy commented Dec 17, 2024

Still need to add the test function (not ready yet for review)

@TimothyWillard TimothyWillard added labels Dec 18, 2024: enhancement (request for improvement or addition of new features), gempyor (concerns the Python core), quick issue (short or easy fix), next release (marks a PR as a target to include in the next release), low priority
@TimothyWillard TimothyWillard linked an issue Dec 18, 2024 that may be closed by this pull request
@emprzy emprzy marked this pull request as ready for review December 18, 2024 18:02
@emprzy
Collaborator Author

emprzy commented Dec 18, 2024

@jcblemai I have initial concerns that I did not properly instantiate ModelInfo in my test_seir.py::test_neg_params() function. Also, I'm hesitant about the parameters I passed into my neg_params() (line ~163 of seir.py) function call being correct extractions of the specific data that the function calls for. Haven't run checks yet but wanted to say that's what I will likely need help on.

@jcblemai
Collaborator

I'm very, very bad at naming things (Carl and Tim are definitely very good at this), but neg_param() is not informative enough as a function name; something like check_parameter_positivity seems more suitable.

@jcblemai
Collaborator

jcblemai commented Dec 19, 2024

parsed_parameter has shape (n_parsed_parameters (unique_strings) x n_times x n_subpop). Is that what's causing the CI error?

EDIT: no, see below:

@jcblemai
Collaborator

Oh, and you are right, this is not passing the right parameters. Basically, within flepimop the workflow is:

  1. Parameters are drawn into a parameters array, the same for all subpops and times (except if the parameter is a time series): an array of dimensions n_parameters, n_times, n_subpop (unsure of the order).
  2. Modifiers are applied. This changes the array's values (not its dimensions), so parameters vary in time and subpop according to their modifiers.
  3. Then, the compartments module computes how to aggregate parameters for the sampler, e.g. if one transition has a rate of beta * phi / gamma we do NOT want to pass beta, phi, and gamma to the simulator because
  • it's a jit-compiled function, so parsing this formula there is a big hassle and a waste of time
  • we don't use these values separately anyway
    So instead compartments computes unique_strings (each one being a formula, a constant, or a parameter name), and then if you give a parameter array (as above, modified by NPI) to the compartments module it will pre-compute these unique_strings (which are, in a sense, different parameters) by parsing each formula. This array is called (bad name) parsed_parameters and has size n_unique_strings, n_days, n_subpop.

Your function checks the parsed_parameters because each of these steps can introduce a negative value. You then want to test the fully parsed thing.
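
To make the three steps concrete, here is a toy sketch of how a formula string could be pre-computed into a parsed-parameter array. It is not gempyor's parser: `toy_parse_parameters` is a hypothetical stand-in that evaluates each string with Python's `eval` over NumPy arrays, and all names and values are made up. It also shows how a modifier can push a parsed value negative even when the drawn parameters are all positive.

```python
import numpy as np


def toy_parse_parameters(parameters, pnames, unique_strings):
    """Evaluate each unique string over a (n_parameters, n_days, n_subpops)
    array. Hypothetical stand-in, not gempyor's implementation."""
    namespace = {name: parameters[i] for i, name in enumerate(pnames)}
    n_days, n_subpops = parameters.shape[1:]
    parsed = np.empty((len(unique_strings), n_days, n_subpops))
    for i, formula in enumerate(unique_strings):
        # eval is acceptable for a toy; builtins are disabled on purpose
        parsed[i] = eval(formula, {"__builtins__": {}}, namespace)
    return parsed


pnames = ["beta", "phi", "gamma"]
# Step 1: drawn parameters, constant over 3 days and 2 subpops
parameters = np.empty((3, 3, 2))
parameters[0], parameters[1], parameters[2] = 0.4, 0.5, 0.2
# Step 2: a made-up modifier reduces beta by 150% from day index 1 in subpop 0
parameters[0, 1:, 0] *= 1 - 1.5
# Step 3: pre-compute a formula, a constant, and a plain parameter name
unique_strings = ["beta * phi / gamma", "1", "gamma"]
parsed = toy_parse_parameters(parameters, pnames, unique_strings)
```

In this toy, `parsed[0]` is positive everywhere except subpop 0 from day index 1 onward, which is the kind of entry a positivity check on the fully parsed array is meant to catch.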

Hence, to test, once you have modinf you need to:

# draw some parameters
p_draw = modinf.parameters.parameters_quick_draw(n_days=modinf.n_days, nsubpops=modinf.nsubpops)

# p_draw > SEE SCREENSHOT 2!!! A SINGLE LINE

# build a modifier object
npi_seir = seir.build_npi_SEIR(modinf=modinf, load_ID=False, sim_id2load=None, config=config)
# apply the NPI to the parameters
reduced_parameters = modinf.parameters.parameters_reduce(p_draw, npi_seir)

# reduced_parameters > SEE SCREENSHOT 2!!! A TIME SERIES

# parse the compartments:
unique_strings, transition_array, proportion_array, proportion_info = modinf.compartments.get_transition_array()

# parse the unique_strings and compute the parsed parameter objects
parsed_parameters = modinf.compartments.parse_parameters(reduced_parameters, modinf.parameters.pnames, unique_strings)

# parsed_parameters > SEE SCREENSHOT 3!!! now it's a time series that is computed from a formula
[Three screenshots: Screenshot 2024-12-19 at 15 49 02, Screenshot 2024-12-19 at 15 49 08, Screenshot 2024-12-19 at 15 49 16]

@emprzy emprzy changed the title Adding neg_params() function to seir.py Adding check_parameter_positivity() function to seir.py Dec 20, 2024
@emprzy emprzy requested a review from jcblemai January 7, 2025 14:16
emprzy added 4 commits January 9, 2025 13:03
Small documentation changes and removing an unrelated but unnecessary loc from `test_seir.py`
Incorporating the information within a `print()` statement into the `ValueError` output in `seir.py::check_parameter_positivity()`
Contributor

@TimothyWillard TimothyWillard left a comment

The documentation is great, thanks for doing that! The testing looks good too, I just have some brief comments on that front, should be quick to fix.

My biggest concern is that the error message will be too verbose; I think we want to convey the most important thing to users quickly. Multi-line error messages confuse users and can make problems more difficult to diagnose.

flepimop/gempyor_pkg/src/gempyor/seir.py (outdated)
flepimop/gempyor_pkg/src/gempyor/seir.py (outdated)
Comment on lines 57 to 64
error_message = (
    "The earliest date negative for each subpop and unique parameter are:\n"
)
for param_idx, day_idx, sp_idx in non_redundant_negative_parameters:
    error_message += f"subpop: {subpop_names[sp_idx]}, parameter {parameter_names[param_idx]}: {dates[day_idx].date()}\n"
raise ValueError(
    f"There are negative parsed-parameters, which is likely to result in incorrect integration.\n{error_message}"
)
Contributor

I'm concerned that this error message will be quite lengthy and I'm not a fan of multi-line error messages. Is there a way that we could condense this down to one line? Maybe just error on the first negative parameter?

Collaborator Author

Hm, I agree the message is lengthy, and it is sub-optimal for it to be multiple lines. But I'm curious what your thoughts are on the usefulness of an error message that only returns the first negative parameter, even though the function knows where all of the negative parameters are. Is there not a lot of added value in telling the user all of the columns that have negative values, so they can more quickly address the issue? I'm happy to change it to only show the first negative parameter value, but since the function inherently finds the others, I thought it would be useful to include.

Contributor

@pearsonca pearsonca Jan 13, 2025

I agree that spewing out all the errors is too much noise - multiple errors often arise from single mistakes that need correcting.

I recommend that we limit the detailed part of the error message to the earliest time at which there is a problem for any parameter or subpop. However, I also think it's worthwhile to indicate the totality of the problem. Something like

There are negative parameter error(s) for config FFFF: the first at date DD-MM-YY, subpopulation XX, parameter YY.
Affected subpopulations include: {...}. Affected parameters include {...}. There are NNN total negative entries.

(possibly some of those elements can be curtailed if there are not multiple subpops, etc.)

Collaborator

Just chiming in that I asked @emprzy for this verbose error message (the first negative date for every negative parameter and subpop) as it helps to debug configs without retrying (which, e.g. for an RSV config, takes 6 or 7 minutes). Sorry, Emily. I totally understand that we want to keep it light, but I think a good diagnostic (e.g. a graph or something) is useful here, though perhaps not inside the simulate command.

Contributor

I think the clearer the error message is, the more useful it would be in practice, though I understand it's long to print out everything for each subpop, for example. In particular, I think it would be useful to include as much information as possible about the specific parameters. Maybe a simplification/modification of this:

There are negative parameter errors in subpops {...}, starting from date XXXX:
parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR.... 

Contributor

@saraloo's suggestion seems like a reasonable compromise to me.

Minor, but maybe use starting from date YYYY-MM-DD, in parameters: ... instead, notably with no newline. Newlines can be annoying to format in unit tests that match on the exception message; see this prior version of the Parameters unit tests as an example:

with pytest.raises(
    ValueError,
    match=(
        rf"^ERROR loading file {tmp_file} for parameter sigma\:\s+the \'date\' "
        rf"entries of the provided file do not include all the days specified "
        rf"to be modeled by\s+the config\. the provided file includes "
        rf"{(timeseries_end_date - timeseries_start_date).days + 1} days "
        rf"between {timeseries_start_date}( 00\:00\:00)? to "
        rf"{timeseries_end_date}( 00\:00\:00)?,\s+while there are "
        rf"{mock_inputs.number_of_days()} days in the config time span of "
        rf"{mock_inputs.ti}->{mock_inputs.tf}\. The file must contain entries "
        rf"for the\s+the exact start and end dates from the config\. $"
    ),
):
    mock_inputs.create_parameters_instance()

Contributor

Re unit test matching, I'd reiterate that it's overkill to match exact messages. For example, here the test should only match the first bad date string (irrespective of what's around it), the subpop id (ibid), and the offending parameters (ibid).
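
As a hedged illustration of the looser matching suggested here: `pytest.raises(..., match=...)` applies `re.search` to the exception text, so a test can assert on just the key fragments instead of the full sentence. The sketch below uses plain `re` so it runs without pytest; `raise_negative_error` is a hypothetical stand-in, and its message is a shortened, made-up variant of the format discussed in this thread.

```python
import re


def raise_negative_error():
    # Hypothetical stand-in for the function under test
    raise ValueError(
        "There are negative parameter errors in subpops ['56000', '44000'], "
        "starting from date 2023-03-19 in parameters ['alpha*1*1*1']."
    )


try:
    raise_negative_error()
except ValueError as exc:
    message = str(exc)

# Match only the fragments that matter; re.escape guards the '*' characters,
# which would otherwise be read as regex quantifiers
fragments = ["2023-03-19", "56000", re.escape("alpha*1*1*1")]
matched = [re.search(fragment, message) is not None for fragment in fragments]
```

Matching fragments this way keeps the test stable if the surrounding wording or line breaks of the message change later.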

Collaborator Author

starting from date YYYY-MM-DD, in parameters: ...

@TimothyWillard , does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

Contributor

@TimothyWillard , does this mean you propose leaving out the subpop information and just including the parameter names that are negative?

My bad, no, I just was conveying an edit to a portion of @saraloo's suggestion. The full change with my edit would be:

There are negative parameter errors in subpops {...}, starting from date YYYY-MM-DD in parameters: eta_X0toX3_highIE*1*1*nuage18to64HR, eta_X1toX4_highIE*1*1*nuage0to17, eta_X1toX4_highIE*1*1*nuage18to64LR.... 

Collaborator Author

@emprzy emprzy Jan 16, 2025

@TimothyWillard @saraloo @jcblemai
The error message now reads as follows:

ValueError: There are negative parameter errors in subpops ['56000', '44000', '30000'], starting from date 2023-03-19 in parameters ['alpha*1*1*1', 'sigma_OMICRON*1*1*1', '3*gamma*1*1*1'].

Is this what you had in mind? Happy to change it, just wanted to confirm before pushing.
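
A message in this shape could be assembled along these lines. This is a sketch under assumptions, not the code in the PR: `format_negative_error` is a hypothetical name, the array is assumed to be ordered (parameter, day, subpop), and `dates` is assumed to be a sequence of `datetime.date` objects.

```python
import datetime

import numpy as np


def format_negative_error(parsed_parameters, parameter_names, subpop_names, dates):
    """Build a one-line summary of negative entries, or None if there are none."""
    negative_idx = np.argwhere(parsed_parameters < 0)
    if negative_idx.size == 0:
        return None
    param_idx, day_idx, subpop_idx = negative_idx.T
    # np.unique returns sorted indices, so names appear in their input order
    subpops = [subpop_names[i] for i in np.unique(subpop_idx)]
    params = [parameter_names[i] for i in np.unique(param_idx)]
    first_date = dates[day_idx.min()]
    return (
        f"There are negative parameter errors in subpops {subpops}, "
        f"starting from date {first_date} in parameters {params}."
    )


dates = [datetime.date(2023, 3, 18) + datetime.timedelta(days=d) for d in range(3)]
values = np.ones((2, 3, 2))
values[1, 1:, 0] = -1.0  # made-up negative entries
message = format_negative_error(values, ["alpha", "gamma"], ["56000", "44000"], dates)
```

The caller would raise `ValueError(message)` when the result is not None; keeping the formatting separate from the raise makes the string easy to test on its own.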

flepimop/gempyor_pkg/src/gempyor/seir.py
flepimop/gempyor_pkg/tests/seir/test_seir.py (outdated)
flepimop/gempyor_pkg/tests/seir/test_seir.py (outdated)
Labels
enhancement (request for improvement or addition of new features), gempyor (concerns the Python core), low priority, next release (marks a PR as a target to include in the next release), quick issue (short or easy fix)
Development

Successfully merging this pull request may close these issues.

Detect negative rate and give a nice error when that happens
5 participants