
Integrate BMI init config automatic generation into workflows #607

Merged

Conversation

robertbartel
Contributor

Integrate use of the BMI init config generation tools from the ngen-config-gen package (in the ngen-cal repo), deriving the BMI init config dataset from the realization config and hydrofabric datasets at the AWAITING_DATA job exec step.

@robertbartel robertbartel added enhancement New feature or request maas MaaS Workstream labels May 7, 2024
@robertbartel robertbartel force-pushed the f/enable_bmi_init_gen_in_workflow/main branch from 5b4720b to 6e7ce15 (May 13, 2024 13:37)
See https://www.lynker-spatial.com/copyright.html for license details on hydrofabric data.
"""

_module_to_model_map: Dict[str, Any] = {"CFE": Cfe, "PET": Pet}
Member

What are your thoughts on moving this out of the class and passing this in as a parameter to __init__? My thinking is this will provide a smoother pathway for adding support for new modules as they are introduced.

Contributor Author

Hmm, not sure. I see the benefit of parameterizing things; I added the other_builder_hook_types init param to future proof along those lines. But I don't see any reason why we ever wouldn't want these available to an instance, certainly for the moment but probably also in the long term. So they should be hard-coded somewhere. And where is better than here?

Member

Yeah that is a fair point. My thinking was that the code that uses this would parametrize it. So, that would most likely be in a service instead of a library. My thinking is it will be easier / less painful to update service code rather than library code.

Contributor Author

My thinking is it will be easier / less painful to update service code rather than library code.

Maybe, but only if we only need to do that once. If we are putting BmiInitConfigAutoGenerator into a library, we intend it to at least eventually be used in more than just one place in one service (it's a fair question whether we should do this at all, but for the moment we weren't asking that).

Again though, I assume for this that an unrecognized model_type_name in the realization config should be treated as invalid, from which follows that certain builders should always be available to an instance.
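The trade-off being debated above can be sketched in a few lines: keep the known builders hard-coded as a default so they are always available to an instance, while still accepting an override via `__init__` for service code. This is a hypothetical illustration, not the actual DMOD class; `object` stands in for the real `Cfe`/`Pet` builder types.

```python
from typing import Any, Dict, Optional

# Hypothetical default map; in the real code this holds Cfe, Pet, etc.
_DEFAULT_MODULE_TO_MODEL_MAP: Dict[str, Any] = {"CFE": object, "PET": object}

class BmiInitConfigAutoGeneratorSketch:
    def __init__(self, module_to_model_map: Optional[Dict[str, Any]] = None):
        # Always start from the hard-coded defaults, so known builders
        # are available to every instance; callers may extend/override.
        self._module_to_model_map = dict(_DEFAULT_MODULE_TO_MODEL_MAP)
        if module_to_model_map:
            self._module_to_model_map.update(module_to_model_map)
```

This keeps library behavior stable while leaving service code a seam for adding new modules without a library release.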

Member
@aaraney left a comment

In general, this looks really good. I left a few minor comments that shouldn't take long to get through.


_module_to_model_map: Dict[str, Any] = {"CFE": Cfe, "PET": Pet}
""" Map of config strings to builders, for modules with builders than can be easily init with no more info. """
_no_init_config_modules = {"SLOTH"}
Member

This should also be moved out. I think just for the first go, we can keep the builders and "ignored" modules as separate __init__ params, but it might be useful at some point to combine these ideas into a dataclass.

Contributor Author

Similarly to the above, I didn't see a scenario when we could avoid excluding SLOTH. So hard-coding the exclusion seemed appropriate.

I suppose (and again, this lines up with the earlier param) this all assumes that an unrecognized model_type_name in the realization config is invalid, rather than something the user wants ignored. I feel like that is more appropriate in this context: fail immediately, instead of silently trusting that the user didn't make a mistake (e.g., a typo) and turning any actual errors here into failures at a later step. But I'm open to discussing that further.

Member

Same thinking as above. It will be easier to update service code rather than library code.
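The fail-fast behavior argued for above can be illustrated with a small sketch: generate configs for supported modules, skip modules known to need no init config (like SLOTH), and raise immediately for anything unrecognized rather than silently ignoring it. The names `SUPPORTED_MODULES` and `check_module` are assumptions for illustration, not the actual DMOD identifiers.

```python
SUPPORTED_MODULES = {"CFE", "PET"}        # modules with init config builders
NO_INIT_CONFIG_MODULES = {"SLOTH"}        # modules needing no init config

def check_module(model_type_name: str) -> bool:
    """Return True if an init config should be generated, False if the
    module is known to need none; raise for anything unrecognized."""
    if model_type_name in NO_INIT_CONFIG_MODULES:
        return False
    if model_type_name in SUPPORTED_MODULES:
        return True
    # Fail immediately on a typo or unsupported module, rather than
    # deferring the error to a later workflow step.
    raise ValueError(f"Unrecognized BMI module '{model_type_name}' in realization config")
```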

self.output_dirs: Dict[int, Path] = dict()

# TODO: (later) find a better way to do this
hf_gpkg = Path.home().joinpath("Developer/noaa/data/hydrofabric/v201/nextgen_01.gpkg")
Member

These will need to change before we merge.

Contributor Author

Ah, so ... maybe not.

These tests are skipped for now. I ran them locally, and I wanted to contribute them, but I'd intended on leaving them out of the automated runs for the time being (I suppose I need to open an issue for doing it later but haven't yet). Mainly that was to figure out how to best handle making the necessary hydrofabric data available.

If you object, then we can address the issue now. In short, I think the best option is using Mike's hydrofabric subsetting tools and committing something smaller to the repo, but I'm not terribly familiar with those. I'm also not sure if there'd be any significance/benefit to choosing any particular subset. I.e., it just would take more time, so I was trying to defer that.

Contributor Author

I've opened #617 for tracking the task of updating the tests.

"Developer/noaa/dmod/data/example_realization_configs/ex_realization_config_03.json")
noah_params_dir = Path.home().joinpath("Developer/noaa/data/noah_owp_ex_params_dir_1")

self.output_dirs[0] = hf_model_data.parent.joinpath(f"{self.__class__.__name__}_out_0")
Member

Any reason not to use tempfile.TemporaryDirectory here instead?

Contributor Author

Hmm, well, as discussed above, I'd planned on revisiting the tests a bit later, and these paths might be revised then also. For the moment at least, since the tests are more manual and incomplete, this was to make it easier to go examine the created files.
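For reference, the `tempfile.TemporaryDirectory` approach suggested above looks roughly like the following in a standard unittest lifecycle. This is a generic sketch (the test body is a placeholder), not the actual DMOD test class.

```python
import tempfile
import unittest
from pathlib import Path

class ExampleGeneratorTests(unittest.TestCase):
    def setUp(self):
        # Created fresh before each test; removed again in tearDown.
        self._tmp = tempfile.TemporaryDirectory()
        self.output_dir = Path(self._tmp.name)

    def tearDown(self):
        self._tmp.cleanup()

    def test_output_dir_exists(self):
        self.assertTrue(self.output_dir.is_dir())
```

The trade-off noted in the reply is that a temp dir is deleted after each test, which makes manually inspecting generated files harder while the tests are still semi-manual.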


self.output_dirs[0] = hf_model_data.parent.joinpath(f"{self.__class__.__name__}_out_0")
self._prep_output_dir(self.output_dirs[0])
self.generators[0] = BmiInitConfigAutoGenerator(ngen_realization=NgenRealization.parse_file(real_cfg_file),
Member

Just a minor tweak here, since setUp and tearDown are run before and after each test respectively, it might make sense to memoize reading the realization, HF, and model attributes files from disk and just create a new BmiInitConfigAutoGenerator each setUp.

Contributor Author

Sure, we can do that. I've added something.

configs = []
cat_id, config = next(gen_pyobj)
configs.append(config)
while True:
Member

Thoughts on using a for loop instead?

Suggested change:

```diff
-while True:
+configs = []
+gen_pyobj = generator.generate_configs()
+cat_id, config = next(gen_pyobj)
+configs.append(config)
+for cid, config in gen_pyobj:
+    if cid == cat_id:
+        configs.append(config)
+    else:
+        break
```

Contributor Author

Sure. Don't remember exactly why I did things that way. I've made a change.
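As a runnable sketch of the adopted pattern: consume the first `(catchment_id, config)` pair, then use a for loop to collect the remaining configs for that same catchment, stopping at the first pair belonging to a different one. The fake generator below stands in for `generator.generate_configs()`.

```python
def fake_generate_configs():
    # Stand-in for generator.generate_configs(): yields (cat_id, config)
    # pairs grouped by catchment.
    yield ("cat-1", "cfe-config")
    yield ("cat-1", "pet-config")
    yield ("cat-2", "cfe-config")

gen_pyobj = fake_generate_configs()
cat_id, config = next(gen_pyobj)
configs = [config]
for cid, config in gen_pyobj:
    if cid != cat_id:
        break  # first pair for the next catchment; stop collecting
    configs.append(config)
```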

@robertbartel robertbartel force-pushed the f/enable_bmi_init_gen_in_workflow/main branch from 103f1ad to a676d20 (May 14, 2024 19:04)
robertbartel and others added 17 commits May 14, 2024 16:49
Creating util class leveraging ngen.config_gen that creates configs as
needed for a particular realization config and hydrofabric, within the
dmod.modeldata package.
Adding function to BmiInitConfigAutoGenerator to get a list of the
supported BMI modules (i.e., the name strings used to configure them
within a realization config) for which the class can generate init
configs.
Update data service with capability to derive/generate BMI init config
dataset in certain conditions.
Memoize/cache creation of NgenRealization and hydrofabric data objects.
Use for loop instead of "while True".
Assert in find_hydrofabric_files that there is only a single gpkg file to protect against unsupported case.

Co-authored-by: Austin Raney <[email protected]>
Moving to use of "@lru_cache(maxsize=None)" instead of simply "@cache"
to maintain compatibility with 3.8.
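For context on that last commit: `functools.cache` was only added in Python 3.9, and it is simply `lru_cache(maxsize=None)` under the hood, so code that must still run on 3.8 uses the older spelling directly. A minimal illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # equivalent to @functools.cache on Python 3.9+
def expensive(n: int) -> int:
    # Placeholder for an expensive computation or file parse.
    return n * n
```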
@aaraney aaraney merged commit 1a367da into NOAA-OWP:master May 15, 2024
8 checks passed
@robertbartel robertbartel deleted the f/enable_bmi_init_gen_in_workflow/main branch May 15, 2024 19:27