Integrate BMI init config automatic generation into workflows #607
Conversation
Force-pushed from 5b4720b to 6e7ce15
    See https://www.lynker-spatial.com/copyright.html for license details on hydrofabric data.
    """

    _module_to_model_map: Dict[str, Any] = {"CFE": Cfe, "PET": Pet}
What are your thoughts on moving this out of the class and passing it in as a parameter to __init__? My thinking is this will provide a smoother pathway for adding support for new modules as they are introduced.
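A minimal sketch of that parameterization, assuming a hypothetical default-map name and a simplified constructor (the real BmiInitConfigAutoGenerator signature is larger than shown here):

    from typing import Any, Dict, Optional

    # Assumed import paths for the ngen.config_gen builder classes.
    from ngen.config_gen.models.cfe import Cfe
    from ngen.config_gen.models.pet import Pet

    # Library-provided defaults that service code could extend or override.
    _DEFAULT_MODULE_TO_MODEL_MAP: Dict[str, Any] = {"CFE": Cfe, "PET": Pet}

    class BmiInitConfigAutoGenerator:
        def __init__(self, module_to_model_map: Optional[Dict[str, Any]] = None):
            # Fall back to the defaults when the caller does not supply a map, so
            # adding support for a new module only requires passing an updated map.
            self._module_to_model_map = dict(module_to_model_map or _DEFAULT_MODULE_TO_MODEL_MAP)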
Hmm, not sure. I see the benefit of parameterizing things; I added the other_builder_hook_types init param to future-proof along those lines. But I don't see any reason why we would ever not want these available to an instance, certainly for the moment but probably also in the long term. So they should be hard-coded somewhere, and where better than here?
Yeah, that is a fair point. My thinking was that the code that uses this would parameterize it, so that would most likely be in a service instead of a library, and it will be easier / less painful to update service code than library code.
> My thinking is it will be easier / less painful to update service code rather than library code.

Maybe, but only if we need to do that just once. If we are putting BmiInitConfigAutoGenerator into a library, we intend for it to eventually be used in more than just one place in one service (it's a fair question whether we should do this or not, but for the moment we weren't asking that).

Again, though, I assume here that an unrecognized model_type_name in the realization config should be treated as invalid, from which it follows that certain builders should always be available to an instance.
In general, this looks really good. I left a few minor comments that shouldn't take long to get through.
    _module_to_model_map: Dict[str, Any] = {"CFE": Cfe, "PET": Pet}
    """ Map of config strings to builders, for modules with builders that can be easily init with no more info. """
    _no_init_config_modules = {"SLOTH"}
This should also be moved out. I think just for the first go, we can keep the builders and "ignored" modules as separate __init__ params, but it might be useful at some point to combine these ideas into a dataclass.
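A rough sketch of what that combined dataclass could look like (the class and field names are illustrative, not from the PR):

    from dataclasses import dataclass, field
    from typing import Any, Dict, Set

    @dataclass
    class ModuleGenerationConfig:
        """Hypothetical container pairing builder classes with modules to skip."""
        # Map of realization-config module names to ngen.config_gen builder classes.
        module_to_model_map: Dict[str, Any] = field(default_factory=dict)
        # Module names that are recognized but need no generated BMI init config (e.g., "SLOTH").
        no_init_config_modules: Set[str] = field(default_factory=set)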
Similarly to the above, I didn't see a scenario in which we could avoid excluding SLOTH, so hard-coding the exclusion seemed appropriate.

I suppose (and again, this lines up with the earlier param) this all assumes that an unrecognized model_type_name in the realization config is invalid, rather than something the user wants ignored. I feel that is more appropriate in this context: fail immediately, instead of silently trusting that the user didn't make a mistake (e.g., a typo) and turning any actual errors here into failures at a later step. But I'm open to discussing that further.
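To make the fail-fast idea concrete, a small sketch (the method name is hypothetical; only the two class attributes come from the PR):

    from typing import Any

    def _builder_for_module(self, model_type_name: str) -> Any:
        """Resolve the builder for a realization-config module name, failing fast."""
        if model_type_name in self._no_init_config_modules:
            return None  # recognized, but no BMI init config is needed (e.g., SLOTH)
        try:
            return self._module_to_model_map[model_type_name]
        except KeyError:
            # An unrecognized name is treated as an invalid realization config rather
            # than silently skipped, so the error surfaces immediately.
            raise ValueError(f"Unrecognized module '{model_type_name}' in realization config")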
Same thinking as above: it will be easier to update service code than library code.
    self.output_dirs: Dict[int, Path] = dict()

    # TODO: (later) find a better way to do this
    hf_gpkg = Path.home().joinpath("Developer/noaa/data/hydrofabric/v201/nextgen_01.gpkg")
These will need to change before we merge.
Ah, so ... maybe not.

These tests are skipped for now. I ran them locally and wanted to contribute them, but I'd intended to leave them out of the automated runs for the time being (I suppose I need to open an issue for doing that later but haven't yet). Mainly that was to figure out how best to handle making the necessary hydrofabric data available.

If you object, then we can address the issue now. In short, I think the best option is using Mike's hydrofabric subsetting tools and committing something smaller to the repo, but I'm not terribly familiar with those. I'm also not sure if there'd be any significance/benefit to choosing any particular subset; i.e., it would just take more time, so I was trying to defer that.
I've opened #617 for tracking the task of updating the tests.
"Developer/noaa/dmod/data/example_realization_configs/ex_realization_config_03.json") | ||
noah_params_dir = Path.home().joinpath("Developer/noaa/data/noah_owp_ex_params_dir_1") | ||
|
||
self.output_dirs[0] = hf_model_data.parent.joinpath(f"{self.__class__.__name__}_out_0") |
Any reason not to use tempfile.TemporaryDirectory here instead?
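For reference, a minimal sketch of that approach in a unittest setUp/tearDown pair (the test class and attribute names are illustrative):

    import tempfile
    import unittest
    from pathlib import Path

    class TestBmiInitConfigAutoGenerator(unittest.TestCase):
        def setUp(self):
            # A fresh temporary directory is created before each test ...
            self._tmp_dir = tempfile.TemporaryDirectory()
            self.output_dir = Path(self._tmp_dir.name)

        def tearDown(self):
            # ... and removed, along with any generated config files, afterward.
            self._tmp_dir.cleanup()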
Hmm, well, as discussed above, I'd planned on revisiting the tests a bit later, and these paths might be revised then also. For the moment at least, since the tests are more manual and incomplete, this was to make it easier to go examine the created files.
    self.output_dirs[0] = hf_model_data.parent.joinpath(f"{self.__class__.__name__}_out_0")
    self._prep_output_dir(self.output_dirs[0])
    self.generators[0] = BmiInitConfigAutoGenerator(ngen_realization=NgenRealization.parse_file(real_cfg_file),
Just a minor tweak here: since setUp and tearDown are run before and after each test, respectively, it might make sense to memoize reading the realization, HF, and model attributes files from disk and just create a new BmiInitConfigAutoGenerator in each setUp.
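A sketch of that memoization using functools.lru_cache (the helper name is made up; the NgenRealization import path is assumed):

    from functools import lru_cache
    from pathlib import Path

    # Assumed import; the PR parses realizations via NgenRealization.parse_file.
    from ngen.config.realization import NgenRealization

    @lru_cache(maxsize=None)
    def _load_realization(real_cfg_file: Path) -> NgenRealization:
        # Parsed from disk only once per test run; each setUp reuses the cached
        # object and just constructs a new BmiInitConfigAutoGenerator around it.
        return NgenRealization.parse_file(real_cfg_file)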
Sure, we can do that. I've added something.
    configs = []
    cat_id, config = next(gen_pyobj)
    configs.append(config)
    while True:
Thoughts on using a for loop instead?
Suggested change, replacing the while True: loop with:

    configs = []
    gen_pyobj = generator.generate_configs()
    cat_id, config = next(gen_pyobj)
    configs.append(config)
    for cid, config in gen_pyobj:
        if cid == cat_id:
            configs.append(config)
        else:
            break
Sure, I don't remember exactly why I did it that way. I've made a change.
python/services/dataservice/dmod/dataservice/data_derive_util.py (outdated comment thread, resolved)
Force-pushed from 103f1ad to a676d20
Creating util class leveraging ngen.config_gen that creates configs as needed for a particular realization config and hydrofabric, within the dmod.modeldata package.
Adding function to BmiInitConfigAutoGenerator to get a list of the supported BMI modules (i.e., the name strings used to configure them within a realization config) for which the class can generate init configs.
Update data service with capability to derive/generate BMI init config dataset in certain conditions.
Memoize/cache creation of NgenRealization and hydrofabric data objects.
Use for loop instead of "while True".
Assert in find_hydrofabric_files that there is only a single gpkg file, to protect against an unsupported case.
Co-authored-by: Austin Raney <[email protected]>
Co-authored-by: Austin Raney <[email protected]>
Moving to use of "@lru_cache(maxsize=None)" instead of simply "@cache" to maintain compatibility with 3.8.
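For context, the two decorators are equivalent; functools.cache simply did not exist before Python 3.9:

    from functools import lru_cache

    # Unbounded memoization that also works on Python 3.8; functools.cache (3.9+)
    # is just shorthand for lru_cache(maxsize=None).
    @lru_cache(maxsize=None)
    def expensive_lookup(key: str) -> str:
        return key.upper()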
Force-pushed from 58e6e41 to f6a5dd0
Integrate use of the BMI init config generation tools from the ngen-config-gen package in the ngen-cal repo, having the BMI init config dataset be derived from the realization config and hydrofabric datasets at the AWAITING_DATA job exec step.
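As a rough illustration of the generation side of that flow (the helper function, its parameters other than ngen_realization, the import locations, and the serialization step are all assumptions rather than actual DMOD service code):

    from pathlib import Path

    # Assumed import locations for the classes referenced in this PR.
    from ngen.config.realization import NgenRealization
    from dmod.modeldata import BmiInitConfigAutoGenerator

    def derive_bmi_init_configs(realization_file: Path, output_dir: Path, **hydrofabric_kwargs):
        """Hypothetical helper: build per-catchment BMI init configs for a realization."""
        generator = BmiInitConfigAutoGenerator(
            ngen_realization=NgenRealization.parse_file(realization_file),
            **hydrofabric_kwargs,  # hydrofabric/model-attributes inputs; exact params not shown in the excerpt
        )
        # generate_configs() yields (catchment_id, config) pairs, as seen in the tests above.
        for cat_id, config in generator.generate_configs():
            ...  # serialize each config under output_dir; the real write step is not shown in the excerpt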