-
Notifications
You must be signed in to change notification settings - Fork 513
Add FloodForecaster: Domain-Adaptive GINO Framework for Flood Forecasting #1271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Moved sample_animation.gif to docs/img - Removed functions using pickle due to security concerns - Removed duplicate import of KolmogorovArnoldNetwork in __init__.py
…butes, da_config mock, gradient reversal, LpLoss signature, and projection mocks
- Remove rollout evaluation from train.py (moved to inference.py) - Fix LpLoss warning by wrapping with LpLossWrapper in domain_adaptation - Fix logger.debug() calls to handle PythonLogger properly - Fix syntax error in plotting.py (unterminated docstring) - Fix struct mode error in pretraining.py for neuralop get_model - Fix error handling in inference.py (exc_info parameter) - Remove temporary files (GINO_BATCH_SIZE_ANALYSIS.md, 3.0.0) - Clean separation: train.py for training, inference.py for rollout/visualization
Greptile OverviewGreptile SummaryThis PR integrates FloodForecaster, a comprehensive domain-adaptive GINO (Geometry-Informed Neural Operator) framework for flood forecasting into PhysicsNeMo. The implementation adds a complete three-stage training pipeline: pretraining on source domain data, domain adaptation using adversarial training with gradient reversal layers, and rollout evaluation with physics-informed metrics. The framework combines Graph Neural Operators (GNO) and Fourier Neural Operators (FNO) to handle irregular terrain processing and enables transfer learning from data-rich to data-scarce domains. The integration follows PhysicsNeMo's architectural patterns with all components inheriting from Critical Issue: The Important Files Changed
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (21)
-
test/utils/test_flood_forecaster_utils.py, line 39 (link)logic: Tests will fail on systems without CUDA when device='cuda:0' is used. Should these tests skip CUDA tests when CUDA is not available rather than fail?
-
examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 19-27 (link)style: This circular import handling pattern is unusual and may indicate architectural issues. Setting functions to None could cause AttributeErrors when imported modules try to use these functions. Are there actual circular dependencies between these modules that require this pattern?
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 29 (link)logic: Functions are included in all even when they might be None, which could export None values if ImportError occurs.
-
examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 170-173 (link)logic: Dynamic field normalization assumes target normalizer exists, but this could fail if target_big is None due to empty tgt_list. The assignment on line 173 happens regardless of whether target normalization succeeded.
Should dynamic normalization be conditional on successful target normalization, or should dynamic have its own normalizer when target is unavailable?
-
examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 129 (link)logic: Potential issue with None values in tgt_list - torch.stack will fail if any tensor in the list is None. Consider filtering None values before stacking.
-
examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 31 (link)style: Mutable default argument
query_res=[64, 64]could cause issues if modifiedNote: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 98 (link)style: Same mutable default argument issue as line 31
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 138-140 (link)logic: calling
__getitem__during initialization can cause issues if the dataset state isn't fully ready. Is this access pattern safe given that all data loading happens before line 138? -
examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 261 (link)logic: device mismatch:
self.xy_coordsis always on CPU butdynamictensor device is unknown -
test/datapipes/test_flood_forecaster_datasets.py, line 61 (link)style: Device parameter is not used in the test functions despite being parametrized
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
test/datapipes/test_flood_forecaster_datasets.py, line 272 (link)logic: Including M40_XY.txt in static_files when it's already the xy_file may cause duplication or confusion. Is this intentional behavior or should xy_file be excluded from static_files?
-
examples/weather/flood_modeling/flood_forecaster/train.py, line 431 (link)logic: The
configvariable is always None here - it's initialized as None on line 220 and never assigned a value -
test/models/test_flood_forecaster_training.py, line 510 (link)style: Very relaxed tolerance (atol=1.0) may mask actual numerical issues - typical ML tolerances are 1e-4 to 1e-6. Are you confident this high tolerance won't hide regression issues in the CNN domain classifier?
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
test/models/test_flood_forecaster_training.py, line 431 (link)style: Using weights_only=False with torch.load creates potential security risk if loading untrusted data
-
examples/weather/flood_modeling/flood_forecaster/inference.py, line 24 (link)style:
osimport is unusedNote: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/inference.py, line 88 (link)style:
is_loggervariable is assigned but never usedNote: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/utils/plotting.py, line 80-82 (link)style: redundant assignment when geometry is already np.ndarray
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/training/domain_adaptation.py, line 1071 (link)logic:
da_class_loss_weightdefaults to 0.0, effectively disabling adversarial training. Is the default value of 0.0 intentional, or should this have a positive default to enable adversarial training? -
examples/weather/flood_modeling/flood_forecaster/data_processing/data_processor.py, line 258 (link)style: Consider documenting which specific kwargs are filtered (e.g., 'y') for clarity
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 301 (link)logic: potential indexing error: uses
step_sigma_t[0, 0]instead of full tensor for element-wise multiplicationShould this be
step_sigma_t[0]or juststep_sigma_t.squeeze()to properly broadcast across the 3 channels? -
examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 111 (link)logic: filename inconsistency: searches for 'train_.txt' but error message mentions 'train.txt'
25 files reviewed, 21 comments
|
/blossom-ci |
untrained_checkpoint.mdlus
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this file
|
|
||
| # Load state dict after closing archive | ||
| model_dict = torch.load(io.BytesIO(model_bytes), map_location=device) | ||
| model_dict = torch.load(io.BytesIO(model_bytes), map_location=device, weights_only=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why these changes are needed here. @CharlelieLrt for viz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's BC breaking, we can't accept this change
|
|
||
| Data should be organized in folders with consistent naming patterns as specified in the configuration. The model supports both source and target domain datasets for domain adaptation training. | ||
|
|
||
| **Dataset Link**: The dataset used in the paper is available at: https://github.com/MehdiTaghizadehUVa/FloodForecaster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Are you providing the dataset, or just the scripts to generate the dataset?
| Run the training script: | ||
|
|
||
| ```bash | ||
| python train.py --config-path conf --config-name config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you already have these as the default, is it necessary to specify --config-path and --config-name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your training recipe support multi-GPU/multi-node training? If yes, please specify the command. If not, please add a note.
| 2. Run the inference script: | ||
|
|
||
| ```bash | ||
| python inference.py --config-path conf --config-name config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your inference recipe support multi-GPU/multi-node inference? If yes, please specify the command. If not, please add a note.
| ```yaml | ||
| model: | ||
| model_arch: 'gino' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually configurable?
|
|
||
| For questions, feedback, or collaborations: | ||
|
|
||
| - **Mehdi Taghizadeh** – <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention that you are the code contributor and maintainer.
| - **Jonathan L. Goodall** – <[email protected]> | ||
| - **Negin Alemazkoor** (Corresponding Author) – <[email protected]> | ||
|
|
||
| ## License |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this section. Please remove it.
| @@ -0,0 +1,338 @@ | |||
| # FloodForecaster: A Domain-Adaptive Geometry-Informed Neural Operator Framework for Rapid Flood Forecasting | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comprehensive and very well structured readme!
| wireup_info: 'mpi' | ||
| wireup_store: 'tcp' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to use physicsnemo distributed utilities instead?
|
|
||
| # Load state dict after closing archive | ||
| model_dict = torch.load(io.BytesIO(model_bytes), map_location=device) | ||
| model_dict = torch.load(io.BytesIO(model_bytes), map_location=device, weights_only=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's BC breaking, we can't accept this change
| wireup_store: 'tcp' | ||
| model_parallel_size: 1 | ||
| seed: 123 | ||
| device: 'cuda:0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set use_distributed=True, how should the device filed be specified? Would it be a list of devices? Please clarify. It would be nice to add some details about distributed computing configs in the readme.
| resume_from_dir: null | ||
| resume_from_source: null | ||
| resume_from_adapt: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these needed? If som please describe how these should be used.
| n_epochs_adapt: 2 | ||
| learning_rate: 1e-4 | ||
| adapt_learning_rate: 1e-4 | ||
| training_loss: 'l2' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List the available options in a comment
| training_loss: 'l2' | ||
| testing_loss: 'l2' | ||
| weight_decay: 1e-4 | ||
| amp_autocast: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List the available options in a comment
|
|
||
| raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'") | ||
|
|
||
| def save_checkpoint(self, save_folder: str, save_name: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, how is GiNO's save and load checkpoint methods different from torch's save/load utilities? Couldn't you use the physicsnemo save/load utilities for better integration?
PhysicsNeMo Pull Request
Description
This PR integrates FloodForecaster, a domain-adaptive Geometry-Informed Neural Operator (GINO) framework for rapid, high-resolution flood forecasting. The framework enables accurate, real-time flood predictions by learning from source domain data and adapting to target domains through adversarial training.
Key Features
physicsnemo.Modulewith checkpointing supportComponents Added
Checklist
Dependencies
No new dependencies required. All packages are either already in PhysicsNeMo or standard scientific computing libraries:
neuralop>=2.0.0(existing)hydra-core>=1.2.0(existing)wandb>=0.12.0(optional, for logging)matplotlib,tqdm,numpy,torch,omegaconf,pandas,h5pyReview Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.