Skip to content

Conversation

@MehdiTaghizadehUVa
Copy link
Contributor

PhysicsNeMo Pull Request

Description

This PR integrates FloodForecaster, a domain-adaptive Geometry-Informed Neural Operator (GINO) framework for rapid, high-resolution flood forecasting. The framework enables accurate, real-time flood predictions by learning from source domain data and adapting to target domains through adversarial training.

Key Features

  • Domain-Adaptive Training: Three-stage pipeline (pretraining → domain adaptation → rollout evaluation) for transfer learning from data-rich to data-scarce domains
  • Gradient Reversal Layer (GRL): Adversarial domain adaptation using CNN-based domain classifier
  • GINO Integration: Combines Graph Neural Operators (GNO) and Fourier Neural Operators (FNO) for irregular terrain processing
  • Physics-Informed Metrics: Volume conservation, arrival time, inundation duration, CSI, and FHCA
  • PhysicsNeMo Module Compliance: All components inherit from physicsnemo.Module with checkpointing support

Components Added

  • Training modules: domain adaptation trainer, pretraining pipeline
  • Data processing: GINO wrapper, preprocessing/postprocessing utilities
  • Datasets: Custom dataset classes for flood data loading
  • Inference: Rollout evaluation and visualization pipeline
  • Configuration: Hydra-based configuration system
  • Documentation: Complete README with usage examples

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

No new dependencies required. All packages are either already in PhysicsNeMo or standard scientific computing libraries:

  • neuralop>=2.0.0 (existing)
  • hydra-core>=1.2.0 (existing)
  • wandb>=0.12.0 (optional, for logging)
  • Standard packages: matplotlib, tqdm, numpy, torch, omegaconf, pandas, h5py

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

…butes, da_config mock, gradient reversal, LpLoss signature, and projection mocks
- Remove rollout evaluation from train.py (moved to inference.py)
- Fix LpLoss warning by wrapping with LpLossWrapper in domain_adaptation
- Fix logger.debug() calls to handle PythonLogger properly
- Fix syntax error in plotting.py (unterminated docstring)
- Fix struct mode error in pretraining.py for neuralop get_model
- Fix error handling in inference.py (exc_info parameter)
- Remove temporary files (GINO_BATCH_SIZE_ANALYSIS.md, 3.0.0)
- Clean separation: train.py for training, inference.py for rollout/visualization
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 10, 2025

Greptile Overview

Greptile Summary

This PR integrates FloodForecaster, a comprehensive domain-adaptive GINO (Geometry-Informed Neural Operator) framework for flood forecasting into PhysicsNeMo. The implementation adds a complete three-stage training pipeline: pretraining on source domain data, domain adaptation using adversarial training with gradient reversal layers, and rollout evaluation with physics-informed metrics. The framework combines Graph Neural Operators (GNO) and Fourier Neural Operators (FNO) to handle irregular terrain processing and enables transfer learning from data-rich to data-scarce domains.

The integration follows PhysicsNeMo's architectural patterns with all components inheriting from physicsnemo.Module for checkpoint compatibility. The framework includes custom dataset classes for flood data loading, comprehensive data processing utilities, adversarial training components, and extensive visualization/evaluation capabilities. The implementation maintains compatibility with neuralop 2.0.0 API while adding domain adaptation capabilities through gradient reversal layers and CNN-based domain classifiers.

Critical Issue: The untrained_checkpoint.mdlus file is completely empty, which will likely cause runtime failures when the system attempts to load it.

Important Files Changed

Filename Score Overview
untrained_checkpoint.mdlus 1/5 Empty file that should contain model checkpoint data but is completely blank
physicsnemo/models/module.py 4/5 Added weights_only=False parameter to torch.load calls for PyTorch compatibility
examples/weather/flood_modeling/flood_forecaster/training/pretraining.py 3/5 Implements source domain pretraining with complex logger handling and missing error handling
examples/weather/flood_modeling/flood_forecaster/conf/config.yaml 3/5 Comprehensive Hydra configuration with hardcoded entity name and epoch count mismatch
examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py 3/5 Rollout dataset with potential device consistency issues and early __getitem__ access
examples/weather/flood_modeling/flood_forecaster/utils/normalization.py 3/5 Normalization utilities with potential logic issue in dynamic field handling
examples/weather/flood_modeling/flood_forecaster/training/domain_adaptation.py 4/5 Adversarial domain adaptation training with gradient reversal layers and comprehensive feature extraction
examples/weather/flood_modeling/flood_forecaster/data_processing/data_processor.py 4/5 GINO wrapper and data processor with complex tensor reshaping and comprehensive error handling
examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py 4/5 Comprehensive flood dataset with multi-physics data handling and noise augmentation
examples/weather/flood_modeling/flood_forecaster/inference/rollout.py 4/5 Rollout evaluation with autoregressive prediction and physics-informed metrics computation
examples/weather/flood_modeling/flood_forecaster/inference.py 4/5 Inference script with robust checkpoint loading and comprehensive error handling
examples/weather/flood_modeling/flood_forecaster/train.py 4/5 Two-stage training pipeline with sophisticated wandb logging and distributed computing support
examples/weather/flood_modeling/flood_forecaster/utils/plotting.py 4/5 Comprehensive visualization utilities with publication-quality plotting and error analysis
examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py 4/5 Normalized dataset wrappers with automatic query point generation for spatial interpolation
examples/weather/flood_modeling/flood_forecaster/README.md 4/5 Well-structured documentation with complete methodology and usage instructions
test/models/test_flood_forecaster_training.py 4/5 Comprehensive training tests with complex import logic and non-regression testing
test/models/test_flood_forecaster_data_processing.py 4/5 Data processing tests with extensive mocking and checkpoint serialization validation
test/datapipes/test_flood_forecaster_datasets.py 4/5 Dataset tests covering all dataset classes with proper mock data structures
test/utils/test_flood_forecaster_utils.py 4/5 Utility function tests with parametrized device testing and normalization validation
examples/weather/flood_modeling/flood_forecaster/training/__init__.py 4/5 Training module initialization with circular import handling
examples/weather/flood_modeling/flood_forecaster/datasets/__init__.py 5/5 Clean dataset module initialization following Python packaging conventions
examples/weather/flood_modeling/flood_forecaster/data_processing/__init__.py 5/5 Data processing module initialization with proper NVIDIA licensing
examples/weather/flood_modeling/flood_forecaster/utils/__init__.py 5/5 Utils module initialization exporting normalization functions
examples/weather/flood_modeling/flood_forecaster/inference/__init__.py 5/5 Inference module initialization with clean rollout_prediction export
examples/weather/flood_modeling/flood_forecaster/requirements.txt 5/5 Standard dependencies file with reasonable version constraints

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (21)

  1. test/utils/test_flood_forecaster_utils.py, line 39 (link)

    logic: Tests will fail on systems without CUDA when device='cuda:0' is used. Should these tests skip CUDA tests when CUDA is not available rather than fail?

  2. examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 19-27 (link)

    style: This circular import handling pattern is unusual and may indicate architectural issues. Setting functions to None could cause AttributeErrors when imported modules try to use these functions. Are there actual circular dependencies between these modules that require this pattern?

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 29 (link)

    logic: Functions are included in all even when they might be None, which could export None values if ImportError occurs.

  4. examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 170-173 (link)

    logic: Dynamic field normalization assumes target normalizer exists, but this could fail if target_big is None due to empty tgt_list. The assignment on line 173 happens regardless of whether target normalization succeeded.

    Should dynamic normalization be conditional on successful target normalization, or should dynamic have its own normalizer when target is unavailable?

  5. examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 129 (link)

    logic: Potential issue with None values in tgt_list - torch.stack will fail if any tensor in the list is None. Consider filtering None values before stacking.

  6. examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 31 (link)

    style: Mutable default argument query_res=[64, 64] could cause issues if modified

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  7. examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 98 (link)

    style: Same mutable default argument issue as line 31

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  8. examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 138-140 (link)

    logic: calling __getitem__ during initialization can cause issues if the dataset state isn't fully ready. Is this access pattern safe given that all data loading happens before line 138?

  9. examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 261 (link)

    logic: device mismatch: self.xy_coords is always on CPU but dynamic tensor device is unknown

  10. test/datapipes/test_flood_forecaster_datasets.py, line 61 (link)

    style: Device parameter is not used in the test functions despite being parametrized

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  11. test/datapipes/test_flood_forecaster_datasets.py, line 272 (link)

    logic: Including M40_XY.txt in static_files when it's already the xy_file may cause duplication or confusion. Is this intentional behavior or should xy_file be excluded from static_files?

  12. examples/weather/flood_modeling/flood_forecaster/train.py, line 431 (link)

    logic: The config variable is always None here - it's initialized as None on line 220 and never assigned a value

  13. test/models/test_flood_forecaster_training.py, line 510 (link)

    style: Very relaxed tolerance (atol=1.0) may mask actual numerical issues - typical ML tolerances are 1e-4 to 1e-6. Are you confident this high tolerance won't hide regression issues in the CNN domain classifier?

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  14. test/models/test_flood_forecaster_training.py, line 431 (link)

    style: Using weights_only=False with torch.load creates potential security risk if loading untrusted data

  15. examples/weather/flood_modeling/flood_forecaster/inference.py, line 24 (link)

    style: os import is unused

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  16. examples/weather/flood_modeling/flood_forecaster/inference.py, line 88 (link)

    style: is_logger variable is assigned but never used

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  17. examples/weather/flood_modeling/flood_forecaster/utils/plotting.py, line 80-82 (link)

    style: redundant assignment when geometry is already np.ndarray

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  18. examples/weather/flood_modeling/flood_forecaster/training/domain_adaptation.py, line 1071 (link)

    logic: da_class_loss_weight defaults to 0.0, effectively disabling adversarial training. Is the default value of 0.0 intentional, or should this have a positive default to enable adversarial training?

  19. examples/weather/flood_modeling/flood_forecaster/data_processing/data_processor.py, line 258 (link)

    style: Consider documenting which specific kwargs are filtered (e.g., 'y') for clarity

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  20. examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 301 (link)

    logic: potential indexing error: uses step_sigma_t[0, 0] instead of full tensor for element-wise multiplication

    Should this be step_sigma_t[0] or just step_sigma_t.squeeze() to properly broadcast across the 3 channels?

  21. examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 111 (link)

    logic: filename inconsistency: searches for 'train_.txt' but error message mentions 'train.txt'

25 files reviewed, 21 comments

Edit Code Review Agent Settings | Greptile

@mnabian mnabian self-requested a review December 11, 2025 00:45
@mnabian
Copy link
Collaborator

mnabian commented Dec 11, 2025

/blossom-ci

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this file


# Load state dict after closing archive
model_dict = torch.load(io.BytesIO(model_bytes), map_location=device)
model_dict = torch.load(io.BytesIO(model_bytes), map_location=device, weights_only=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why these changes are needed here. @CharlelieLrt for viz

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's BC breaking, we can't accept this change


Data should be organized in folders with consistent naming patterns as specified in the configuration. The model supports both source and target domain datasets for domain adaptation training.

**Dataset Link**: The dataset used in the paper is available at: https://github.com/MehdiTaghizadehUVa/FloodForecaster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Are you providing the dataset, or just the scripts to generate the dataset?

Run the training script:

```bash
python train.py --config-path conf --config-name config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already have these as the default, is it necessary to specify --config-path and --config-name here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your training recipe support multi-GPU/multi-node training? If yes, please specify the command. If not, please add a note.

2. Run the inference script:

```bash
python inference.py --config-path conf --config-name config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your inference recipe support multi-GPU/multi-node inference? If yes, please specify the command. If not, please add a note.

```yaml
model:
model_arch: 'gino'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually configurable?


For questions, feedback, or collaborations:

- **Mehdi Taghizadeh**<[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention that you are the code contributor and maintainer.

- **Jonathan L. Goodall**<[email protected]>
- **Negin Alemazkoor** (Corresponding Author) – <[email protected]>

## License
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this section. Please remove it.

@@ -0,0 +1,338 @@
# FloodForecaster: A Domain-Adaptive Geometry-Informed Neural Operator Framework for Rapid Flood Forecasting
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comprehensive and very well structured readme!

Comment on lines +29 to +30
wireup_info: 'mpi'
wireup_store: 'tcp'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to use physicsnemo distributed utilities instead?


# Load state dict after closing archive
model_dict = torch.load(io.BytesIO(model_bytes), map_location=device)
model_dict = torch.load(io.BytesIO(model_bytes), map_location=device, weights_only=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's BC breaking, we can't accept this change

wireup_store: 'tcp'
model_parallel_size: 1
seed: 123
device: 'cuda:0'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set use_distributed=True, how should the device filed be specified? Would it be a list of devices? Please clarify. It would be nice to add some details about distributed computing configs in the readme.

Comment on lines +167 to +169
resume_from_dir: null
resume_from_source: null
resume_from_adapt: null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these needed? If som please describe how these should be used.

n_epochs_adapt: 2
learning_rate: 1e-4
adapt_learning_rate: 1e-4
training_loss: 'l2'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List the available options in a comment

training_loss: 'l2'
testing_loss: 'l2'
weight_decay: 1e-4
amp_autocast: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List the available options in a comment


raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

def save_checkpoint(self, save_folder: str, save_name: str) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, how is GiNO's save and load checkpoint methods different from torch's save/load utilities? Couldn't you use the physicsnemo save/load utilities for better integration?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants