Add FloodForecaster: Domain-Adaptive GINO Framework for Flood Forecasting #1271

MehdiTaghizadehUVa · 2025-12-10T03:19:40Z

PhysicsNeMo Pull Request

Description

This PR integrates FloodForecaster, a domain-adaptive Geometry-Informed Neural Operator (GINO) framework for rapid, high-resolution flood forecasting. The framework enables accurate, real-time flood predictions by learning from source domain data and adapting to target domains through adversarial training.

Key Features

Domain-Adaptive Training: Three-stage pipeline (pretraining → domain adaptation → rollout evaluation) for transfer learning from data-rich to data-scarce domains
Gradient Reversal Layer (GRL): Adversarial domain adaptation using a CNN-based domain classifier
GINO Integration: Combines Graph Neural Operators (GNO) and Fourier Neural Operators (FNO) for irregular terrain processing
Physics-Informed Metrics: Volume conservation, arrival time, inundation duration, CSI, and FHCA
PhysicsNeMo Module Compliance: All components inherit from physicsnemo.Module with checkpointing support
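
For readers unfamiliar with the gradient reversal layer named above, here is a minimal PyTorch sketch of the general technique. It is not the code added in this PR: the names `GradientReversal`, `grad_reverse`, and `lambda_` are illustrative assumptions. The layer is the identity in the forward pass and multiplies the incoming gradient by `-lambda_` in the backward pass, so the feature extractor is pushed to confuse the domain classifier.

```python
import torch


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass, negated and scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        # Store the scaling factor for use in backward.
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing to the feature extractor;
        # lambda_ is a plain float, so it receives no gradient.
        return -ctx.lambda_ * grad_output, None


def grad_reverse(x, lambda_=1.0):
    """Hypothetical helper: apply gradient reversal with strength lambda_."""
    return GradientReversal.apply(x, lambda_)
```

In a setup like this, the features would typically pass through `grad_reverse` just before the domain classifier, so a single backward pass trains the classifier normally while the upstream features receive the reversed gradient.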

Components Added

Training modules: domain adaptation trainer, pretraining pipeline
Data processing: GINO wrapper, preprocessing/postprocessing utilities
Datasets: Custom dataset classes for flood data loading
Inference: Rollout evaluation and visualization pipeline
Configuration: Hydra-based configuration system
Documentation: Complete README with usage examples

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.

Dependencies

No new dependencies required. All packages are either already in PhysicsNeMo or standard scientific computing libraries:

neuralop>=2.0.0 (existing)
hydra-core>=1.2.0 (existing)
wandb>=0.12.0 (optional, for logging)
Standard packages: matplotlib, tqdm, numpy, torch, omegaconf, pandas, h5py

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

- Moved sample_animation.gif to docs/img
- Removed functions using pickle due to security concerns
- Removed duplicate import of KolmogorovArnoldNetwork in __init__.py

…butes, da_config mock, gradient reversal, LpLoss signature, and projection mocks

- Remove rollout evaluation from train.py (moved to inference.py)
- Fix LpLoss warning by wrapping with LpLossWrapper in domain_adaptation
- Fix logger.debug() calls to handle PythonLogger properly
- Fix syntax error in plotting.py (unterminated docstring)
- Fix struct mode error in pretraining.py for neuralop get_model
- Fix error handling in inference.py (exc_info parameter)
- Remove temporary files (GINO_BATCH_SIZE_ANALYSIS.md, 3.0.0)
- Clean separation: train.py for training, inference.py for rollout/visualization

…erence

greptile-apps · 2025-12-10T03:22:30Z

Greptile Overview

Greptile Summary

This PR integrates FloodForecaster, a comprehensive domain-adaptive GINO (Geometry-Informed Neural Operator) framework for flood forecasting into PhysicsNeMo. The implementation adds a complete three-stage training pipeline: pretraining on source domain data, domain adaptation using adversarial training with gradient reversal layers, and rollout evaluation with physics-informed metrics. The framework combines Graph Neural Operators (GNO) and Fourier Neural Operators (FNO) to handle irregular terrain processing and enables transfer learning from data-rich to data-scarce domains.
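
Of the physics-informed metrics used in rollout evaluation, the Critical Success Index (CSI) is the simplest to state: hits divided by the sum of hits, misses, and false alarms over wet/dry classifications. The sketch below illustrates that standard definition in plain Python. The function name, the flat-list inputs, and the 0.05 wet/dry depth threshold are assumptions for illustration, not values taken from this PR.

```python
def critical_success_index(predicted, observed, threshold=0.05):
    """CSI = hits / (hits + misses + false alarms).

    A cell counts as wet when its depth is >= threshold (assumed value).
    """
    hits = misses = false_alarms = 0
    for p, o in zip(predicted, observed):
        p_wet, o_wet = p >= threshold, o >= threshold
        if p_wet and o_wet:
            hits += 1          # flooded and predicted flooded
        elif o_wet:
            misses += 1        # flooded but predicted dry
        elif p_wet:
            false_alarms += 1  # dry but predicted flooded
    denom = hits + misses + false_alarms
    # Cells that are dry in both fields are ignored; return NaN if nothing is wet.
    return hits / denom if denom else float("nan")
```

Correctly predicted dry cells do not enter the score, which keeps a mostly dry domain from inflating the result.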

The integration follows PhysicsNeMo's architectural patterns with all components inheriting from physicsnemo.Module for checkpoint compatibility. The framework includes custom dataset classes for flood data loading, comprehensive data processing utilities, adversarial training components, and extensive visualization/evaluation capabilities. The implementation maintains compatibility with neuralop 2.0.0 API while adding domain adaptation capabilities through gradient reversal layers and CNN-based domain classifiers.

Critical Issue: The untrained_checkpoint.mdlus file is completely empty, which will likely cause runtime failures when the system attempts to load it.

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| `untrained_checkpoint.mdlus` | 1/5 | Empty file that should contain model checkpoint data but is completely blank |
| `physicsnemo/models/module.py` | 4/5 | Added `weights_only=False` parameter to torch.load calls for PyTorch compatibility |
| `examples/weather/flood_modeling/flood_forecaster/training/pretraining.py` | 3/5 | Implements source domain pretraining with complex logger handling and missing error handling |
| `examples/weather/flood_modeling/flood_forecaster/conf/config.yaml` | 3/5 | Comprehensive Hydra configuration with hardcoded entity name and epoch count mismatch |
| `examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py` | 3/5 | Rollout dataset with potential device consistency issues and early `__getitem__` access |
| `examples/weather/flood_modeling/flood_forecaster/utils/normalization.py` | 3/5 | Normalization utilities with potential logic issue in dynamic field handling |
| `examples/weather/flood_modeling/flood_forecaster/training/domain_adaptation.py` | 4/5 | Adversarial domain adaptation training with gradient reversal layers and comprehensive feature extraction |
| `examples/weather/flood_modeling/flood_forecaster/data_processing/data_processor.py` | 4/5 | GINO wrapper and data processor with complex tensor reshaping and comprehensive error handling |
| `examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py` | 4/5 | Comprehensive flood dataset with multi-physics data handling and noise augmentation |
| `examples/weather/flood_modeling/flood_forecaster/inference/rollout.py` | 4/5 | Rollout evaluation with autoregressive prediction and physics-informed metrics computation |
| `examples/weather/flood_modeling/flood_forecaster/inference.py` | 4/5 | Inference script with robust checkpoint loading and comprehensive error handling |
| `examples/weather/flood_modeling/flood_forecaster/train.py` | 4/5 | Two-stage training pipeline with sophisticated wandb logging and distributed computing support |
| `examples/weather/flood_modeling/flood_forecaster/utils/plotting.py` | 4/5 | Comprehensive visualization utilities with publication-quality plotting and error analysis |
| `examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py` | 4/5 | Normalized dataset wrappers with automatic query point generation for spatial interpolation |
| `examples/weather/flood_modeling/flood_forecaster/README.md` | 4/5 | Well-structured documentation with complete methodology and usage instructions |
| `test/models/test_flood_forecaster_training.py` | 4/5 | Comprehensive training tests with complex import logic and non-regression testing |
| `test/models/test_flood_forecaster_data_processing.py` | 4/5 | Data processing tests with extensive mocking and checkpoint serialization validation |
| `test/datapipes/test_flood_forecaster_datasets.py` | 4/5 | Dataset tests covering all dataset classes with proper mock data structures |
| `test/utils/test_flood_forecaster_utils.py` | 4/5 | Utility function tests with parametrized device testing and normalization validation |
| `examples/weather/flood_modeling/flood_forecaster/training/__init__.py` | 4/5 | Training module initialization with circular import handling |
| `examples/weather/flood_modeling/flood_forecaster/datasets/__init__.py` | 5/5 | Clean dataset module initialization following Python packaging conventions |
| `examples/weather/flood_modeling/flood_forecaster/data_processing/__init__.py` | 5/5 | Data processing module initialization with proper NVIDIA licensing |
| `examples/weather/flood_modeling/flood_forecaster/utils/__init__.py` | 5/5 | Utils module initialization exporting normalization functions |
| `examples/weather/flood_modeling/flood_forecaster/inference/__init__.py` | 5/5 | Inference module initialization with clean rollout_prediction export |
| `examples/weather/flood_modeling/flood_forecaster/requirements.txt` | 5/5 | Standard dependencies file with reasonable version constraints |

greptile-apps

Additional Comments (21)

test/utils/test_flood_forecaster_utils.py, line 39

logic: Tests will fail on systems without CUDA when device='cuda:0' is used. Should these tests skip CUDA tests when CUDA is not available rather than fail?
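
One possible way to avoid the failure, assuming the tests are parametrized over device strings: build the device list from whether CUDA is available instead of hardcoding `'cuda:0'`. The helper name `devices_to_test` is hypothetical and does not exist in the PR or in PhysicsNeMo.

```python
def devices_to_test(cuda_available: bool) -> list[str]:
    """Return the device strings a parametrized test should run on.

    Hypothetical helper: callers would pass torch.cuda.is_available().
    """
    devices = ["cpu"]
    if cuda_available:
        # Only exercise the GPU path when a CUDA device is actually present.
        devices.append("cuda:0")
    return devices
```

A test could then use `@pytest.mark.parametrize("device", devices_to_test(torch.cuda.is_available()))`. An alternative with the same effect is to keep the hardcoded list and mark the CUDA case with `pytest.mark.skipif`.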

examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 19-27

style: This circular import handling pattern is unusual and may indicate architectural issues. Setting functions to None could cause AttributeErrors when imported modules try to use these functions. Are there actual circular dependencies between these modules that require this pattern?

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

examples/weather/flood_modeling/flood_forecaster/training/__init__.py, line 29

logic: Functions are included in `__all__` even when they might be None, which could export None values if ImportError occurs.

examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 170-173

logic: Dynamic field normalization assumes target normalizer exists, but this could fail if target_big is None due to empty tgt_list. The assignment on line 173 happens regardless of whether target normalization succeeded.

Should dynamic normalization be conditional on successful target normalization, or should dynamic have its own normalizer when target is unavailable?

examples/weather/flood_modeling/flood_forecaster/utils/normalization.py, line 129

logic: Potential issue with None values in tgt_list - torch.stack will fail if any tensor in the list is None. Consider filtering None values before stacking.

examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 31

style: Mutable default argument query_res=[64, 64] could cause issues if modified

examples/weather/flood_modeling/flood_forecaster/datasets/normalized_dataset.py, line 98

style: Same mutable default argument issue as line 31
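
To illustrate why a list default such as `query_res=[64, 64]` is risky, the sketch below contrasts it with the usual `None` sentinel. The function names and the `append` call are invented for the demonstration. The real dataset classes may never mutate `query_res`, in which case the issue is latent rather than active.

```python
def make_queries_buggy(query_res=[64, 64]):
    # The default list is created once, at function definition time,
    # so this mutation persists across calls.
    query_res.append(1)
    return query_res


def make_queries_safe(query_res=None):
    # Build a fresh list on every call; copy caller input so it is not mutated.
    query_res = [64, 64] if query_res is None else list(query_res)
    query_res.append(1)
    return query_res
```

Calling `make_queries_buggy()` twice returns a list of length 3 and then one of length 4, while `make_queries_safe()` returns `[64, 64, 1]` every time. An immutable tuple default, `query_res=(64, 64)`, is another common fix.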

examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 138-140

logic: calling __getitem__ during initialization can cause issues if the dataset state isn't fully ready. Is this access pattern safe given that all data loading happens before line 138?

examples/weather/flood_modeling/flood_forecaster/datasets/rollout_dataset.py, line 261

logic: device mismatch: self.xy_coords is always on CPU but dynamic tensor device is unknown

test/datapipes/test_flood_forecaster_datasets.py, line 61

style: Device parameter is not used in the test functions despite being parametrized

test/datapipes/test_flood_forecaster_datasets.py, line 272

logic: Including M40_XY.txt in static_files when it's already the xy_file may cause duplication or confusion. Is this intentional behavior or should xy_file be excluded from static_files?

examples/weather/flood_modeling/flood_forecaster/train.py, line 431

logic: The config variable is always None here - it's initialized as None on line 220 and never assigned a value

test/models/test_flood_forecaster_training.py, line 510

style: Very relaxed tolerance (atol=1.0) may mask actual numerical issues - typical ML tolerances are 1e-4 to 1e-6. Are you confident this high tolerance won't hide regression issues in the CNN domain classifier?
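
A small standard-library illustration of the concern: with an absolute tolerance of 1.0, two clearly different outputs still compare as equal. The two values are made up for the example and assume classifier outputs on roughly a unit scale. They are not taken from the test in question.

```python
import math


def close(a, b, atol):
    """Absolute-tolerance comparison, analogous to an atol-only allclose check."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=atol)


reference = 0.731  # hypothetical stored non-regression output
regressed = 0.112  # hypothetical output after an unintended change

passes_with_loose_tol = close(reference, regressed, atol=1.0)   # difference hidden
passes_with_tight_tol = close(reference, regressed, atol=1e-4)  # difference caught
```

If a loose tolerance is needed because of nondeterministic kernels or platform differences, one option is to state that reason next to the assertion and use the tightest value that still passes reliably.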

test/models/test_flood_forecaster_training.py, line 431

style: Using weights_only=False with torch.load creates potential security risk if loading untrusted data
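
The risk comes from Python's pickle format, which `torch.load` uses when `weights_only=False`: unpickling can call an arbitrary callable chosen by whoever produced the file. The standard-library sketch below demonstrates the mechanism with a harmless callable (`int`). The `Payload` class is purely illustrative and unrelated to PhysicsNeMo checkpoints.

```python
import pickle


class Payload:
    """Object whose pickled form tells the loader which callable to run."""

    def __reduce__(self):
        # On load, pickle calls int("2a", 16). A malicious file could
        # name a dangerous callable such as os.system here instead.
        return (int, ("2a", 16))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it runs the callable and returns its result.
result = pickle.loads(blob)
```

With `weights_only=True`, PyTorch restricts unpickling to tensors and an allowlist of safe types, which is why that mode is preferred for files from untrusted sources.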

examples/weather/flood_modeling/flood_forecaster/inference.py, line 24

style: os import is unused

examples/weather/flood_modeling/flood_forecaster/inference.py, line 88

style: is_logger variable is assigned but never used

examples/weather/flood_modeling/flood_forecaster/utils/plotting.py, line 80-82

style: redundant assignment when geometry is already np.ndarray

examples/weather/flood_modeling/flood_forecaster/training/domain_adaptation.py, line 1071

logic: da_class_loss_weight defaults to 0.0, effectively disabling adversarial training. Is the default value of 0.0 intentional, or should this have a positive default to enable adversarial training?
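
To make the effect of the default concrete, here is a minimal sketch that assumes the trainer combines the two terms additively, as is common in adversarial domain adaptation. The function `total_loss` and the additive form are assumptions, since the actual combination in `domain_adaptation.py` is not shown in this thread. Only the parameter name `da_class_loss_weight` comes from the comment above.

```python
def total_loss(task_loss, domain_loss, da_class_loss_weight=0.0):
    """Assumed form: forecasting loss plus weighted domain-classification loss."""
    return task_loss + da_class_loss_weight * domain_loss


# With the default weight the domain term contributes nothing.
default_total = total_loss(0.8, 0.6)

# A positive weight lets the adversarial term influence training.
weighted_total = total_loss(0.8, 0.6, da_class_loss_weight=0.5)
```

Under this assumption, a weight of 0.0 also means no gradient from the domain classifier's loss reaches the shared features, so the gradient reversal layer has nothing to reverse.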

examples/weather/flood_modeling/flood_forecaster/data_processing/data_processor.py, line 258

style: Consider documenting which specific kwargs are filtered (e.g., 'y') for clarity

examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 301

logic: potential indexing error: uses step_sigma_t[0, 0] instead of full tensor for element-wise multiplication

Should this be step_sigma_t[0] or just step_sigma_t.squeeze() to properly broadcast across the 3 channels?

examples/weather/flood_modeling/flood_forecaster/datasets/flood_dataset.py, line 111

logic: filename inconsistency: searches for 'train_.txt' but error message mentions 'train.txt'

25 files reviewed, 21 comments


…nhance tests

…, and CHANGELOG_ENTRY.md

…e fix

mnabian · 2025-12-11T00:45:51Z

/blossom-ci

mnabian · 2025-12-11T00:47:19Z

untrained_checkpoint.mdlus

Please delete this file

mnabian · 2025-12-11T00:47:47Z

physicsnemo/models/module.py


            # Load state dict after closing archive
-            model_dict = torch.load(io.BytesIO(model_bytes), map_location=device)
+            model_dict = torch.load(io.BytesIO(model_bytes), map_location=device, weights_only=False)


Please explain why these changes are needed here. @CharlelieLrt for viz

That's BC breaking, we can't accept this change

mnabian · 2025-12-11T01:07:14Z