
Conversation

@ktangsali (Collaborator) commented Jan 7, 2026

PhysicsNeMo Pull Request

Description

Refactors the RNN models to the new coding standards.


Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness. You are not required to respond to every AI comment; the comments are intended to help both authors and reviewers. Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@ktangsali requested a review from @coreyjadams on January 7, 2026 00:45
@greptile-apps (bot) commented Jan 7, 2026

Greptile Summary

Refactored RNN models (One2ManyRNN and Seq2SeqRNN) to comply with new coding standards, including moving shared convolution layers from physicsnemo/models/rnn/layers.py to physicsnemo/nn/conv_layers.py for better code organization.

Key Changes:

  • Updated module imports from physicsnemo.models to physicsnemo.core and physicsnemo.nn namespace
  • Removed name field from MetaData dataclass (MOD-000a compliance)
  • Enhanced docstrings with RST formatting, math notation, and runnable examples (MOD-003, MOD-004 compliance)
  • Added jaxtyping type annotations to forward methods (MOD-005 compliance)
  • Implemented comprehensive input validation in forward methods using a torch.compiler.is_compiling() guard (MOD-006 compliance); a minimal sketch of this pattern follows this list
  • Upgraded ConvLayer, TransposeConvLayer, ConvGRULayer, and ConvResidualBlock from nn.Module to Module base class
  • Added comprehensive test coverage for models and layers (MOD-009 compliance)
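
A minimal sketch of the guard pattern named above, assuming the checks are plain shape assertions (the class below is an illustrative stand-in, not the PhysicsNeMo Module base class or the actual validation code):

```python
import torch
from torch import Tensor, nn


class ValidatedLayer(nn.Module):
    """Illustrative stand-in; see physicsnemo/nn/conv_layers.py for the real checks."""

    def forward(self, x: Tensor) -> Tensor:
        # Eager-only validation: torch.compile traces skip this branch entirely.
        if not torch.compiler.is_compiling():
            if x.dim() != 4:
                raise ValueError(f"expected 4D [N, C, H, W] input, got {x.dim()}D")
        return x
```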

Issues Found:

  • Critical: padding order may be incorrect in ConvLayer and TransposeConvLayer for 2D inputs (needs verification as this was existing code)

Testing:

  • New test files validate forward pass, checkpoint loading, AMP optimization, and constructor behavior
  • Test data files added for non-regression testing

Important Files Changed

  • physicsnemo/models/rnn/rnn_one2many.py: Refactored to the new module paths and standards (imports updated from physicsnemo.models to physicsnemo.core and physicsnemo.nn, name field removed from MetaData, documentation improved with RST formatting and jaxtyping annotations, comprehensive input validation added to the forward method)
  • physicsnemo/models/rnn/rnn_seq2seq.py: Same refactor as rnn_one2many.py (updated imports, removed name field from MetaData, improved documentation, added input validation to the forward method)
  • physicsnemo/nn/__init__.py: New file exposing neural network components, including ConvLayer, ConvGRULayer, ConvResidualBlock, and TransposeConvLayer, moved from models/rnn/layers.py to centralize nn components
  • physicsnemo/nn/conv_layers.py: Convolution layers moved from models/rnn/layers.py and upgraded to the new standards (classes now inherit from Module instead of nn.Module, comprehensive RST docstrings with jaxtyping annotations, input validation in forward methods behind a torch.compiler.is_compiling() guard)
  • test/models/rnn/test_rnn.py: New comprehensive test suite for the RNN models covering forward pass, checkpoint save/load, optimizations (AMP), and constructor validation for both One2ManyRNN and Seq2SeqRNN in 2D and 3D
  • test/models/rnn/test_rnn_layers.py: New test suite for the convolutional layer components (ConvLayer, TransposeConvLayer, ConvResidualBlock) with comprehensive parametrized testing across configurations

@greptile-apps (bot) left a comment

Additional Comments (3)

  1. physicsnemo/models/rnn/rnn_one2many.py, lines 100-103 (link)

     Syntax: output shape mismatch in the docstring example. The example shows input torch.randn(4, 6, 1, 16, 16), which is [N=4, C=6, T=1, H=16, W=16], but the output shows torch.Size([4, 6, 16, 16, 16]), which is [N=4, C=6, T=16, H=16, W=16]. The spatial dimensions should remain [16, 16], not become [16, 16, 16].

  2. physicsnemo/nn/conv_layers.py, lines 258-263 (link)

     Logic: padding order is incorrect for 2D input. The current code applies padding as [pad_h // 2, pad_h - pad_h // 2, pad_w // 2, pad_w - pad_w // 2], but F.pad expects padding in reverse dimension order, [left, right, top, bottom], which should be [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2] (see the sketch after these comments).

  3. physicsnemo/nn/conv_layers.py, lines 432-437 (link)

     Logic: padding/cropping order is incorrect for 2D input. torch.nn.functional.pad expects padding in reverse dimension order; the current code uses [pad_h // 2 : ..., pad_w // 2 : ...], but the dimensions should be reversed to match width then height.

     Can you verify that the expected padding order matches your framework's conventions for transpose convolutions?
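
The reverse-dimension ordering is easy to confirm empirically; a minimal check, with shapes chosen arbitrarily for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.zeros(1, 1, 3, 5)  # [N, C, H, W] with H=3, W=5

# F.pad's tuple starts at the LAST dimension: (w_left, w_right, h_top, h_bottom).
pad_h, pad_w = 1, 3
y = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2])
assert y.shape == (1, 1, 4, 8)  # H: 3 + 1 = 4, W: 5 + 3 = 8
```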

7 files reviewed, 3 comments


@ktangsali (Collaborator, Author) commented


Fixed points 2 and 3. For point 1, the output is correct: [4, 6, 16, 16, 16] means batch=4, channels=6, timesteps=16, height=16, width=16.
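
In other words (a shape-only illustration of the convention, not an actual model call):

```python
import torch

x = torch.randn(4, 6, 1, 16, 16)  # [N, C, T, H, W]: a single input timestep
N, C, T, H, W = x.shape
assert T == 1  # one input timestep

nr_tsteps = 16  # the one-to-many model unrolls the time axis to this length
out_shape = (N, C, nr_tsteps, H, W)
assert out_shape == (4, 6, 16, 16, 16)  # only dim 2 (time) grows; H and W stay 16
```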

@ktangsali (Collaborator, Author) commented

/blossom-ci

@coreyjadams (Collaborator) commented

This looks good to me.

When UNet is updated (which is currently assigned to @peterdsharpe ...), we might consider consolidating some of the "ConvBlock", "ConvLayer", etc. implementations. But this one came first, so no need to hold it up :)

@ktangsali force-pushed the model-standards-rnn branch from f7cc8a1 to 580adc3 on January 8, 2026 23:23
@ktangsali (Collaborator, Author) commented

/blossom-ci

@ktangsali (Collaborator, Author) commented

/blossom-ci

@ktangsali added this pull request to the merge queue on Jan 9, 2026
@ktangsali removed this pull request from the merge queue due to a manual request on Jan 9, 2026
@ktangsali requested a review from @CharlelieLrt on January 9, 2026 06:17
@coreyjadams (Collaborator) commented

> There should be a test that loads a checkpoint file. Unlike existing tests using checkpoints, the checkpoint file should be part of the test suite.

@CharlelieLrt I like this idea a lot, but I'm thinking that [~15 models] x [a checkpoint or two each] x [several MB or more per checkpoint] leads to repo bloat. Do we want to keep these checkpoint files with Git LFS, or perhaps in a separate repo we can use as a git submodule during testing?

@CharlelieLrt (Collaborator) commented

@coreyjadams I agree that the size of the combined .mdlus checkpoints could be a problem. When I did this for other tests, I made sure to use very tiny checkpoints that were all around 10 KB to 100 KB. I'm not sure whether that would be more acceptable (or even possible for some models).
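
For scale, a stand-in illustrating the "tiny checkpoint" idea (a placeholder model, not an actual .mdlus checkpoint; real tests would use the smallest valid RNN configuration):

```python
import os

import torch

# Placeholder model with only a handful of parameters.
tiny = torch.nn.Conv2d(2, 2, kernel_size=3)
torch.save(tiny.state_dict(), "tiny_checkpoint.pt")

print(os.path.getsize("tiny_checkpoint.pt"))  # ~1-2 KB: small enough to commit
```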

> a separate repo we can use as a git submodule during testing?

Agreed, that would be a good solution.

@CharlelieLrt (Collaborator) left a comment

LGTM!
