Refactor shared logic between score() and get_logprobs() #1094

@RayenTian

Description:

Context: reward model environment (#1026).
The implementation of score(), from def score(): through the referenced section, largely overlaps with get_logprobs().

To reduce duplication and improve maintainability, extract the common logic into shared utilities or a helper function.

Keep behavior unchanged; this should be a pure refactor. Add/adjust unit tests to ensure parity.
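
One way the extraction could look, as a minimal sketch in plain Python rather than the actual worker code. ToyWorker, _run_forward, toy_forward, and the two post-processing callbacks are hypothetical names chosen for illustration; the real workers in the files listed under Scope operate on tensors and distributed state, which this sketch does not model. The idea is that the micro-batching and forward loop live in one helper, and score() and get_logprobs() differ only in the per-sequence post-processing they pass in.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Logits = List[List[float]]  # one row of vocabulary logits per token


def toy_forward(tokens: Sequence[int], vocab_size: int = 4) -> Logits:
    """Deterministic stand-in for a model forward pass (hypothetical)."""
    return [[math.sin(tok + v) for v in range(vocab_size)] for tok in tokens]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


class ToyWorker:
    """Hypothetical worker; only illustrates the shape of the refactor."""

    def __init__(self, micro_batch_size: int = 2) -> None:
        self.micro_batch_size = micro_batch_size

    def _run_forward(
        self,
        batch: Sequence[Sequence[int]],
        postprocess: Callable[[Sequence[int], Logits], T],
    ) -> List[T]:
        """Shared logic: micro-batch, run forward, post-process each sequence."""
        outputs: List[T] = []
        for start in range(0, len(batch), self.micro_batch_size):
            for tokens in batch[start:start + self.micro_batch_size]:
                outputs.append(postprocess(tokens, toy_forward(tokens)))
        return outputs

    def get_logprobs(self, batch: Sequence[Sequence[int]]) -> List[List[float]]:
        def token_logprobs(tokens: Sequence[int], logits: Logits) -> List[float]:
            # Log-prob of token i under the logits produced at position i - 1.
            return [
                log_softmax(logits[i - 1])[tokens[i]]
                for i in range(1, len(tokens))
            ]

        return self._run_forward(batch, token_logprobs)

    def score(self, batch: Sequence[Sequence[int]]) -> List[float]:
        def last_token_score(tokens: Sequence[int], logits: Logits) -> float:
            # Toy reward head: first logit at the final position.
            return logits[-1][0]

        return self._run_forward(batch, last_token_score)


# Parity check in the spirit of the issue: the refactored score() must match
# a direct, un-batched computation, regardless of micro-batch size.
batch = [[0, 1, 2], [3, 1], [2, 2, 0, 1]]
reference_scores = [toy_forward(seq)[-1][0] for seq in batch]
assert ToyWorker(micro_batch_size=2).score(batch) == reference_scores
assert ToyWorker(micro_batch_size=1).score(batch) == reference_scores
```

In a real parity test, the reference values would presumably be outputs captured from the pre-refactor score() and get_logprobs() on fixed inputs, compared with whatever tolerance the existing tests already use.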

Scope:

  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • nemo_rl/models/policy/dtensor_policy_worker.py

Acceptance Criteria:

No functional changes; existing APIs and outputs remain identical.

score() and get_logprobs() no longer duplicate core logic.
