Refactor shared logic between score() and get_logprobs()

## Description:

Follow-up to the reward model environment PR: https://github.com/NVIDIA-NeMo/RL/pull/1026
The implementation of score(), from `def score():` to the referenced section, largely overlaps with get_logprobs().

To reduce duplication and improve maintainability, extract the common logic into shared utilities or a helper function.
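As a rough illustration of the kind of extraction meant here, the sketch below moves a shared batch loop into one helper and leaves only the post-processing in each method. All names (`_forward_over_batches`, `ToyWorker`, the toy `_forward`) are hypothetical and not taken from nemo_rl; the real workers operate on tensors and distributed state, which this sketch ignores.

```python
from typing import Callable, List, Sequence

Batch = Sequence[float]


def _forward_over_batches(
    batches: Sequence[Batch],
    forward_fn: Callable[[Batch], List[float]],
    post_process_fn: Callable[[Batch, List[float]], List[float]],
) -> List[float]:
    """Hypothetical shared helper: run the forward pass per batch,
    then apply the caller-specific post-processing."""
    results: List[float] = []
    for batch in batches:
        outputs = forward_fn(batch)
        results.extend(post_process_fn(batch, outputs))
    return results


class ToyWorker:
    """Toy stand-in for a policy worker (assumption, not the real class)."""

    def _forward(self, batch: Batch) -> List[float]:
        # Placeholder for the model forward pass.
        return [2.0 * x for x in batch]

    def get_logprobs(self, batches: Sequence[Batch]) -> List[float]:
        # One value per input element.
        return _forward_over_batches(
            batches, self._forward, lambda _b, out: [o - 1.0 for o in out]
        )

    def score(self, batches: Sequence[Batch]) -> List[float]:
        # One value per batch.
        return _forward_over_batches(
            batches, self._forward, lambda _b, out: [sum(out)]
        )
```

With this shape, the two public methods keep their signatures and differ only in the callback they pass, which is one way to keep the refactor behavior-preserving.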

Keep behavior unchanged; this should be a pure refactor. Add/adjust unit tests to ensure parity.
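A parity check for a pure refactor could look roughly like the following: run the pre-refactor and post-refactor implementations on the same fixed inputs and require identical outputs. The functions `legacy_score`, `refactored_score`, and `assert_parity` are hypothetical placeholders; the actual tests would call the real worker methods with real inputs.

```python
import math
from typing import Callable, List, Sequence

Batch = Sequence[float]


def _forward(batch: Batch) -> List[float]:
    # Placeholder for the shared forward pass (assumption).
    return [2.0 * x for x in batch]


def legacy_score(batches: Sequence[Batch]) -> List[float]:
    # Stand-in for the pre-refactor implementation with an inlined loop.
    results = []
    for batch in batches:
        outputs = [2.0 * x for x in batch]
        results.append(sum(outputs))
    return results


def refactored_score(batches: Sequence[Batch]) -> List[float]:
    # Stand-in for the post-refactor implementation using the shared helper.
    return [sum(_forward(batch)) for batch in batches]


def assert_parity(
    old_fn: Callable[[Sequence[Batch]], List[float]],
    new_fn: Callable[[Sequence[Batch]], List[float]],
    inputs: Sequence[Batch],
    abs_tol: float = 0.0,
) -> None:
    """Fail if the two implementations disagree on the given inputs."""
    old, new = old_fn(inputs), new_fn(inputs)
    assert len(old) == len(new), f"length mismatch: {len(old)} vs {len(new)}"
    for i, (a, b) in enumerate(zip(old, new)):
        assert math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol), (
            f"output {i} differs: {a} vs {b}"
        )


assert_parity(legacy_score, refactored_score, [[1.0, 2.0], [3.0]])
```

The default tolerance of zero matches the "outputs remain identical" requirement; a nonzero `abs_tol` would only be appropriate if the refactor is allowed to reorder floating-point operations.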

## Scope:

- nemo_rl/models/policy/dtensor_policy_worker_v2.py
- nemo_rl/models/policy/dtensor_policy_worker.py

## Acceptance Criteria:

- No functional changes; existing APIs and outputs remain identical.
- score() and get_logprobs() no longer duplicate core logic.




Refactor shared logic between score() and get_logprobs() #1094

Description:

Scope:

Acceptance Criteria:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Refactor shared logic between score() and get_logprobs() #1094

Description

Description:

Scope:

Acceptance Criteria:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions