
Conversation

@hyeok9855
Collaborator

  • I've read the .github/CONTRIBUTING.md file
  • My code follows the typing guidelines
  • I've added appropriate tests
  • I've run pre-commit hooks locally

Description

This PR relocates the conditions from elsewhere to inside the States. There are pros and cons to this design (see the sketch after the lists below).

Pros

  • Cleaner API, e.g., we can simply use env.reward(states) instead of env.reward(states, conditions)
  • Trajectory management becomes easier since conditions are handled in States

Cons

  • Redundancy: the shape of the condition tensor for a batch of trajectories is (max_length, batch_size, condition_dim), i.e., we now need to store max_length copies of the conditions (possibly increasing memory overhead)
  • States is now a bit messy
  • Less intuitive (personally), since conditions can exist without States
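
For concreteness, here is a minimal sketch of what this design could look like. All names here (ConditionedStates, ToyEnv, condition_dim, etc.) are illustrative assumptions, not the actual library API:

```python
import torch
from typing import Optional


class ConditionedStates:
    """Illustrative states container that carries its own conditioning tensor."""

    def __init__(self, tensor: torch.Tensor, conditions: Optional[torch.Tensor] = None):
        self.tensor = tensor            # e.g. (max_length, batch_size, state_dim)
        self.conditions = conditions    # e.g. (max_length, batch_size, condition_dim)

    @property
    def has_conditions(self) -> bool:
        return self.conditions is not None


class ToyEnv:
    """Illustrative env: reward no longer takes a separate `conditions` argument."""

    def reward(self, states: ConditionedStates) -> torch.Tensor:
        base = states.tensor.sum(-1)
        if states.has_conditions:
            # Conditioning is read directly from the states object.
            base = base * states.conditions.sum(-1)
        return base

    def log_reward(self, states: ConditionedStates) -> torch.Tensor:
        return torch.log(self.reward(states))


# The same condition is repeated along max_length, which is the redundancy
# mentioned in the cons above.
states = ConditionedStates(
    tensor=torch.rand(10, 4, 3),                      # (max_length, batch_size, state_dim)
    conditions=torch.rand(1, 4, 2).expand(10, 4, 2),  # condition broadcast over max_length
)
print(ToyEnv().reward(states).shape)  # torch.Size([10, 4])
```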

@hyeok9855 hyeok9855 self-assigned this Dec 5, 2025
Collaborator

@josephdviviano josephdviviano left a comment


Hey @hyeok9855

I actually like this. I'm curious to hear @younik 's feedback on this design.

It does not seem to influence the make_States_class factory, which was my main concern. Conditions live inside states, which keeps the APIs relatively clean, and conceptually having them live together makes sense to me.

What I don't love is having a distinct ConditionalEnv, as in the parent PR, because I still think the Env definition is the most confusing element of our library and I'd like to reduce that complexity as much as possible; that said, I think it would be worth a longer discussion.

Want to break it down on Slack or schedule a call?

A tensor of shape (batch_size,) containing the log rewards.
A tensor of shape (batch_size,) containing the rewards.
"""
return torch.log(self.reward(states, conditions))
Collaborator


Why can't we leave this in as a default? And why not log_reward like everything else?

Collaborator Author


This reward method can actually be removed, since it is exactly the same as the one in the parent class. I will remove it.
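
For context, the parent-class default presumably looks roughly like this (an illustrative sketch, not the exact library code), which is why the conditional override adds nothing once conditions live inside the states:

```python
import torch


class Env:
    """Illustrative parent class: log_reward already defaults to log(reward)."""

    def reward(self, states) -> torch.Tensor:
        raise NotImplementedError

    def log_reward(self, states) -> torch.Tensor:
        # Default implementation; a subclass that computes its reward from
        # conditions stored on `states` can simply inherit this.
        return torch.log(self.reward(states))
```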

# LogF is potentially a conditional computation.
if transitions.conditions is not None:
if transitions.states.has_conditions:
assert transitions.states.conditions is not None
Collaborator


This kind of thing (the `is not None` check), by the way, will break torch.compile. No action needed; I'll look at this in a different PR.
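
For illustration, one common way to work around this kind of Optional check under torch.compile is to keep the `is not None` branch in eager Python and compile only the tensor-level paths. This is only a hedged sketch with hypothetical function names, not a proposal for this PR:

```python
import torch
from typing import Optional


def _logF_conditional(states_tensor: torch.Tensor, conditions: torch.Tensor) -> torch.Tensor:
    # Purely tensor code: straightforward to compile.
    return (states_tensor * conditions.sum(-1, keepdim=True)).sum(-1)


def _logF_unconditional(states_tensor: torch.Tensor) -> torch.Tensor:
    return states_tensor.sum(-1)


compiled_cond = torch.compile(_logF_conditional)
compiled_uncond = torch.compile(_logF_unconditional)


def logF(states_tensor: torch.Tensor, conditions: Optional[torch.Tensor]) -> torch.Tensor:
    # The `is not None` check stays in eager Python, outside the compiled graphs.
    if conditions is not None:
        return compiled_cond(states_tensor, conditions)
    return compiled_uncond(states_tensor)


print(logF(torch.rand(4, 3), torch.rand(4, 2)).shape)  # torch.Size([4])
print(logF(torch.rand(4, 3), None).shape)              # torch.Size([4])
```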

@younik
Collaborator

younik commented Dec 12, 2025

I actually like this. I'm curious to hear @younik 's feedback on this design.

I also prefer this design; it looks cleaner, even if, as far as I understand, the conditioning cannot change during an episode.

@hyeok9855 hyeok9855 merged commit d000cbd into refactor-conditions Dec 12, 2025
3 checks passed
