Place conditions within States #445
Conversation
josephdviviano
left a comment
Hey @hyeok9855
I actually like this. I'm curious to hear @younik's feedback on this design.
It does not seem to influence the make_States_class factory, which was my main concern. Conditions live inside states, which keeps the APIs relatively clean, and conceptually having them live together makes sense to me.
What I don't love is having a distinct ConditionalEnv, as in the parent PR, because I still think that Env definition is the most confusing element of our library and I'd like to reduce that complexity as much as possible, but I think it would be worth a longer discussion.
Want to break it down on Slack or schedule a call?
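For readers outside the thread, here is a minimal sketch of the design being discussed: conditions stored directly on the states object rather than passed around separately. The names `States`, `conditions`, and `has_conditions` mirror the diff below but are otherwise assumptions, not the actual torchgfn API.

```python
import torch

class States:
    """Hypothetical stand-in for a States container that carries its own
    conditions. `conditions` is None for unconditional environments."""

    def __init__(self, tensor, conditions=None):
        self.tensor = tensor          # batch of state representations
        self.conditions = conditions  # optional per-state conditioning tensor

    @property
    def has_conditions(self):
        return self.conditions is not None


# With conditions living inside States, downstream code (e.g. env.reward)
# no longer needs a separate `conditions` argument:
conditioned = States(torch.zeros(4, 2), conditions=torch.ones(4, 1))
plain = States(torch.zeros(4, 2))
print(conditioned.has_conditions, plain.has_conditions)  # True False
```

This is why the factory-based `make_States_class` construction is unaffected: the conditions ride along as an attribute rather than changing any class signature.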
    A tensor of shape (batch_size,) containing the log rewards.
    A tensor of shape (batch_size,) containing the rewards.
    """
    return torch.log(self.reward(states, conditions))
Why can't we leave this in as a default? And why not log_reward like everything else?
This reward method can actually be removed, since it is exactly the same as the one in the parent class. I will remove it.
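The inheritance point above can be sketched as follows. This assumes the parent class defines the log-reward as the log of the reward (per the diff), so a subclass that re-implements the identical method adds nothing; class and method names other than `reward`/`log_reward` are hypothetical.

```python
import torch

class Env:
    """Hypothetical parent environment class."""

    def reward(self, states):
        raise NotImplementedError

    def log_reward(self, states):
        # Default: derive log rewards from rewards, so subclasses only
        # need to override reward().
        return torch.log(self.reward(states))


class MyEnv(Env):
    def reward(self, states):
        # Toy example: uniform reward of 1 for every state in the batch.
        return torch.ones(states.shape[0])


env = MyEnv()
print(env.log_reward(torch.zeros(3, 2)))  # tensor([0., 0., 0.])
```

A subclass method whose body is byte-for-byte the parent's default, as in the hunk above, can simply be deleted, and lookup falls through to the parent.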
    # LogF is potentially a conditional computation.
-   if transitions.conditions is not None:
+   if transitions.states.has_conditions:
        assert transitions.states.conditions is not None
this kind of thing, btw, will break torch.compile. (is not None). No action needed, I'll look at this in a different PR.
This kind of thing, by the way, will break torch.compile (the `is not None` check). No action needed, I'll look at this in a different PR.
I also prefer this design; it looks cleaner, even if, as far as I understood, the conditioning cannot change during an episode.
Description
This PR tries to relocate the conditions from elsewhere to inside the States. There are pros and cons to this design.

Pros

- We can call `env.reward(states)` instead of `env.reward(states, conditions)`
- Conditions live together with the `States`

Cons

- Trajectory-level conditions now have shape `(max_length, batch_size, condition_dim)`, i.e., we now need to store `max_length` copies of the conditions (possibly increasing memory overhead)
- `States` is now a bit messy
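The memory concern in the first con can be made concrete. A rough sketch, assuming one condition vector per trajectory gets replicated along the time dimension to match trajectory-shaped states (the shapes come from the description above; the replication mechanism is an assumption):

```python
import torch

max_length, batch_size, condition_dim = 100, 32, 8

# One condition vector per trajectory:
per_trajectory = torch.randn(batch_size, condition_dim)

# Stored inside trajectory-shaped States, the same vector is repeated
# at every step, growing logical storage by a factor of max_length:
per_step = per_trajectory.unsqueeze(0).expand(max_length, -1, -1)

print(per_trajectory.numel())  # 256
print(per_step.numel())        # 25600 (if materialized)
```

Note that `expand` returns a view over the original storage, so the overhead only materializes if the tensor is later made contiguous (e.g. via `.contiguous()` or fancy indexing); keeping it as a view would mitigate this con.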