Replace `step!` with `act!` and `observe` #26
Conversation
@jonathan-laurent I can't request a review from you since you are not a JuliaReinforcementLearning member, but I think it would be good for you to weigh in, since this was prompted by #20.
@findmyway Could you please add me to JuliaReinforcementLearning?
Looks good to me. My only concern is the name of `observe`.
First, apologies if my comments in #12 (comment) and JuliaReinforcementLearning/ReinforcementLearning.jl#68 (comment) caused any confusion there. Inspired by all your discussions, I'm experimenting with a wild idea to remove `observe(env)` (at least in single-agent environments). So let's forget those discussions temporarily.
From the example here, the `observe` function returns a state. But I guess it is assumed to return an observation in POMDP environments? Then how do we get the inner state in such environments? (With a `state(env)`?) And I think @jonathan-laurent mentioned another two functions, `set_state!(env, state)` and `Env(state)`, somewhere. So I think people with different backgrounds may have different understandings of `state` and `observation`. We'd better define these concepts explicitly somewhere in the docs.
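For illustration, here is one way the distinction might look in code. This is only a sketch; the `FogWorld` type and the assumption that `state` sits alongside `observe` are hypothetical, not settled API:

```julia
# Hypothetical partially observable environment: the full state is the
# agent's (x, y) position, but the agent can only observe x.
mutable struct FogWorld
    x::Int
    y::Int
end

observe(env::FogWorld) = env.x           # observation: what the agent sees
state(env::FogWorld) = (env.x, env.y)    # inner state: the full Markov state

# In a fully observable (MDP) environment the two would coincide:
# observe(env) == state(env)
```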
It seems you are already in 😉
README.md (Outdated)
actions(env) # returns the set of all possible actions for the environment
act!(env, a) # returns a reward, done, and info
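To make the README snippet above concrete, here is a minimal episode-loop sketch. The `reset!` function and the exact `(reward, done, info)` return tuple of `act!` are assumptions pending the outcome of #20:

```julia
# Hypothetical episode loop built on the proposed act!/observe pair.
function run_episode!(env)
    reset!(env)                       # assumed: reinitialize the environment
    total, done = 0.0, false
    while !done
        o = observe(env)              # current observation (unused by a random policy)
        a = rand(actions(env))        # sample one of the available actions
        r, done, info = act!(env, a)  # assumed return: reward, done flag, info
        total += r
    end
    return total
end
```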
Should we consider only returning the reward, or is this for a future PR?
Right - I'll change it depending on the outcome of #20.
Return a collection of all the actions available to player i.

This function is a *static property* of the environment; the value it returns should not change based on the state.
Maybe we should insist that `valid_action_mask` must be defined with respect to `actions(env)` and not `actions(env, player(env))`.
Also, is there a way to make implementing `actions(env)` mandatory but `actions(env, player)` optional?
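For concreteness, here is my reading of that convention; the `legal_actions` helper is hypothetical, not part of the interface:

```julia
# valid_action_mask(env) is a Bool vector aligned with actions(env), the
# full static action set, rather than with actions(env, player(env)).
function legal_actions(env)
    A = collect(actions(env))        # static action set, one entry per mask bit
    mask = valid_action_mask(env)    # Bool vector, same length as A
    return A[mask]                   # subset currently legal in this state
end
```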
Yes, currently `actions(env)` is mandatory, but `actions(env, player)` is optional.
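In Julia this split falls out naturally from a fallback method: the interface package can own a default two-argument definition, so only environments whose action set varies per player need to specialize it. A sketch:

```julia
# Default fallback: the two-argument form delegates to the mandatory
# one-argument form, ignoring the player.
actions(env, player) = actions(env)

# Environments with per-player action sets override it, e.g. (hypothetical):
# actions(env::MyGame, player) = player == 1 ? white_moves(env) : black_moves(env)
```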
Yes. Documentation (#24) will be important after we get the rest of the basic questions about the interface answered. Informally, the definitions are as follows: an observation is the information the agent receives at each step, while the state is a complete description of the environment, sufficient to determine its future behavior. We may want a trait that indicates whether the environment is fully observable, in which case the observation is the state.
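One way to express such a trait in Julia is the usual trait-dispatch pattern; all names below are hypothetical, offered only as a sketch:

```julia
abstract type Observability end
struct FullyObservable <: Observability end
struct PartiallyObservable <: Observability end

# Environments declare their observability; assume partial by default.
observability(env) = PartiallyObservable()

# Fully observable environments get observe for free: it is just the state.
# Partially observable environments must define observe(env) themselves.
observe(env) = observe(observability(env), env)
observe(::FullyObservable, env) = state(env)
```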
After discussion in #20, I'm now leaning toward having `act!` and `observe` replace `step!`, since it makes significantly more sense in a 2-player environment. Please weigh in!
If accepted, this will replace `step!` with `act!` and `observe`.
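For intuition, a rough sketch of why the pair reads more naturally than a monolithic `step!` when there are two players; `terminated`, `player`, and `policy` are placeholder names here, not proposed API:

```julia
# Each turn: the current player observes, then acts. With a single
# step!(env, a) call it is ambiguous whose observation and reward the
# return value should carry; splitting act! and observe sidesteps that.
function play!(env, policies)
    while !terminated(env)
        p = player(env)          # whose turn it is
        o = observe(env)         # observation from that player's perspective
        a = policies[p](o)       # that player's policy chooses an action
        act!(env, a)             # advance the environment
    end
end
```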