
Replace step! with act! and observe #26

Merged
merged 2 commits into from
Jul 15, 2020
Conversation

@zsunberg zsunberg commented Jul 6, 2020

After discussion in #20, I'm now leaning toward having act! and observe replace step!, since that makes significantly more sense in a 2-player environment.

Please weigh in!

If accepted, this will
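As a hedged sketch of what the proposed split could look like (only the act! and observe names come from this PR; the GuessEnv environment and everything inside it are invented for illustration):

```julia
# Illustrative toy environment: guess which of two doors (1 or 2) hides
# the prize. Everything except the act!/observe names is an assumption.
mutable struct GuessEnv
    state::Int    # hidden door number
    done::Bool
end

# act! advances the environment and returns the reward
function act!(env::GuessEnv, a::Int)
    r = a == env.state ? 1.0 : -1.0
    env.done = true
    return r
end

# observe returns only what the agent is allowed to see
# (0 stands for "no information yet")
observe(env::GuessEnv) = env.done ? env.state : 0

env = GuessEnv(1, false)
o1 = observe(env)    # 0: the door is still hidden
r  = act!(env, 1)    # 1.0: correct guess
o2 = observe(env)    # 1: revealed after the episode ends
```

In a multi-player environment, each player can call observe separately after a single act!, which is where the split pays off over a monolithic step!.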

@zsunberg zsunberg requested a review from findmyway July 6, 2020 19:06
zsunberg commented Jul 6, 2020

@jonathan-laurent I can't request a review from you since you are not a JuliaReinforcementLearning member, but I think it would be good for you to weigh in, since this was prompted by #20.

@jonathan-laurent

@findmyway Could you please add me to JuliaReinforcementLearning?

@jonathan-laurent jonathan-laurent self-requested a review July 6, 2020 23:38

@findmyway findmyway left a comment


Looks good to me. My only concern is the name of observe.

First, apologies if my comments in #12 (comment) and JuliaReinforcementLearning/ReinforcementLearning.jl#68 (comment) caused any confusion there. Inspired by all your discussions, I'm experimenting with a wild idea to remove observe(env) (at least in single-agent environments). So let's set those discussions aside temporarily.

From the example here, the observe function returns a state. But I guess it is assumed to return an observation in POMDP environments? Then how do we get the inner state in such environments? (With a state(env)?) And I think @jonathan-laurent mentioned two other functions, set_state!(env, state) and Env(state), somewhere. So people with different backgrounds may have different understandings of state and observation. We should clearly define these concepts explicitly somewhere in the doc.
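To make the state/observation distinction concrete, here is a hedged sketch in the spirit of the classic tiger POMDP; TigerEnv and the state/set_state! names are just the possibilities raised in this comment, not established interface functions:

```julia
# Illustrative POMDP-style environment: the tiger is behind the left or
# right door. All names here are assumptions for the sake of the example.
mutable struct TigerEnv
    tiger_left::Bool   # the Markov state
end

# state exposes the inner Markov state (e.g. for debugging or resetting)
state(env::TigerEnv) = env.tiger_left

# set_state! puts the environment into a given state
set_state!(env::TigerEnv, s::Bool) = (env.tiger_left = s; env)

# observe is noisy: the agent hears the tiger on the correct side only
# 85% of the time, so observe(env) != state(env) in general.
observe(env::TigerEnv) = rand() < 0.85 ? env.tiger_left : !env.tiger_left
```

Here state(env) and observe(env) return values of the same type but carry different information, which is exactly why the two concepts need separate definitions in the docs.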

@findmyway
Member

@findmyway Could you please add me to JuliaReinforcementLearning?

It seems you are already in 😉

README.md Outdated
actions(env) # returns the set of all possible actions for the environment
act!(env, a) # returns a reward, done, and info


Should we consider only returning the reward, or is this for a future PR?

Member Author


Right - I'll change it depending on the outcome of #20


Return a collection of all the actions available to player i.

This function is a *static property* of the environment; the value it returns should not change based on the state.


Maybe we should insist that valid_action_mask must be defined with respect to actions(env) and not actions(env, player(env)).

Also, is there a way to make implementing actions(env) mandatory but actions(env, player) optional?
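A tiny sketch of the indexing convention being suggested here, using stand-in vectors (the action list and mask values are invented; only the actions/valid_action_mask names come from the discussion):

```julia
# The mask is meant to line up element-by-element with actions(env),
# not with the per-player subset actions(env, player(env)).
all_actions = [:up, :down, :left, :right]   # stand-in for actions(env)
mask = Bool[1, 0, 1, 0]                     # stand-in for valid_action_mask(env)

# Logical indexing then recovers the currently legal actions:
legal = all_actions[mask]                   # [:up, :left]
```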

Member Author


Yes, currently actions(env) is mandatory, but actions(env, player) is optional.
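One possible way to encode that mandatory/optional split is a fallback method; this is an assumption about the mechanism, not necessarily how the interface implements it:

```julia
# actions(env) is mandatory: the generic method just errors.
actions(env) = error("actions(env) must be implemented for $(typeof(env))")

# actions(env, player) is optional: by default every player shares the
# full action set, so environments only override it when players differ.
actions(env, player) = actions(env)

# An environment then only has to define the one-argument method:
struct RPS end                           # rock-paper-scissors, illustrative
actions(::RPS) = [:rock, :paper, :scissors]
```

With this arrangement, actions(RPS(), 1) dispatches to the fallback and returns the same set as actions(RPS()), while an unimplemented environment fails loudly.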

@zsunberg zsunberg changed the title Replace step! with act! and observe! Replace step! with act! and observe Jul 8, 2020

zsunberg commented Jul 8, 2020

We should clearly define these concepts explicitly somewhere in the doc.

Yes. Documentation #24 will be important after we get the rest of the basic questions about the interface answered.

Informally, the definitions are as follows:

- observe returns all of the information given to the agent on that step to make decisions.
- state returns the Markov state of the environment. The behavior of two environments with the same static parameters that start from the same state should be statistically identical.

We may want a trait that indicates whether the environment is fully observable, in which case observe and state return the same information.
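A hedged sketch of what such a trait could look like; every name below (Observability, FullyObservable, observability, GridWorld) is an assumption for illustration:

```julia
# Trait hierarchy marking whether an environment is fully observable.
abstract type Observability end
struct FullyObservable <: Observability end
struct PartiallyObservable <: Observability end

# Conservative default: assume partial observability.
observability(env) = PartiallyObservable()

# Fully observable environments get observe for free from state;
# partially observable ones must implement observe themselves.
observe(env) = observe(observability(env), env)
observe(::FullyObservable, env) = state(env)

# Example: a fully observable toy environment.
struct GridWorld
    pos::Int
end
observability(::GridWorld) = FullyObservable()
state(env::GridWorld) = env.pos
```

With this, observe(GridWorld(3)) dispatches through the trait and simply returns state(env), matching the "observe and state return the same information" case above.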

@zsunberg zsunberg mentioned this pull request Jul 14, 2020
@zsunberg zsunberg merged commit 50f4e52 into master Jul 15, 2020
@zsunberg zsunberg deleted the act-observe branch July 15, 2020 01:45