Replace `step!` with `act!` and `observe` #26
Conversation
@jonathan-laurent I can't request a review from you since you are not a JuliaReinforcementLearning member, but I think it would be good for you to weigh in, since this was prompted by #20.
@findmyway Could you please add me to JuliaReinforcementLearning?
Looks good to me. My only concern is the name of `observe`.
First, apologies if my comments in #12 (comment) and JuliaReinforcementLearning/ReinforcementLearning.jl#68 (comment) caused any confusion there. Inspired by all your discussions, I'm experimenting with a wild idea to remove `observe(env)` (at least in single-agent environments). So let's forget those discussions temporarily.
From the example here, the `observe` function returns a state. But I guess it is assumed to return an observation in POMDP environments? Then how do we get the inner state in such environments? (With a `state(env)`?) And I think @jonathan-laurent mentioned another two functions, `set_state!(env, state)` and `Env(state)`, somewhere. So I think people with different backgrounds may have different understandings of `state` and `observation`. We'd better define these concepts explicitly somewhere in the docs.
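For illustration, here is one way the distinction might look in code. This is only a sketch; the `FogWorld` type and the assumption that `state` sits alongside `observe` are hypothetical, not settled API:

```julia
# Hypothetical partially observable environment: the full state is the
# agent's (x, y) position, but the agent can only observe x.
mutable struct FogWorld
    x::Int
    y::Int
end

observe(env::FogWorld) = env.x           # observation: what the agent sees
state(env::FogWorld) = (env.x, env.y)    # inner state: the full Markov state

# In a fully observable (MDP) environment the two would coincide:
# observe(env) == state(env)
```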
It seems you are already in 😉
README.md (Outdated)
actions(env) # returns the set of all possible actions for the environment
act!(env, a) # returns a reward, done, and info
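To make the README snippet above concrete, here is a minimal episode-loop sketch. The `reset!` function and the exact `(reward, done, info)` return tuple of `act!` are assumptions pending the outcome of #20:

```julia
# Hypothetical episode loop built on the proposed act!/observe pair.
function run_episode!(env)
    reset!(env)                       # assumed: reinitialize the environment
    total, done = 0.0, false
    while !done
        o = observe(env)              # current observation (unused by a random policy)
        a = rand(actions(env))        # sample one of the available actions
        r, done, info = act!(env, a)  # assumed return: reward, done flag, info
        total += r
    end
    return total
end
```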
Should we consider only returning the reward, or is this for a future PR?
Right - I'll change it depending on the outcome of #20.
Return a collection of all the actions available to player i.

This function is a *static property* of the environment; the value it returns should not change based on the state.
Maybe we should insist that `valid_action_mask` must be defined with respect to `actions(env)` and not `actions(env, player(env))`.
Also, is there a way to make implementing `actions(env)` mandatory but `actions(env, player)` optional?
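For concreteness, here is my reading of that convention; the `legal_actions` helper is hypothetical, not part of the interface:

```julia
# valid_action_mask(env) is a Bool vector aligned with actions(env), the
# full static action set, rather than with actions(env, player(env)).
function legal_actions(env)
    A = collect(actions(env))        # static action set, one entry per mask bit
    mask = valid_action_mask(env)    # Bool vector, same length as A
    return A[mask]                   # subset currently legal in this state
end
```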
Yes, currently `actions(env)` is mandatory, but `actions(env, player)` is optional.
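In Julia this split falls out naturally from a fallback method: the interface package can own a default two-argument definition, so only environments whose action set varies per player need to specialize it. A sketch:

```julia
# Default fallback: the two-argument form delegates to the mandatory
# one-argument form, ignoring the player.
actions(env, player) = actions(env)

# Environments with per-player action sets override it, e.g. (hypothetical):
# actions(env::MyGame, player) = player == 1 ? white_moves(env) : black_moves(env)
```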
Yes. Documentation (#24) will be important after we get the rest of the basic questions about the interface answered. Informally, the definitions are as follows: an observation is the information the agent receives at each step, while the state is a complete description of the environment, sufficient to determine its future behavior. We may want a trait that indicates whether the environment is fully observable, in which case the observation is the state.
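One way to express such a trait in Julia is the usual trait-dispatch pattern; all names below are hypothetical, offered only as a sketch:

```julia
abstract type Observability end
struct FullyObservable <: Observability end
struct PartiallyObservable <: Observability end

# Environments declare their observability; assume partial by default.
observability(env) = PartiallyObservable()

# Fully observable environments get observe for free: it is just the state.
# Partially observable environments must define observe(env) themselves.
observe(env) = observe(observability(env), env)
observe(::FullyObservable, env) = state(env)
```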
After discussion in #20, I'm now leaning toward having `act!` and `observe` replace `step!`, since it makes significantly more sense in a 2-player environment. Please weigh in!
If accepted, this will replace `step!` with `act!` and `observe`.
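For intuition, a rough sketch of why the pair reads more naturally than a monolithic `step!` when there are two players; `terminated`, `player`, and `policy` are placeholder names here, not proposed API:

```julia
# Each turn: the current player observes, then acts. With a single
# step!(env, a) call it is ambiguous whose observation and reward the
# return value should carry; splitting act! and observe sidesteps that.
function play!(env, policies)
    while !terminated(env)
        p = player(env)          # whose turn it is
        o = observe(env)         # observation from that player's perspective
        a = policies[p](o)       # that player's policy chooses an action
        act!(env, a)             # advance the environment
    end
end
```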