Replies: 2 comments
-
I'm not sure if I understand your question correctly. But the idea behind it is that, after each …
-
I'm afraid there's a misunderstanding here. The 1:N loop is to collect the performance of each independent rollout, so I don't think anything should be passed to the next iteration.
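For concreteness, the pattern might look roughly like this. It is a minimal sketch, not the notebook's exact code: `create_agent` comes from the discussion below, while `CliffWalkingEnv` and the function name `collect_performance` are stand-ins.

```julia
using ReinforcementLearning, Statistics

# N independent rollouts: each iteration builds a brand-new agent, so
# nothing learned in run m leaks into run m+1; only the scores are kept.
function collect_performance(N, n_episodes)
    rewards = zeros(N, n_episodes)
    for m in 1:N
        env = CliffWalkingEnv()                 # hypothetical constructor
        agent = create_agent(env)               # fresh agent every run
        hook = TotalRewardPerEpisode()
        run(agent, env, StopAfterEpisode(n_episodes), hook)
        rewards[m, :] = hook.rewards            # record this run's curve
    end
    vec(mean(rewards; dims = 1))                # average learning curve
end
```

Averaging over N independent replicas like this smooths out the randomness of any single run; the agents never share state.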
------------------ Original ------------------
From: GrutmanE
Date: Thu, May 20, 2021, 6:02 AM
Subject: Re: [JuliaReinforcementLearning/ReinforcementLearning.jl] Agent training (#299)

Thanks for your reply. Let me clarify some more.
I do not see how the learner gets updated. How does the information from run number m get passed to run number m+1? To rephrase my question: some mutable variable in the scope of the 1:N loop in the repeated_run function must be passed to the learner to make it different from the previous iteration, right?
In the case of tabular Q-learning, there must be a Q matrix with dimensions size(environment) × size(action space) that needs to be maintained and updated.
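Regarding the quoted Q-matrix point: such a table does exist, but it is owned by the agent's learner and mutated in place while a single run proceeds; it is never threaded through the 1:N loop. A minimal hand-rolled illustration (not the package's actual learner types):

```julia
# Minimal tabular Q-learning core: the Q matrix is the learner's own
# mutable state, updated in place after every environment step.
mutable struct TabularQ
    Q::Matrix{Float64}   # n_states × n_actions
    α::Float64           # learning rate
    γ::Float64           # discount factor
end

TabularQ(n_states, n_actions; α = 0.1, γ = 1.0) =
    TabularQ(zeros(n_states, n_actions), α, γ)

# One Q-learning update; nothing is returned to an outer loop,
# the learner simply mutates its own Q table.
function update!(l::TabularQ, s, a, r, s′)
    target = r + l.γ * maximum(l.Q[s′, :])
    l.Q[s, a] += l.α * (target - l.Q[s, a])
end
```

Because each iteration of 1:N constructs a fresh learner with a zeroed Q table, there is nothing to pass from run m to run m+1.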
-
One of the Pluto notebooks in the Zoo (Chapter06_Cliff_Walking.jl) has the following function. It seems repeated_runs uses create_agent (by calling a constructor) to create an identical agent in each iteration of 1:N, but clearly this cannot be so. What is the trick, please? In other words, how is the information passed along?
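The snippet itself did not survive in this thread; a minimal sketch of the shape being described, with `repeated_runs` constructing a fresh agent via `create_agent` inside the 1:N loop (the environment constructor and episode count are assumed names), is:

```julia
# Sketch only: each pass through 1:N constructs a NEW agent, so the
# agents are identical at birth (same zero-initialized Q table) but
# diverge as each one learns during its own run.
function repeated_runs(N, n_episodes)
    hooks = TotalRewardPerEpisode[]
    for _ in 1:N
        env = CliffWalkingEnv()          # hypothetical constructor
        agent = create_agent(env)        # a new agent, not a shared one
        hook = TotalRewardPerEpisode()
        run(agent, env, StopAfterEpisode(n_episodes), hook)
        push!(hooks, hook)               # keep only the performance
    end
    hooks
end
```

(The replies above explain that the runs are deliberately independent: each agent mutates its own learner state in place during its run, so nothing needs to be passed between iterations.)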
P.S.
For completeness, adding the create_agent function:
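The attached definition is also missing here; for flavor, a create_agent along these lines could be written with the package's tabular components. This assumes the v0.8-era API, and names and keyword arguments may differ between versions, so treat it as an illustrative sketch rather than the notebook's code:

```julia
using ReinforcementLearning
using Flux: Descent

# Illustrative sketch: a fresh tabular Q-learning agent whose Q table
# starts at zero, so every call yields an agent with no prior knowledge.
create_agent(env; α = 0.1, ϵ = 0.1) = Agent(
    policy = QBasedPolicy(
        learner = TDLearner(
            approximator = TabularQApproximator(
                n_state = length(state_space(env)),
                n_action = length(action_space(env)),
                opt = Descent(α),            # step size of the TD update
            ),
            method = :SARS,                  # one-step Q-learning target
        ),
        explorer = EpsilonGreedyExplorer(ϵ), # ϵ-greedy action selection
    ),
    trajectory = VectorSARTTrajectory(),
)
```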