[OSPP-Week3] fix the problems in the `NFSP` implementation #386

peterchen96 (Collaborator) · 2021-07-19T07:54:43Z

Thanks to @findmyway for pointing out the problems in the current NFSP implementation (#375). I'll list them below and fix them one by one.

- Use `reservoir_trajectory` to collect data for `sl_agent`: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
  `CircularArrayBuffer` may not be suitable for `sl_agent`. I should use the reservoir trajectory instead, which randomly replaces an old element with the new one once the buffer is full.
- Replace `average_learner` with `BehaviorCloningPolicy`: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
  `average_learner` looks similar to `BehaviorCloningPolicy`. Also, `sl_agent` only needs to collect states and actions rather than full SARTS transitions, so supplementing a few of `BehaviorCloningPolicy`'s functions may be enough to use it for `sl_agent`.
- Other problems concerning ease of reuse: Implementation of NFSP and NFSP_KuhnPoker experiment #402 (in progress)
  - Move the state-encoding work into the specific env file.
    - Encode each state by its index in the `state_space`.
  - Check the `run` function for `NFSPAgentManager`, including assertions about which environments are supported.
    - For now, check whether the game is multi-agent and has imperfect information.
  - Modify the experiment file, including designing a suitable hook and fixing format errors.
    - Redesign the hook (`ResultNEpisode`) to record the results.
    - However, some errors remain in the experiment file.
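For reference, the replacement rule behind a reservoir buffer fits in a few lines. This is a Python sketch of the textbook reservoir-sampling algorithm ("Algorithm R"), not the package's `ReservoirTrajectory`; the class and method names are hypothetical. Once the buffer is full, the n-th item is kept with probability capacity/n and overwrites a uniformly chosen slot, so every item seen so far is equally likely to be in the buffer.

```python
import random


class ReservoirBuffer:
    """Hypothetical fixed-capacity buffer that keeps a uniform random sample
    of every item ever pushed (reservoir sampling, "Algorithm R")."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0  # total number of items pushed so far
        self.rng = rng if rng is not None else random.Random()

    def push(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the new item.
            self.items.append(item)
        else:
            # Buffer full: draw j uniformly from [0, n_seen). The new item is kept
            # with probability capacity / n_seen and, if kept, replaces slot j.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, batch_size):
        # Draw a training batch (without replacement) from the stored items.
        return self.rng.sample(self.items, min(batch_size, len(self.items)))
```

Unlike a circular buffer, which only ever holds the most recent items, this keeps a sample spread over the whole history, which is what the average-policy learner in NFSP relies on.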
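The point that `sl_agent` only needs (state, action) pairs can be illustrated with a tabular stand-in for the supervised learner. In NFSP the average policy is normally a neural network trained with a classification loss; here, as an assumption for illustration, the network is replaced by per-state action counts, which is the maximum-likelihood fit in the tabular case. `TabularBehaviorCloning` and its methods are hypothetical names, not `BehaviorCloningPolicy`'s API.

```python
from collections import Counter, defaultdict


class TabularBehaviorCloning:
    """Hypothetical tabular stand-in for an average-policy (behavior cloning)
    learner: it consumes (state, action) pairs, not full SARTS transitions."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = defaultdict(Counter)  # state -> Counter of actions taken there

    def update(self, batch):
        # batch: iterable of (state, action) pairs, e.g. sampled from a reservoir buffer.
        for state, action in batch:
            self.counts[state][action] += 1

    def prob(self, state):
        # Empirical (maximum-likelihood) action distribution for this state.
        # .get() avoids inserting unseen states; those fall back to uniform.
        c = self.counts.get(state)
        total = sum(c.values()) if c else 0
        if total == 0:
            return [1.0 / self.n_actions] * self.n_actions
        return [c[a] / total for a in range(self.n_actions)]
```

Swapping the counts for a classifier trained on the same pairs gives the usual NFSP setup; the data interface stays the same.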
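Encoding a state by its index in an ordered, finite `state_space` can be sketched as follows, assuming states are hashable. The helper names and the example state space are hypothetical. Building a dictionary once avoids a linear search on every lookup, and the index can be turned into a one-hot vector if the network needs one.

```python
def make_index_encoder(state_space):
    """Return a function mapping each state to its position in state_space."""
    index = {s: i for i, s in enumerate(state_space)}  # built once, O(1) lookups
    def encode(state):
        return index[state]  # raises KeyError for states outside the space
    return encode


def one_hot(i, n):
    # Optional: turn an index into a one-hot vector for a neural-network input.
    v = [0.0] * n
    v[i] = 1.0
    return v


# Usage with a hypothetical list of information states:
encode = make_index_encoder(["J", "Q", "K", "Jb", "Qb"])
x = one_hot(encode("K"), 5)
```

Keeping this mapping in the environment file, as proposed above, lets the agent code stay independent of how a particular game represents its states.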
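The environment check in `run` could look roughly like this Python sketch. `EnvTraits` and `check_nfsp_env` are hypothetical stand-ins for whatever trait queries the package actually provides; raising an exception instead of using Python's `assert` (which can be disabled) keeps the check active in every run.

```python
from dataclasses import dataclass


@dataclass
class EnvTraits:
    # Hypothetical summary of the environment traits the check needs.
    num_players: int
    imperfect_information: bool


def check_nfsp_env(traits):
    """Reject environments the NFSP manager is not (yet) meant to handle."""
    if traits.num_players < 2:
        raise ValueError("NFSP expects a multi-agent environment")
    if not traits.imperfect_information:
        raise ValueError("NFSP currently expects an imperfect-information game")
```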
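A hook that records a result every N episodes, in the spirit of `ResultNEpisode`, might be sketched like this. The class name, the `on_episode_end` callback and the `evaluate` function are assumptions, not the package's hook interface; what gets evaluated (for NFSP, often exploitability) is left to the caller.

```python
class ResultEveryNEpisodes:
    """Hypothetical hook: every `n` episodes, call `evaluate()` and store
    the episode number together with the returned result."""

    def __init__(self, n, evaluate):
        self.n = n
        self.evaluate = evaluate  # zero-argument callable returning a metric
        self.episode = 0
        self.episodes = []  # episode numbers at which a result was recorded
        self.results = []   # corresponding evaluation results

    def on_episode_end(self):
        self.episode += 1
        if self.episode % self.n == 0:
            self.episodes.append(self.episode)
            self.results.append(self.evaluate())
```

Storing the episode numbers next to the results makes it straightforward to plot the learning curve after the experiment finishes.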
