[OSPP-Week3] Fix the problems in the NFSP implementation #386
peterchen96 announced in Archive
Thanks to @findmyway for pointing out the problems in the current NFSP implementation. I'll list them below and fix them one by one (#375).

1. Use `reservoir_trajectory` to collect data for `sl_agent`: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
   - `CircularArrayBuffer` may not be suitable for `sl_agent`. I should use `reservoir_trajectory` instead, which randomly replaces an old element with a new one once the buffer is full.
2. Replace `average_learner` with `BehaviorCloningPolicy`: Supplement functions in ReservoirTrajectory and BehaviorCloningPolicy #390
   - `average_learner` looks similar to `BehaviorCloningPolicy`. Also, `sl_agent` only needs to collect states and actions rather than full SARTS tuples, so supplementing a few of `BehaviorCloningPolicy`'s functions may be enough to use it for `sl_agent`.
3. Other problems concerning reusability: Implementation of NFSP and NFSP_KuhnPoker experiment #402 (in progress)
   - Move the state-encoding work (`state_space`) into the specific env file.
   - Check the `run` function for `NFSPAgentManager`, including assertions about the available environments.
   - Modify the experiment file, including designing a suitable hook and correcting format errors.