Fix issues with SARTSATraces #72
Bug 1 - When more traces are pushed into CircularArraySARTSATraces than its capacity, the state and action traces are no longer aligned.
Bug 2 - sampleable_inds were not correct for CircularArraySARTSATraces.
Bug 3 - CircularArraySARTSATraces were not sampleable by an EpisodesSampler.
@dharux Thank you for the PR. I've updated the main branch to eliminate unrelated test failures. Would you be able to resolve the remaining test errors (in |
I could take a shot at it. It might require completely rewriting how tuples are pushed into a |
The usage of SARTSA traces is more restrictive and should be done in this way
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
- Coverage   74.65%   74.23%   -0.43%
==========================================
  Files          15       18       +3
  Lines         801      850      +49
==========================================
+ Hits          598      631      +33
- Misses        203      219      +16
☔ View full report in Codecov by Sentry.
Needed some final fixes. Everything should be working now!
Thanks for taking the time. Sampleable indices are not very intuitive, I hope this didn't give you headaches.
Merged commit de01fb3 into JuliaReinforcementLearning:main
This fixes #70 and fixes #71.
Issue #70 : SARTSA Traces cannot be sampled correctly.
The sampleable_inds field of an episodes buffer containing a SARTSATraces (CircularArraySARTSATraces or ElasticArraySARTSATraces) did not correctly keep track of which indices are sampleable. In a SARTSATraces, information from three steps is required to complete a single trace: the initial step of the episode, in which only the state is pushed; the next step, in which the action and next_state are pushed; and the following step, in which the next_action is pushed. At that point there is one sampleable index but two unsampleable ones. Thus, the last two indices in the trace are typically unsampleable during the episode. The fix was made in the function Base.push!(eb::EpisodesBuffer, xs::NamedTuple) in episodes.jl.
Issue #71 : Action and state go out of sync in CircularArraySARTSATraces
When more traces are pushed into the Traces than its capacity, the state no longer matches the appropriate action. To fix this, the capacity of the state trace should be one more than that of the action trace. The capacity of every trace is incremented by 1 so that the Traces can hold as many full traces as its capacity.
The tests were also modified to check that it works correctly with the following usage:
... (PreEpisodeStage in RL.jl)
... (PostActStage)
... (PostEpisodeStage)
This is the behaviour of the agent from RLCore.jl, and all trajectories should work well with it. The CircularArraySARTSATraces are somewhat restrictive with regard to usage and cannot be used very differently from the typical usage above. I have not been able to make them work for all general usage while still representing next_action.
The CircularArraySARTSATraces do not currently work with CircularPrioritizedTraces, as these add additional keys like keys and priorities, which are updated outside of the Traces. Some changes need to be made to CircularPrioritizedTraces to make it work with CircularArraySARTSATraces as well. Thus, there are 2 errors during testing involving CircularPrioritizedTraces.
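The three-step requirement described under Issue #70 can be sketched without the package. This is a minimal model in plain Julia, not the code in episodes.jl: sampleable_flags is a hypothetical helper, and it assumes that index i holds a complete trace only once state i+1 and action i+1 have both been pushed.

```julia
# Hypothetical model, not the package's implementation.
# Index i is treated as sampleable only when the state and the action at
# index i + 1 (next_state and next_action) have already been pushed.
function sampleable_flags(n_states::Int, n_actions::Int)
    return [i + 1 <= n_states && i + 1 <= n_actions for i in 1:n_states]
end

after_initial_state = sampleable_flags(1, 0)  # only the state was pushed
after_first_step    = sampleable_flags(2, 1)  # action and next_state pushed
after_second_step   = sampleable_flags(3, 2)  # next_action for index 1 pushed

println(after_initial_state)  # Bool[0]
println(after_first_step)     # Bool[0, 0]
println(after_second_step)    # Bool[1, 0, 0]
```

During an episode the number of pushed states stays one ahead of the number of pushed actions, so under this model the last two flags remain false, which matches the "one sampleable index, two unsampleable ones" situation described above.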
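The misalignment described under Issue #71 can be reproduced with two bounded buffers in plain Julia. This is only a sketch under stated assumptions: push_bounded!, fill_traces and is_aligned are hypothetical helpers, plain vectors stand in for the circular buffers, and time steps are encoded as integers so that state t and action t are aligned exactly when they sit at the same index. It illustrates only the "state capacity one more than action capacity" point, not the full change made in the PR.

```julia
# Hypothetical helper: append x and drop the oldest element once the buffer
# exceeds `capacity`, mimicking how a circular buffer overwrites old entries.
function push_bounded!(buf::Vector{Int}, x::Int, capacity::Int)
    push!(buf, x)
    length(buf) > capacity && popfirst!(buf)
    return buf
end

# One long episode: the initial state is pushed alone, then every step pushes
# the action taken in the current state and the state it leads to.
function fill_traces(state_capacity::Int, action_capacity::Int, steps::Int)
    states, actions = Int[], Int[]
    push_bounded!(states, 0, state_capacity)          # s_0
    for t in 0:steps-1
        push_bounded!(actions, t, action_capacity)    # a_t, taken in s_t
        push_bounded!(states, t + 1, state_capacity)  # s_{t+1}
    end
    return states, actions
end

# Aligned means that the state stored at index i is the one action i was taken in.
is_aligned(states, actions) = all(states[i] == actions[i] for i in eachindex(actions))

equal_states, equal_actions = fill_traces(3, 3, 10)  # same capacity for both
extra_states, extra_actions = fill_traces(4, 3, 10)  # one extra slot for states

println(is_aligned(equal_states, equal_actions))  # false: [8, 9, 10] vs [7, 8, 9]
println(is_aligned(extra_states, extra_actions))  # true: [7, 8, 9, 10] vs [7, 8, 9]
```

With equal capacities the buffers stay aligned until the capacity is exceeded (for example `fill_traces(3, 3, 2)`), which is why the problem only shows up once more traces are pushed than the Traces can hold.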