-
Nice points.
Sometimes the trajectory is too large to load into memory all at once, so we might not be able to sample from it in memory. You may want to consider reservoir sampling here.
This looks trivial since we can simply add a custom hook. Do you have a concrete plan for what you'd like to implement next week?
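To make the reservoir sampling suggestion concrete, here is a minimal sketch in Python (not the package's own language or API). It uses the classic Algorithm R to draw `k` items uniformly from a stream that is read once and never held in memory as a whole. The function name `reservoir_sample` and its parameters are hypothetical, chosen only for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Draw up to k items uniformly at random from a stream of unknown length.

    Hypothetical helper: only the k-item reservoir is kept in memory,
    so the stream itself can be larger than RAM (e.g. read from disk).
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by picking
            # a slot in [0, i]; it only replaces something if the slot < k.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(iter_transitions(path), 256)` would return 256 transitions, where `iter_transitions` stands for whatever (assumed) generator yields transitions lazily from storage.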
-
I have some ideas:
-
@findmyway @pilgrimygy
These are the features that I am planning for the pipeline.
A lot of the inspiration for these features comes from d3rlpy.
Features such as a utility function for the CQL loss aren't included in this list because they will be developed as more algorithms are added.
The main _run function is pretty simple since the algorithm doesn't need to interact with the environment. The approach I have in mind is to draw a batch from the dataset with a batch sampler on every iteration and train on that batch. Another vital component is an evaluator integrated into the _run function. The other functionality is more about making the package approachable and adding more features.
These are just the features that I thought would be important. Feedback on the proposed ideas, or suggestions for other features that could be added, is welcome.
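The loop described above can be sketched roughly as follows. This is an illustrative Python outline, not the package's actual `_run` implementation: `sample_batch`, `run_offline`, `MeanLearner`, and every parameter name are hypothetical, and the "learner" is a toy that only tracks a running mean so the example stays self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_batch(
    dataset: Sequence[float], batch_size: int, rng: random.Random
) -> List[float]:
    """Hypothetical batch sampler: uniform sampling with replacement."""
    return [dataset[rng.randrange(len(dataset))] for _ in range(batch_size)]


def run_offline(
    dataset: Sequence[float],
    update: Callable[[List[float]], None],
    evaluate: Callable[[], float],
    n_steps: int,
    batch_size: int,
    eval_every: int,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Offline loop: no environment interaction, only the fixed dataset."""
    rng = random.Random(seed)
    history: List[Tuple[int, float]] = []
    for step in range(1, n_steps + 1):
        batch = sample_batch(dataset, batch_size, rng)  # sample every loop
        update(batch)                                   # train on the sample
        if step % eval_every == 0:                      # integrated evaluator
            history.append((step, evaluate()))
    return history


class MeanLearner:
    """Toy stand-in for a policy: moves an estimate toward the batch mean."""

    def __init__(self, lr: float = 0.1) -> None:
        self.estimate = 0.0
        self.lr = lr

    def update(self, batch: List[float]) -> None:
        target = sum(batch) / len(batch)
        self.estimate += self.lr * (target - self.estimate)
```

Passing the update step and the evaluator in as callables keeps the loop itself independent of any particular algorithm, which is one way the same loop could be reused as more algorithms are added.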
Credits: https://github.com/takuseno/d3rlpy