🚀 Feature
Implementation of the ACERAC algorithm (Actor-Critic with Experience Replay and Autocorrelated Actions). The implementation will include a replay buffer that supports returning n-step trajectories, as required by the ACERAC algorithm.
The paper describing the algorithm is available here (Open Access)
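To make the scope of the buffer concrete, here is a minimal sketch of what it could look like. The class name NStepTrajectoryBuffer and its interface are hypothetical (a real implementation would follow SB3's ReplayBuffer conventions); the point is only that sampling returns contiguous windows of n consecutive transitions rather than single steps.

```python
import numpy as np

class NStepTrajectoryBuffer:
    """Hypothetical sketch of a replay buffer returning n-step trajectories.

    Transitions are stored in a ring buffer; sampling returns contiguous
    windows of ``n_steps`` consecutive transitions so that n-step returns
    can be estimated, as ACERAC requires. A real implementation would also
    handle episode boundaries (``dones`` inside a window) and windows that
    cross the current write position.
    """

    def __init__(self, capacity: int, obs_dim: int, action_dim: int, n_steps: int):
        self.capacity = capacity
        self.n_steps = n_steps
        self.pos = 0
        self.full = False
        self.observations = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=bool)

    def add(self, obs, action, reward, done) -> None:
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample(self, batch_size: int) -> dict:
        # Only draw start indices whose whole n-step window is already stored.
        upper = (self.capacity if self.full else self.pos) - self.n_steps
        assert upper > 0, "not enough transitions for an n-step window"
        starts = np.random.randint(0, upper, size=batch_size)
        idx = starts[:, None] + np.arange(self.n_steps)[None, :]
        return {
            "observations": self.observations[idx],  # (batch, n_steps, obs_dim)
            "actions": self.actions[idx],            # (batch, n_steps, action_dim)
            "rewards": self.rewards[idx],            # (batch, n_steps)
            "dones": self.dones[idx],                # (batch, n_steps)
        }
```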
Motivation
ACERAC is an off-policy Actor-Critic algorithm with hyperparameters that can be tuned for fine time discretization, and it achieves good results on PyBullet robotic environments.
Pitch
I will implement this feature myself, if approved.
Alternatives
The original implementation is available here
However, I believe it would be easier for potential users if this algorithm were part of the SB3 suite with its unified interface.
Additional context
No response
Checklist
I have checked that there is no similar issue in the repo
If I'm requesting a new feature, I have proposed alternatives
ACERAC is an algorithm designed to perform well in fine-time discretization environments.
Fine time discretization environments are environments where a single time step corresponds to a relatively short part of the whole MDP, for example robotic control environments with a high control frequency.
The experiments in the ACERAC paper use PyBullet robotic environments (Ant, HalfCheetah, Hopper, Walker2D) with control frequency increased 3 and 10 times to evaluate this setting.
"Making deep q-learning methods robust to time discretization," by C. Tallec, L. Blier, and Y. Ollivier describes difficulties in using common RL algorithms in such environments. To summarize:
The action-value function degrades to the value function as the control frequency increases, because each individual action becomes shorter and less significant.
Structured exploration, such as action autocorrelation, is required for efficient exploration, as unstructured action noise may be filtered out by the momentum of the underlying system (see the sketch after this list).
Experimental results in the ACERAC paper further suggest that using n-step return estimation is beneficial in such environments.
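As a rough illustration of what autocorrelated exploration noise means in practice, here is a small sketch of first-order (AR(1)) correlated Gaussian noise. The function and its parameters are hypothetical and simplified; ACERAC itself constructs its autocorrelated noise differently (jointly over an n-step window, see the paper), but the smoothing effect that survives the system's momentum is the same idea.

```python
import numpy as np

def autocorrelated_noise(n_steps: int, action_dim: int, alpha: float = 0.95, sigma: float = 0.3) -> np.ndarray:
    """Hypothetical sketch of AR(1) (first-order autocorrelated) Gaussian noise.

    Consecutive samples are correlated with coefficient ``alpha``; the
    ``sqrt(1 - alpha**2)`` factor keeps the marginal standard deviation at
    ``sigma`` for every step. With ``alpha`` close to 1 the noise changes
    slowly, producing a persistent perturbation that is not averaged away by
    the momentum of the controlled system; with ``alpha = 0`` it reduces to
    ordinary white noise.
    """
    noise = np.zeros((n_steps, action_dim), dtype=np.float32)
    xi = sigma * np.random.randn(action_dim)
    noise[0] = xi
    for t in range(1, n_steps):
        xi = alpha * xi + np.sqrt(1.0 - alpha ** 2) * sigma * np.random.randn(action_dim)
        noise[t] = xi
    return noise

# Example: 100 steps of slowly varying exploration noise for a 6-dimensional action space.
perturbation = autocorrelated_noise(n_steps=100, action_dim=6)
```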