[Feature Request] ACERAC #267

Open
2 tasks done
lychanl opened this issue Dec 5, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

lychanl commented Dec 5, 2024

🚀 Feature

Implementation of the ACERAC algorithm (Actor-Critic with Experience Replay and Autocorrelated Action). This will include an implementation of a replay buffer that supports returning n-step trajectories, as required by the ACERAC algorithm.

The paper describing the algorithm is available here (Open Access).
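
As a rough illustration of what I have in mind for the buffer (a minimal sketch, not SB3 API — the class and method names below are hypothetical, and episode boundaries / vectorized environments are ignored), sampling would return contiguous n-step windows instead of single transitions:

```python
import numpy as np


class NStepTrajectoryBuffer:
    """Hypothetical sketch of a replay buffer returning contiguous n-step slices."""

    def __init__(self, capacity: int, obs_dim: int, action_dim: int):
        self.capacity = capacity
        self.pos = 0
        self.full = False
        self.observations = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)

    def add(self, obs, action, reward, done) -> None:
        # Store a single transition at the current write position (ring buffer).
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample_trajectories(self, batch_size: int, n_steps: int) -> dict:
        # Draw random start indices, then build (batch_size, n_steps) index windows
        # so each sample is a contiguous n-step trajectory slice.
        upper = (self.capacity if self.full else self.pos) - n_steps
        assert upper > 0, "not enough transitions stored"
        starts = np.random.randint(0, upper, size=batch_size)
        idx = starts[:, None] + np.arange(n_steps)[None, :]
        return {
            "observations": self.observations[idx],  # (batch, n_steps, obs_dim)
            "actions": self.actions[idx],            # (batch, n_steps, action_dim)
            "rewards": self.rewards[idx],            # (batch, n_steps)
            "dones": self.dones[idx],                # (batch, n_steps)
        }
```

In the actual implementation I would of course build on SB3's ReplayBuffer and handle episode ends, timeouts and vectorized environments properly.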

Motivation

ACERAC is an off-policy Actor-Critic algorithm whose hyperparameters adjust well to fine time discretization, and it achieves good results on PyBullet robotic environments.

Pitch

I will implement this feature myself, if approved.

Alternatives

The original implementation is available here.
However, I believe it would be easier for potential users if this algorithm were part of the SB3 suite, with its unified interface.

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@lychanl lychanl added the enhancement New feature or request label Dec 5, 2024
araffin (Member) commented Dec 11, 2024

hello,

fine-time discretization

Could you give a quick example/short explanation of what exact problem it solves that is not solved by other methods?

lychanl (Author) commented Dec 13, 2024

ACERAC is an algorithm designed to perform well in fine-time discretization environments.

Fine-time discretization environments are those where a single time step corresponds to a relatively short part of the whole MDP. Such environments include robotic control environments with a high control frequency.

The research presented in the ACERAC paper uses PyBullet robotic environments (Ant, HalfCheetah, Hopper, Walker2D) with the control frequency increased 3 and 10 times for experiments in this setting.

"Making deep q-learning methods robust to time discretization," by C. Tallec, L. Blier, and Y. Ollivier describes difficulties in using common RL algorithms in such environments. To summarize:

  • The action-value function degrades to the value function as the control frequency increases, since each action becomes shorter and less significant.
  • Structured exploration, such as action autocorrelation, is required for efficient exploration, as unstructured action noise may get filtered out by the momentum of the underlying system (see the sketch after this list).
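
To illustrate the second point, here is a small, hedged sketch comparing unstructured (white) noise with a simple first-order autocorrelated noise process. The AR(1) formulation is only an illustration of the general idea, not the exact noise model used in the ACERAC paper:

```python
import numpy as np


def white_noise(n_steps: int, sigma: float = 0.3) -> np.ndarray:
    # Independent Gaussian perturbation at every step: at high control
    # frequencies the system's inertia largely filters this out.
    return np.random.normal(0.0, sigma, size=n_steps)


def autocorrelated_noise(n_steps: int, alpha: float = 0.95, sigma: float = 0.3) -> np.ndarray:
    # Simple AR(1) process: consecutive perturbations are correlated, so the
    # exploratory "push" persists across many fine time steps.
    noise = np.zeros(n_steps)
    for t in range(1, n_steps):
        noise[t] = alpha * noise[t - 1] + np.sqrt(1.0 - alpha**2) * np.random.normal(0.0, sigma)
    return noise
```

With alpha near 0 this degenerates to white noise; with alpha close to 1 the perturbation changes slowly enough to actually push the system away from the policy's trajectory.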

Experimental results in the ACERAC paper further suggest that using n-step return estimation is beneficial in such environments.
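
For context, this is roughly what an n-step return target looks like in its generic, textbook form (ACERAC's actual estimator over replayed n-step trajectories differs, so this is only a simplified sketch):

```python
import numpy as np


def n_step_return(rewards: np.ndarray, bootstrap_value: float, gamma: float = 0.99) -> float:
    # rewards holds r_t, ..., r_{t+n-1}; bootstrap_value estimates V(s_{t+n}).
    # Generic n-step target: discounted reward sum plus discounted bootstrap.
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    return float(np.sum(discounts * rewards) + gamma**n * bootstrap_value)
```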
