[Feature Request] ACERAC #267

Open
2 tasks done
lychanl opened this issue Dec 5, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

lychanl commented Dec 5, 2024

🚀 Feature

Implementation of the ACERAC algorithm (Actor-Critic with Experience Replay and Autocorrelated Action). This will include an implementation of a replay buffer that supports returning n-step trajectories, as required by the ACERAC algorithm.

The paper describing the algorithm is available here (Open Access).
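
As a rough illustration of what I have in mind for the buffer (a minimal sketch, not SB3 API — the class and method names below are hypothetical, and episode boundaries / vectorized environments are ignored), sampling would return contiguous n-step windows instead of single transitions:

```python
import numpy as np


class NStepTrajectoryBuffer:
    """Hypothetical sketch of a replay buffer returning contiguous n-step slices."""

    def __init__(self, capacity: int, obs_dim: int, action_dim: int):
        self.capacity = capacity
        self.pos = 0
        self.full = False
        self.observations = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, action_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)

    def add(self, obs, action, reward, done) -> None:
        # Store a single transition at the current write position (ring buffer).
        self.observations[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample_trajectories(self, batch_size: int, n_steps: int) -> dict:
        # Draw random start indices, then build (batch_size, n_steps) index windows
        # so each sample is a contiguous n-step trajectory slice.
        upper = (self.capacity if self.full else self.pos) - n_steps
        assert upper > 0, "not enough transitions stored"
        starts = np.random.randint(0, upper, size=batch_size)
        idx = starts[:, None] + np.arange(n_steps)[None, :]
        return {
            "observations": self.observations[idx],  # (batch, n_steps, obs_dim)
            "actions": self.actions[idx],            # (batch, n_steps, action_dim)
            "rewards": self.rewards[idx],            # (batch, n_steps)
            "dones": self.dones[idx],                # (batch, n_steps)
        }
```

In the actual implementation I would of course build on SB3's ReplayBuffer and handle episode ends, timeouts and vectorized environments properly.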

Motivation

ACERAC is an off-policy Actor-Critic algorithm whose hyperparameters adjust well to fine time discretization, and it achieves good results on PyBullet robotic environments.

Pitch

I will implement this feature myself, if approved.

Alternatives

The original implementation is available here.
However, I believe it would be easier for potential users if this algorithm were part of the SB3 suite, with its unified interface.

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@lychanl lychanl added the enhancement New feature or request label Dec 5, 2024
araffin (Member) commented Dec 11, 2024

hello,

fine-time discretization

Could you give a quick example/short explanation of what exact problem it solves that is not solved by other methods?

lychanl (Author) commented Dec 13, 2024

ACERAC is an algorithm designed to perform well in fine-time discretization environments.

Fine-time discretization environments are those where a single time step corresponds to a relatively short part of the whole MDP. Such environments include robotic control environments with a high control frequency.

The research presented in the ACERAC paper uses PyBullet robotic environments (Ant, HalfCheetah, Hopper, Walker2D) with the control frequency increased 3 and 10 times for experiments in this setting.

"Making deep q-learning methods robust to time discretization," by C. Tallec, L. Blier, and Y. Ollivier describes difficulties in using common RL algorithms in such environments. To summarize:

  • The action-value function degrades to the value function as the control frequency increases, since each action becomes shorter and less significant.
  • Structured exploration, such as action autocorrelation, is required for efficient exploration, as unstructured action noise may get filtered out by the momentum of the underlying system (see the sketch after this list).
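
To illustrate the second point, here is a small, hedged sketch comparing unstructured (white) noise with a simple first-order autocorrelated noise process. The AR(1) formulation is only an illustration of the general idea, not the exact noise model used in the ACERAC paper:

```python
import numpy as np


def white_noise(n_steps: int, sigma: float = 0.3) -> np.ndarray:
    # Independent Gaussian perturbation at every step: at high control
    # frequencies the system's inertia largely filters this out.
    return np.random.normal(0.0, sigma, size=n_steps)


def autocorrelated_noise(n_steps: int, alpha: float = 0.95, sigma: float = 0.3) -> np.ndarray:
    # Simple AR(1) process: consecutive perturbations are correlated, so the
    # exploratory "push" persists across many fine time steps.
    noise = np.zeros(n_steps)
    for t in range(1, n_steps):
        noise[t] = alpha * noise[t - 1] + np.sqrt(1.0 - alpha**2) * np.random.normal(0.0, sigma)
    return noise
```

With alpha near 0 this degenerates to white noise; with alpha close to 1 the perturbation changes slowly enough to actually push the system away from the policy's trajectory.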

Experimental results in the ACERAC paper further suggest that using n-step return estimation is beneficial in such environments.
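
For context, this is roughly what an n-step return target looks like in its generic, textbook form (ACERAC's actual estimator over replayed n-step trajectories differs, so this is only a simplified sketch):

```python
import numpy as np


def n_step_return(rewards: np.ndarray, bootstrap_value: float, gamma: float = 0.99) -> float:
    # rewards holds r_t, ..., r_{t+n-1}; bootstrap_value estimates V(s_{t+n}).
    # Generic n-step target: discounted reward sum plus discounted bootstrap.
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    return float(np.sum(discounts * rewards) + gamma**n * bootstrap_value)
```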
