After testing the PPO algorithm across 56 Atari environments, I noticed a discrepancy in some of them. In particular, in nine environments the mean rewards attained differed from those attained by the PPO implementations from Stable Baselines3 and CleanRL. The table below lists the nine environments; five trials were conducted for each (implementation, environment) combination, and an environment-wise one-way ANOVA was then conducted to determine the effect of implementation source on mean reward. With respect to Baselines (not the 108K variant), the implementation means are significantly different.
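For reference, a minimal sketch of the per-environment test, assuming each implementation's scores are collected as five final mean rewards (the reward values and variable names below are placeholders, not the reported results):

```python
# Hedged sketch: one-way ANOVA on final mean rewards for a single environment.
# The numbers below are placeholders, not the actual results from the table.
from scipy.stats import f_oneway

rewards = {
    "Baselines":         [410.0, 395.2, 430.1, 402.7, 388.9],
    "Stable Baselines3": [512.3, 498.7, 530.0, 505.1, 521.4],
    "CleanRL":           [508.9, 515.2, 495.6, 524.8, 511.0],
}

# H0: all implementations reach the same mean reward in this environment.
f_stat, p_value = f_oneway(*rewards.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Implementation source has a significant effect on mean reward.")
```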
In the figure below, the training curves are aggregated over five trials, with the shaded regions indicating the minimum, maximum, and mean. The y-axis is the mean reward and the x-axis is the number of frames (40 million frames in total). The curves for Baselines, Stable Baselines3, and CleanRL are in purple, orange, and red respectively (the blue and green curves can be ignored). Baselines' curves are clearly different from the CleanRL and Stable Baselines3 curves, consistent with the table above.
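For completeness, the aggregation used in the figure can be reproduced roughly as follows, assuming the five curves are already aligned on a common frame axis (the data below is synthetic and only illustrates the plotting pattern):

```python
# Hedged sketch: mean line plus min/max shaded band over five aligned trials.
import numpy as np
import matplotlib.pyplot as plt

frames = np.linspace(0, 40_000_000, 200)          # common x-axis (frames)
curves = np.random.randn(5, 200).cumsum(axis=1)   # placeholder for 5 trials

plt.plot(frames, curves.mean(axis=0), color="purple", label="Baselines")
plt.fill_between(frames, curves.min(axis=0), curves.max(axis=0),
                 color="purple", alpha=0.2)
plt.xlabel("Frames")
plt.ylabel("Mean reward")
plt.legend()
plt.show()
```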
After manually debugging the code, I managed to locate the inconsistency. The environment was not conforming to the ALE specification of 108K frames per episode for the v4 variant, which is the default variant used by this repository and most DRL libraries (e.g., CleanRL and Stable Baselines3). After setting max_episode_steps in the make_atari function to 27K steps (108K frames), the implementations became consistent in three of the nine environments, as seen in the table above and the figure below.
I will create a pull request that sets the default number of frames per episode to 108K (27K steps), with minimal changes to the original codebase so that other components are not affected. However, I believe there may still be other inconsistencies, since six environments still differ significantly between the implementations. Any suggestions on the possible causes would be much appreciated. In case the pull request is not accepted, I have also included the fix below for those wanting to train Atari environments :)
```python
# one line change in baselines/baselines/common/atari_wrappers.py
def make_atari(env_id, max_episode_steps=27000):
```
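With that patch applied, a quick sanity check of the new episode cap might look like the following (the environment name is chosen arbitrarily, and this assumes the old 4-tuple gym step API that Baselines targets):

```python
# Hedged sketch: verify that the patched make_atari truncates episodes at
# 27,000 agent steps (108,000 raw frames with the default frame skip of 4).
from baselines.common.atari_wrappers import make_atari

env = make_atari("BreakoutNoFrameskip-v4")  # picks up the new 27,000-step default
env.reset()
steps = 0
done = False
while not done:
    _, _, done, _ = env.step(env.action_space.sample())
    steps += 1

assert steps <= 27_000, "episode ran past the 108K-frame ALE limit"
print(f"episode ended after {steps} agent steps (~{steps * 4} frames)")
```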
Run Command To Replicate: