Fix Multi-Agent Episode concatenation for sequential environments #59895
Conversation
Signed-off-by: Mark Towers <[email protected]>
Code Review
This pull request addresses an `AssertionError` during multi-agent episode concatenation by removing an unnecessary `add_env_step` call. The new approach of verifying the consistency of the hanging action with the next episode's initial action is sound.
I've identified a couple of issues:
- A debug `print` statement has been left in the code.
- The action comparison logic is not robust for numpy arrays or nested structures, which could lead to runtime errors.
I've provided suggestions to fix these. Also, as noted in the PR description, adding tests to cover this fix would be crucial to prevent future regressions.
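For reference, a robust comparison along the lines the reviewer suggests could look like the sketch below. `actions_equal` is an illustrative helper name, not the code that was actually merged; it assumes `dm_tree` (an existing RLlib dependency) for flattening nested action structures.

```python
import numpy as np
import tree  # dm_tree, an existing RLlib dependency

def actions_equal(a, b) -> bool:
    """Compare two possibly nested action structures leaf by leaf.

    A plain `a == b` is not robust here: for numpy arrays it returns an
    array (ambiguous in a boolean context), and for nested dicts/tuples
    with array leaves it can raise instead of returning a bool.
    """
    flat_a, flat_b = tree.flatten(a), tree.flatten(b)
    if len(flat_a) != len(flat_b):
        return False
    return all(np.array_equal(x, y) for x, y in zip(flat_a, flat_b))
```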
Signed-off-by: Mark Towers <[email protected]>
simonsays1980
left a comment
Thanks for fixing this @pseudo-rnd-thoughts! Small things: could you raise an issue for the `slice` method to be fixed in the future? And it needs a small change in the docstring (which did not come from your change, though).
> In order for this to work, both chunks (`self` and `other`) must fit
> together. This is checked by the IDs (must be identical), the time step counters
> together that are split through `cut`. For sequential multi-agent environments
> using `slice` might cause problems from hanging observation/actions.
This is something we need to fix in the near future. Could you raise another issue on Ray OSS please?
I'm not sure if this is a bug or an inherent limitation of the `slice` method.
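To make the hanging-action problem concrete, here is a purely illustrative toy view using plain dicts (not the actual `MultiAgentEpisode` internals):

```python
# Illustrative only: why a "hanging" action matters when re-joining chunks
# of a sequential (turn-based) multi-agent episode. Agent "p0" acts at
# env step 0 but only sees the resulting observation at env step 2, after
# "p1" has taken its turn at step 1.
chunk_1 = {"p0": {"observations": [0], "actions": [4]}}    # action left hanging
chunk_2 = {"p0": {"observations": [2], "rewards": [1.0]}}  # result of that action
# `cut()` carries the hanging action over into the new chunk so that
# `concat_episode` can line the two chunks up again; a plain slice drops
# that bookkeeping, which is why slicing sequential episodes is fragile.
```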
Signed-off-by: Mark Towers <[email protected]>
simonsays1980
left a comment
LGTM. Thanks @pseudo-rnd-thoughts !
Description
As reported in #59508 and #59581, using TicTacToe would cause the following error:
```
ray::DQN.train() (pid=88183, ip=127.0.0.1, actor_id=2b775f13e808cc4aaaa23bde01000000, repr=DQN(env=<class 'ray.rllib.examples.envs.classes.multi_agent.tic_tac_toe.TicTacToe'>; env-runners=0; learners=0; multi-agent=True))
  File "ray/python/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "ray/python/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 1242, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 3666, in _run_one_training_iteration
    training_step_return_value = self.training_step()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 646, in training_step
    return self._training_step_new_api_stack()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 668, in _training_step_new_api_stack
    self.local_replay_buffer.add(episodes)
  File "ray/python/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 314, in add
    existing_eps.concat_episode(eps)
  File "ray/python/ray/rllib/env/multi_agent_episode.py", line 862, in concat_episode
    sa_episode.concat_episode(other.agent_episodes[agent_id])
  File "ray/python/ray/rllib/env/single_agent_episode.py", line 618, in concat_episode
    assert self.t == other.t_started
AssertionError
```
In the multi-agent episode's `concat_episode`, we check whether any agent has taken an action but not yet received the resulting observation. This produces a hanging action: one episode chunk ends with the observation and action, and the next chunk starts with the resulting observation, reward, etc. This [code](https://github.com/ray-project/ray/blob/22cf6ef6af2cddc233bca7ce59668ed8f4bbb17e/rllib/env/multi_agent_episode.py#L848) checked whether this had happened and then added an extra step at the beginning to include the hanging data.
However, in testing, the multi-agent episode's `cut` method already handles this (using `slice` instead will cause a hidden bug), meaning an extra, unnecessary step's worth of data was being added, so the episode beginnings did not line up. Therefore, this PR removes that code and replaces it with a simple check that the hanging action is equivalent to the initial action in the next episode, as sketched below.
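A minimal sketch of the replacement check, reusing the `actions_equal` helper sketched earlier; all other names here (`_check_and_concat`, `hanging_actions`) are illustrative stand-ins for the real `MultiAgentEpisode` internals, not the merged code:

```python
def _check_and_concat(self_episode, other_episode, hanging_actions):
    # Sketch: instead of replaying the hanging step via add_env_step(),
    # verify that the action left hanging at the end of `self_episode`
    # matches the first action already recorded in `other_episode`, then
    # concatenate the per-agent episodes directly.
    for agent_id, other_sa in other_episode.agent_episodes.items():
        hanging = hanging_actions.get(agent_id)
        if hanging is not None:
            # Robust leaf-wise comparison (see the earlier sketch).
            assert actions_equal(hanging, other_sa.get_actions(0))
        self_episode.agent_episodes[agent_id].concat_episode(other_sa)
```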
For testing, I found that the `concat_episode` test was using `slice`, which doesn't account for hanging data, while `cut`, which the env-runner uses, does. I modified the test to be more functional: I created a custom environment in which agents take actions at different frequencies and receive their own timestep as the observation. This lets us concatenate all episodes with the same ID and check that the observations increase 0, 1, 2, ..., ensuring that no data goes missing for users. A sketch of such an environment follows.
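A sketch of the kind of test environment described above, assuming RLlib's new-API-stack `MultiAgentEnv`; the class name, agent IDs, and episode length are illustrative, not the actual test code:

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class DifferentFrequencyEnv(MultiAgentEnv):
    """Toy env: one agent acts every step, the other every second step.

    Each agent's observation is its own step count, so after concatenating
    all episode chunks with the same ID, the recorded observations for each
    agent must read 0, 1, 2, ... with no gaps.
    """

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["fast", "slow"]
        self.observation_spaces = {a: gym.spaces.Discrete(1000) for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}

    def reset(self, *, seed=None, options=None):
        self.env_t = 0
        self.counts = {a: 0 for a in self.agents}
        # Both agents observe at reset.
        return {a: self.counts[a] for a in self.agents}, {}

    def step(self, action_dict):
        self.env_t += 1
        # "fast" observes every step, "slow" only on even env steps, so
        # "slow" regularly has a hanging action across chunk boundaries.
        acting = ["fast"] + (["slow"] if self.env_t % 2 == 0 else [])
        obs = {}
        for a in acting:
            self.counts[a] += 1
            obs[a] = self.counts[a]
        terminateds = {"__all__": self.env_t >= 20}
        return obs, {a: 0.0 for a in obs}, terminateds, {"__all__": False}, {}
```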