
Conversation

@pseudo-rnd-thoughts (Member) commented Jan 6, 2026

Description

For #59508 and #59581, using TicTacToe caused the following error:

```
ray::DQN.train() (pid=88183, ip=127.0.0.1, actor_id=2b775f13e808cc4aaaa23bde01000000, repr=DQN(env=<class 'ray.rllib.examples.envs.classes.multi_agent.tic_tac_toe.TicTacToe'>; env-runners=0; learners=0; multi-agent=True))
  File "ray/python/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "ray/python/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 1242, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 3666, in _run_one_training_iteration
    training_step_return_value = self.training_step()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 646, in training_step
    return self._training_step_new_api_stack()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 668, in _training_step_new_api_stack
    self.local_replay_buffer.add(episodes)
  File "ray/python/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 314, in add
    existing_eps.concat_episode(eps)
  File "ray/python/ray/rllib/env/multi_agent_episode.py", line 862, in concat_episode
    sa_episode.concat_episode(other.agent_episodes[agent_id])
  File "ray/python/ray/rllib/env/single_agent_episode.py", line 618, in concat_episode
    assert self.t == other.t_started
AssertionError
```

In the multi-agent episode's `concat_episode`, we check whether any agent still has an outstanding action whose resulting observation has not yet been received. This produces a "hanging" action: one episode chunk ends with the observation and action, and the next chunk begins with the resulting observation, reward, etc. This [code](https://github.com/ray-project/ray/blob/22cf6ef6af2cddc233bca7ce59668ed8f4bbb17e/rllib/env/multi_agent_episode.py#L848) checked whether this has happened and then added an extra step at the beginning to include the hanging data.

However, in testing, the multi-agent episode `cut` method already carries this hanging data over (using `slice` instead would cause a hidden bug), meaning an extra, unnecessary step's worth of data was being added and the episode beginnings no longer lined up.
Therefore, this PR removes that code and replaces it with a simple check that the hanging action is equivalent to the initial action in the next episode.
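To make that concrete, here is a minimal, hypothetical sketch of what "check that the hanging action is equivalent to the initial action of the next chunk" means; the helper name and arguments are illustrative only, not the actual RLlib implementation:

```python
import numpy as np


# Hypothetical helper, for illustration only: when concatenating two chunks of
# the same agent's episode, the action left hanging at the end of the first
# chunk must equal the initial hanging action carried over by the second chunk
# (which `cut` already stores), so no extra step has to be re-added.
def check_hanging_action_consistency(hanging_action_self, hanging_action_other):
    if not np.array_equal(
        np.asarray(hanging_action_self), np.asarray(hanging_action_other)
    ):
        raise ValueError(
            "Hanging action at the end of the first chunk does not match the "
            "initial hanging action of the second chunk."
        )


# Example: the agent took action 3 right before the episode was cut; the next
# chunk carries that same action as its initial hanging action.
check_hanging_action_consistency(3, 3)
```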

For testing, I found that the `concat_episode` test was using `slice`, which doesn't account for hanging data, whereas `cut`, which is used in the env-runner, does. I made the test more functional: I created a custom environment whose agents take actions at different frequencies and whose observations are each agent's own timestep. This lets us concatenate all episode chunks with the same ID and check that the observations increase 0, 1, 2, ..., ensuring that no data goes missing for users.
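A rough, hypothetical sketch of the idea behind such an environment (simplified, plain Python; not the actual test code or the RLlib `MultiAgentEnv` API):

```python
# Hypothetical, simplified environment sketch: agent i receives a new
# observation only every (i + 1) env steps, which produces hanging actions
# whenever an episode is cut in between. Each observation is the agent's own
# per-agent timestep, so the concatenated observation sequence for every
# agent must be exactly 0, 1, 2, ...
class StaggeredAgentsEnv:
    def __init__(self, num_agents=2, episode_len=20):
        self.num_agents = num_agents
        self.episode_len = episode_len
        self.t = 0
        self.agent_t = {}

    def reset(self):
        self.t = 0
        self.agent_t = {f"agent_{i}": 0 for i in range(self.num_agents)}
        # Every agent starts with observation 0 (its own timestep).
        return dict(self.agent_t)

    def step(self, action_dict):
        self.t += 1
        obs, rewards = {}, {}
        for i in range(self.num_agents):
            agent_id = f"agent_{i}"
            # Agent i only observes (and can act again) every (i + 1) steps.
            if self.t % (i + 1) == 0:
                self.agent_t[agent_id] += 1
                obs[agent_id] = self.agent_t[agent_id]
                rewards[agent_id] = 0.0
        done = self.t >= self.episode_len
        return obs, rewards, done
```

After sampling and cutting episodes from an environment like this, concatenating all chunks with the same episode ID and asserting that each agent's observations form the sequence 0, 1, 2, ... catches any step that gets dropped or duplicated.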

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses an `AssertionError` during multi-agent episode concatenation by removing an unnecessary `add_env_step` call. The new approach of verifying the consistency of the hanging action with the next episode's initial action is sound.

I've identified a couple of issues:

  1. A debug print statement has been left in the code.
  2. The action comparison logic is not robust for numpy arrays or nested structures, which could lead to runtime errors.

I've provided suggestions to fix these. Also, as noted in the PR description, adding tests to cover this fix would be crucial to prevent future regressions.
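On the second point, a plain `==` comparison can misbehave for numpy-array or nested (e.g. Dict/Tuple action space) actions. A hedged sketch of a structure-aware comparison using the `tree` (dm-tree) utility that RLlib commonly relies on; the helper name is illustrative, not the actual fix:

```python
import numpy as np
import tree  # dm-tree


def actions_equal(a, b) -> bool:
    """Compare two possibly nested, possibly numpy-valued action structures."""
    try:
        # Mismatched nesting (or types) means the actions cannot be equal.
        tree.assert_same_structure(a, b)
    except (TypeError, ValueError):
        return False
    # Compare leaf by leaf; np.array_equal handles scalars and arrays alike.
    return all(
        np.array_equal(np.asarray(x), np.asarray(y))
        for x, y in zip(tree.flatten(a), tree.flatten(b))
    )


# Examples:
assert actions_equal({"move": np.array([1, 0])}, {"move": np.array([1, 0])})
assert not actions_equal(np.array([1, 0]), np.array([1, 1]))
```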

Signed-off-by: Mark Towers <[email protected]>
@pseudo-rnd-thoughts marked this pull request as ready for review January 6, 2026 18:20
@pseudo-rnd-thoughts requested a review from a team as a code owner January 6, 2026 18:20
@ray-gardener bot added the rllib (RLlib related issues) label Jan 6, 2026
@pseudo-rnd-thoughts added the rllib-envrunners (Issues around the sampling backend of RLlib) label Jan 7, 2026
@simonsays1980 (Contributor) left a comment

Thanks for fixing this @pseudo-rnd-thoughts! Small things: could you raise an issue for the `slice` method to be fixed in the future? And it needs a small change in the docstring (this did not come from your change, though):

```
In order for this to work, both chunks (`self` and `other`) must fit
together. This is checked by the IDs (must be identical), the time step counters
together that are split through `cut`. For sequential multi-agent environments
using slice might cause problems from hanging observation/actions.
```
Contributor left a comment

This is something we need to fix in the near future. Could you raise another issue on Ray OSS please?

@pseudo-rnd-thoughts (Member, Author) replied

I'm not sure if this is a bug or an inherent limitation of the `slice` method.

@simonsays1980 added the go (add ONLY when ready to merge, run all tests) label Jan 8, 2026
Signed-off-by: Mark Towers <[email protected]>
@simonsays1980 (Contributor) left a comment

LGTM. Thanks @pseudo-rnd-thoughts !

@simonsays1980 enabled auto-merge (squash) January 8, 2026 14:10
@simonsays1980 merged commit 9700991 into ray-project:master Jan 8, 2026
7 checks passed
AYou0207 pushed a commit to AYou0207/ray that referenced this pull request Jan 13, 2026
…y-project#59895)

