
Conversation

@pseudo-rnd-thoughts (Member) commented Jan 6, 2026

Description

For #59508 and #59581, using TicTacToe caused the following error:

```
ray::DQN.train() (pid=88183, ip=127.0.0.1, actor_id=2b775f13e808cc4aaaa23bde01000000, repr=DQN(env=<class 'ray.rllib.examples.envs.classes.multi_agent.tic_tac_toe.TicTacToe'>; env-runners=0; learners=0; multi-agent=True))
  File "ray/python/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "ray/python/ray/tune/trainable/trainable.py", line 328, in train
    result = self.step()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 1242, in step
    train_results, train_iter_ctx = self._run_one_training_iteration()
  File "ray/python/ray/rllib/algorithms/algorithm.py", line 3666, in _run_one_training_iteration
    training_step_return_value = self.training_step()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 646, in training_step
    return self._training_step_new_api_stack()
  File "ray/python/ray/rllib/algorithms/dqn/dqn.py", line 668, in _training_step_new_api_stack
    self.local_replay_buffer.add(episodes)
  File "ray/python/ray/rllib/utils/replay_buffers/prioritized_episode_buffer.py", line 314, in add
    existing_eps.concat_episode(eps)
  File "ray/python/ray/rllib/env/multi_agent_episode.py", line 862, in concat_episode
    sa_episode.concat_episode(other.agent_episodes[agent_id])
  File "ray/python/ray/rllib/env/single_agent_episode.py", line 618, in concat_episode
    assert self.t == other.t_started
AssertionError
```

In the multi-agent episode's `concat_episode`, we check whether any agent still has an outstanding action whose resulting observation has not yet been received. This produces a "hanging" action: one episode chunk ends with the observation and action, and the next chunk begins with the resulting observation, reward, etc. This [code](https://github.com/ray-project/ray/blob/22cf6ef6af2cddc233bca7ce59668ed8f4bbb17e/rllib/env/multi_agent_episode.py#L848) checked whether this has happened and then added an extra step at the beginning to include the hanging data.

However, in testing, the multi-agent episode `cut` method already carries this hanging data over (using `slice` instead would cause a hidden bug), meaning an extra, unnecessary step's worth of data was being added and the episode beginnings no longer lined up.
Therefore, this PR removes that code and replaces it with a simple check that the hanging action is equivalent to the initial action in the next episode.
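To make that concrete, here is a minimal, hypothetical sketch of what "check that the hanging action is equivalent to the initial action of the next chunk" means; the helper name and arguments are illustrative only, not the actual RLlib implementation:

```python
import numpy as np


# Hypothetical helper, for illustration only: when concatenating two chunks of
# the same agent's episode, the action left hanging at the end of the first
# chunk must equal the initial hanging action carried over by the second chunk
# (which `cut` already stores), so no extra step has to be re-added.
def check_hanging_action_consistency(hanging_action_self, hanging_action_other):
    if not np.array_equal(
        np.asarray(hanging_action_self), np.asarray(hanging_action_other)
    ):
        raise ValueError(
            "Hanging action at the end of the first chunk does not match the "
            "initial hanging action of the second chunk."
        )


# Example: the agent took action 3 right before the episode was cut; the next
# chunk carries that same action as its initial hanging action.
check_hanging_action_consistency(3, 3)
```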

For testing, I found that the `concat_episode` test was using `slice`, which doesn't account for hanging data, whereas `cut`, which is used in the env-runner, does. I made the test more functional: I created a custom environment whose agents take actions at different frequencies and whose observations are each agent's own timestep. This lets us concatenate all episode chunks with the same ID and check that the observations increase 0, 1, 2, ..., ensuring that no data goes missing for users.
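A rough, hypothetical sketch of the idea behind such an environment (simplified, plain Python; not the actual test code or the RLlib `MultiAgentEnv` API):

```python
# Hypothetical, simplified environment sketch: agent i receives a new
# observation only every (i + 1) env steps, which produces hanging actions
# whenever an episode is cut in between. Each observation is the agent's own
# per-agent timestep, so the concatenated observation sequence for every
# agent must be exactly 0, 1, 2, ...
class StaggeredAgentsEnv:
    def __init__(self, num_agents=2, episode_len=20):
        self.num_agents = num_agents
        self.episode_len = episode_len
        self.t = 0
        self.agent_t = {}

    def reset(self):
        self.t = 0
        self.agent_t = {f"agent_{i}": 0 for i in range(self.num_agents)}
        # Every agent starts with observation 0 (its own timestep).
        return dict(self.agent_t)

    def step(self, action_dict):
        self.t += 1
        obs, rewards = {}, {}
        for i in range(self.num_agents):
            agent_id = f"agent_{i}"
            # Agent i only observes (and can act again) every (i + 1) steps.
            if self.t % (i + 1) == 0:
                self.agent_t[agent_id] += 1
                obs[agent_id] = self.agent_t[agent_id]
                rewards[agent_id] = 0.0
        done = self.t >= self.episode_len
        return obs, rewards, done
```

After sampling and cutting episodes from an environment like this, concatenating all chunks with the same episode ID and asserting that each agent's observations form the sequence 0, 1, 2, ... catches any step that gets dropped or duplicated.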

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses an `AssertionError` during multi-agent episode concatenation by removing an unnecessary `add_env_step` call. The new approach of verifying the consistency of the hanging action with the next episode's initial action is sound.

I've identified a couple of issues:

  1. A debug print statement has been left in the code.
  2. The action comparison logic is not robust for numpy arrays or nested structures, which could lead to runtime errors.

I've provided suggestions to fix these. Also, as noted in the PR description, adding tests to cover this fix would be crucial to prevent future regressions.
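On the second point, a plain `==` comparison can misbehave for numpy-array or nested (e.g. Dict/Tuple action space) actions. A hedged sketch of a structure-aware comparison using the `tree` (dm-tree) utility that RLlib commonly relies on; the helper name is illustrative, not the actual fix:

```python
import numpy as np
import tree  # dm-tree


def actions_equal(a, b) -> bool:
    """Compare two possibly nested, possibly numpy-valued action structures."""
    try:
        # Mismatched nesting (or types) means the actions cannot be equal.
        tree.assert_same_structure(a, b)
    except (TypeError, ValueError):
        return False
    # Compare leaf by leaf; np.array_equal handles scalars and arrays alike.
    return all(
        np.array_equal(np.asarray(x), np.asarray(y))
        for x, y in zip(tree.flatten(a), tree.flatten(b))
    )


# Examples:
assert actions_equal({"move": np.array([1, 0])}, {"move": np.array([1, 0])})
assert not actions_equal(np.array([1, 0]), np.array([1, 1]))
```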

Signed-off-by: Mark Towers <[email protected]>
@pseudo-rnd-thoughts marked this pull request as ready for review January 6, 2026 18:20
@pseudo-rnd-thoughts requested a review from a team as a code owner January 6, 2026 18:20
@ray-gardener bot added the rllib (RLlib related issues) label Jan 6, 2026
@pseudo-rnd-thoughts added the rllib-envrunners (Issues around the sampling backend of RLlib) label Jan 7, 2026
@simonsays1980 (Contributor) left a comment

Thanks for fixing this @pseudo-rnd-thoughts! Small things: could you raise an issue for the `slice` method to be fixed in the future? And it needs a small change in the docstring (this did not come from your change, though):

```
In order for this to work, both chunks (`self` and `other`) must fit
together. This is checked by the IDs (must be identical), the time step counters
together that are split through `cut`. For sequential multi-agent environments
using slice might cause problems from hanging observation/actions.
```
Contributor left a comment

This is something we need to fix in the near future. Could you raise another issue on Ray OSS please?

@pseudo-rnd-thoughts (Member, Author) replied

I'm not sure if this is a bug or an inherent limitation of the `slice` method.

@simonsays1980 added the go (add ONLY when ready to merge, run all tests) label Jan 8, 2026
Signed-off-by: Mark Towers <[email protected]>
@simonsays1980 (Contributor) left a comment

LGTM. Thanks @pseudo-rnd-thoughts !

@simonsays1980 enabled auto-merge (squash) January 8, 2026 14:10
@simonsays1980 merged commit 9700991 into ray-project:master Jan 8, 2026
7 checks passed
AYou0207 pushed a commit to AYou0207/ray that referenced this pull request Jan 13, 2026
…y-project#59895)

