Draft

Changes from all commits (48 commits)
75bdc1d
[rllib] Merge tuned-examples into examples
Nov 21, 2025
b89a7af
Update BUILD.bazel for tuned-examples new location
Nov 21, 2025
5100f82
Gemini review
Nov 21, 2025
36d242f
Merge branch 'master' into merge-tuned-examples
Nov 21, 2025
a0dd0c3
update offline data path
Nov 24, 2025
5a4d28b
update tuned_example file paths
Nov 24, 2025
0f2d5bd
Fix file paths
Nov 24, 2025
b75ba43
Merge branch 'master' into merge-tuned-examples
Nov 26, 2025
dc3be81
Update rllib release test directory and release test paths
Nov 26, 2025
38bf29b
[rllib] Update APPO premerge
Nov 26, 2025
a8f5bcd
Merge branch 'master' into appo-premerge
Nov 27, 2025
6c0dd98
Clean appo folder
Nov 27, 2025
5e9d21d
Add stateles cartpole
Nov 27, 2025
101a5ee
Merge branch 'master' into appo-premerge
Dec 1, 2025
b946c72
pre-commit
Dec 1, 2025
2234d96
Improve documentation
Dec 2, 2025
206d8fc
Fix training scripts
Dec 3, 2025
8bfede0
Merge branch 'master' into appo-premerge
Dec 10, 2025
d9a559e
Change to TicTacToe from Connect4
Dec 10, 2025
32f0e68
Updated to BUILD.bazel to tictactoe file
Dec 10, 2025
a6270c3
kamil code-review
Dec 11, 2025
f80bfa7
Merge branch 'master' into appo-premerge
pseudo-rnd-thoughts Dec 11, 2025
617ce9f
code-review
Dec 12, 2025
2892342
update tictactoe and stop rewards
Dec 15, 2025
05db432
Merge branch 'master' into appo-premerge
kamil-kaczmarek Dec 16, 2025
f08cfab
Add nightly tests
Dec 16, 2025
350c68e
Add default_iters to atari
Dec 16, 2025
9f1bd4d
Update sac examples
Dec 16, 2025
15b8376
Update tic tac toe implementation
Dec 16, 2025
2b141ac
Rewrite TicTacToe and add stop rewards / iters for premerge
Dec 17, 2025
15d25ef
Merge branch 'master' into appo-premerge
Dec 17, 2025
fd8a835
Fix release tests cluster_compute
Dec 17, 2025
c068518
Added docstrings
Dec 18, 2025
7b122d4
code-review
Dec 18, 2025
828e383
remove type: gpu from non gpu nightly
Dec 18, 2025
43b68f9
code review
Dec 19, 2025
dd8a8ce
Merge branch 'appo-premerge' into sac-premerge-nightly
Dec 19, 2025
b98e625
Update the parameters
Dec 19, 2025
7bb656d
Add note about GPU learners
Dec 19, 2025
d989c5e
Fix run name
Dec 19, 2025
319d6b3
pre-commit + more docstring details
Dec 19, 2025
241fd39
Merge branch 'master' into appo-premerge
Dec 29, 2025
30a7b6a
Reduce the number of env-runners from 5 to 4
Dec 29, 2025
36d7c3f
Update documentation
Dec 31, 2025
866da95
Merge branch 'master' into sac-premerge-nightly
Dec 31, 2025
797a9a8
Merge branch 'appo-premerge' into sac-premerge-nightly
Dec 31, 2025
4f0739a
Update documentation, remove appo changes
Dec 31, 2025
791dbca
Fix the tictactoe implementation
Jan 5, 2026
12 changes: 8 additions & 4 deletions doc/source/rllib/rllib-algorithms.rst
@@ -154,8 +154,10 @@ Soft Actor Critic (SAC)


**Tuned examples:**
-`Pendulum-v1 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/pendulum-sac.yaml>`__,
-`HalfCheetah-v3 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/halfcheetah_sac.py>`__,
+`Cartpole-v1 <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/sac/cartpole_sac.py>`__,
+`Atari (Pong-v5) with Rainbow <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/sac/atari_sac.py>`__,
+`with LSTM <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/sac/stateless_cartpole_sac_with_lstm.py>`__,
+`Multi-Agent <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/sac/tictactoe_sac.py>`__,
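
For orientation, the following is a minimal sketch of what an SAC-with-LSTM setup such as the stateless-CartPole example typically looks like on RLlib's new API stack. It is not the contents of the linked script; the env name "StatelessCartPole" and the exact config values are assumptions.

    # Hedged sketch only -- not the actual stateless_cartpole_sac_with_lstm.py.
    # Assumes the new API stack and that "StatelessCartPole" is a registered env name.
    from ray.rllib.algorithms.sac import SACConfig
    from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

    config = (
        SACConfig()
        .environment("StatelessCartPole")  # assumed env registration
        .env_runners(num_env_runners=5)
        .learners(num_learners=0)  # no remote learner; train on the main process
        .rl_module(
            # A recurrent module lets the policy recover the velocity signal that
            # the stateless observation drops.
            model_config=DefaultModelConfig(use_lstm=True, max_seq_len=20),
        )
    )

    algo = config.build()
    result = algo.train()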

**SAC-specific configs** (see also :ref:`generic algorithm settings <rllib-algo-configuration-generic-settings>`):

@@ -195,8 +197,10 @@ Asynchronous Proximal Policy Optimization (APPO)


**Tuned examples:**
-`Pong-v5 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/appo/pong_appo.py>`__
-`HalfCheetah-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/appo/halfcheetah_appo.py>`__
+`Atari (Pong-v5) <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/appo/atari_appo.py>`__
+`MuJoCo (Humanoid-v4) <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/appo/mujoco_appo.py>`__
+`Using an LSTM <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/appo/stateless_cartpole_appo_with_lstm.py>`__
+`Multi-Agent <https://github.com/ray-project/ray/blob/master/rllib/examples/algorithms/appo/tictactoe_appo.py>`__
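
Likewise, a rough sketch of a two-player multi-agent APPO setup in the spirit of the TicTacToe example; the env name, agent ids, and policy mapping are illustrative assumptions rather than the actual script contents.

    # Hedged sketch only -- not the actual tictactoe_appo.py.
    # Assumes a registered two-player MultiAgentEnv whose agent ids match the policy ids below.
    from ray.rllib.algorithms.appo import APPOConfig

    config = (
        APPOConfig()
        .environment("TicTacToe")  # assumed registered env name
        .multi_agent(
            policies={"player1", "player2"},
            # Map each agent id to the policy/module of the same name.
            policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
        )
        .env_runners(num_env_runners=5)
    )

    algo = config.build()
    result = algo.train()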

**APPO-specific configs** (see also :ref:`generic algorithm settings <rllib-algo-configuration-generic-settings>`):

236 changes: 53 additions & 183 deletions rllib/BUILD.bazel
@@ -87,12 +87,29 @@ py_library(
)

# --------------------------------------------------------------------
# Algorithms learning regression tests.
# Algorithms learning regression tests (rllib/examples/algorithms/[algo-name]).
#
# Tag: learning_tests
#
# This will test python/yaml config files
# inside rllib/examples/algorithms/[algo-name] for actual learning success.
# These tests check that the algorithm achieves above-random performance within a relatively short period of time,
# not that the algorithm reaches the optimal policy.
#
# When moving from a single-learner to a multi-learner test, the expected outcome should change as well:
# either reduce the maximum iterations or samples, or raise the target return, so that the
# multi-learner setup achieves something the single-learner setup normally couldn't.
#
# Compute Config
# - local (CPU) = 7 CPUs, 0 GPU: 5 Env Runners, 0 Learners on CPU, 2 Aggregator Actors per Learner on CPU
# - single (CPU) = 8 CPUs, 0 GPU: 5 Env Runners, 1 Learner on CPU, 2 Aggregator Actors per Learner on CPU
# - single (GPU) = 8 CPUs, 1 GPU: 5 Env Runners, 1 Learner on GPU, 2 Aggregator Actors per Learner on CPU
# - multi (GPU) = 16 CPUs, 2 GPUs: 10 Env Runners, 2 Learners on GPU, 2 Aggregator Actors per Learner on CPU (4 total CPUs)
#
# Legend
# - SA = Single Agent Environment
# - MA = Multi Agent Environment
# - D = Discrete actions
# - C = Continuous actions
# - LSTM = recurrent policy (via LSTM layers)
# --------------------------------------------------------------------
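
The test definitions below hand these compute configs to the example scripts as CLI flags (--num-cpus, --num-env-runners, --num-learners, --num-gpus-per-learner, --stop-reward). As a hedged sketch of how such a script typically consumes them, assuming RLlib's shared example-script helpers (the helper names and defaults are assumptions and may differ from the scripts in this PR):

    # Hedged sketch of an example script's entry point -- assumed structure,
    # not copied from any file in this PR.
    from ray.rllib.algorithms.sac import SACConfig
    from ray.rllib.utils.test_utils import (
        add_rllib_example_script_args,
        run_rllib_example_script_experiment,
    )

    parser = add_rllib_example_script_args(default_reward=150.0, default_timesteps=500_000)
    args = parser.parse_args()

    base_config = SACConfig().environment("CartPole-v1")

    if __name__ == "__main__":
        # --num-env-runners, --num-learners, --num-gpus-per-learner, and --stop-reward
        # from the BUILD args are applied to `base_config` by the shared runner.
        run_rllib_example_script_experiment(base_config, args)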

# APPO
@@ -1622,238 +1639,91 @@ py_test(
],
)

# SAC
# MountainCar
py_test(
name = "learning_tests_mountaincar_sac",
size = "large",
srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
args = [
"--as-test",
],
main = "examples/algorithms/sac/mountaincar_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_discrete",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_mountaincar_sac_gpu",
size = "large",
srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
args = [
"--as-test",
"--num-learners=1",
"--num-gpus-per-learner=1",
],
main = "examples/algorithms/sac/mountaincar_sac.py",
tags = [
"exclusive",
"gpu",
"learning_tests",
"learning_tests_discrete",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_mountaincar_sac_multi_cpu",
size = "large",
srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
args = [
"--as-test",
"--num-learners=2",
],
main = "examples/algorithms/sac/mountaincar_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_discrete",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_mountaincar_sac_multi_gpu",
size = "large",
timeout = "eternal",
srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
args = [
"--as-test",
"--num-learners=2",
"--num-gpus-per-learner=1",
],
main = "examples/algorithms/sac/mountaincar_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_discrete",
"multi_gpu",
"team:rllib",
"torch_only",
],
)
# | SAC (14 total tests) | | Number of Learners (Device) |
# | Environment | Success | Local (CPU) | Single (CPU) | Single (GPU) | Multi (GPU) |
# |--------------------------------|---------|-------------|-----------------|--------------|-------------|
# | (SA/D/LSTM) Stateless Cartpole | 150 | ✅ | ❌ | ❌ | ❌ |
# | (MA/D) TicTacToe | -2.0 | ❌ | ✅ | ❌ | ❌ |
# | (SA/D) Atari (Pong) | 5 | ❌ | ❌ | ❌ | ✅ |
# | (SA/C) MuJoCo (Humanoid) | 200 | ❌ | ❌ | ✅ | ❌ |

# Pendulum
py_test(
name = "learning_tests_pendulum_sac",
name = "learning_tests_sac_stateless_cartpole_local",
size = "large",
srcs = ["examples/algorithms/sac/pendulum_sac.py"],
srcs = ["examples/algorithms/sac/stateless_cartpole_sac_with_lstm.py"],
args = [
"--as-test",
"--num-cpus=7",
"--num-env-runners=5",
"--num-learners=0",
"--stop-reward=150",
],
main = "examples/algorithms/sac/pendulum_sac.py",
main = "examples/algorithms/sac/stateless_cartpole_sac_with_lstm.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_pendulum_sac_gpu",
name = "learning_tests_sac_tictactoe_single_cpu",
size = "large",
srcs = ["examples/algorithms/sac/pendulum_sac.py"],
srcs = ["examples/algorithms/sac/tictactoe_sac.py"],
args = [
"--as-test",
"--num-cpus=8",
"--num-env-runners=5",
"--num-learners=1",
"--num-gpus-per-learner=1",
"--stop-reward=-2",
],
main = "examples/algorithms/sac/pendulum_sac.py",
main = "examples/algorithms/sac/tictactoe_sac.py",
tags = [
"exclusive",
"gpu",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_pendulum_sac_multi_cpu",
size = "large",
srcs = ["examples/algorithms/sac/pendulum_sac.py"],
args = [
"--as-test",
"--num-learners=2",
],
main = "examples/algorithms/sac/pendulum_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_pendulum_sac_multi_gpu",
name = "learning_tests_sac_atari_multi_gpu",
size = "large",
srcs = ["examples/algorithms/sac/pendulum_sac.py"],
srcs = ["examples/algorithms/sac/atari_sac.py"],
args = [
"--as-test",
"--num-cpus=16",
"--num-env-runners=10",
"--num-learners=2",
"--num-gpus-per-learner=1",
"--stop-reward=5",
],
main = "examples/algorithms/sac/pendulum_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_continuous",
"multi_gpu",
"team:rllib",
"torch_only",
],
)

# MultiAgentPendulum
py_test(
name = "learning_tests_multi_agent_pendulum_sac",
size = "large",
srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
args = [
"--as-test",
"--num-agents=2",
"--num-cpus=4",
],
main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
main = "examples/algorithms/sac/atari_sac.py",
tags = [
"exclusive",
"gpu",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)
Review comment (Contributor) on lines +1689 to 1707, severity: medium
This test is configured for multi-GPU execution (num_learners=2, num_gpus_per_learner=1), but it's tagged with gpu. For consistency with other multi-GPU tests in this file (e.g., learning_test_appo_tictactoe_multi_gpu), this should be multi_gpu.

    tags = [
        "exclusive",
        "multi_gpu",
        "learning_tests",
        "team:rllib",
    ],


py_test(
name = "learning_tests_multi_agent_pendulum_sac_gpu",
name = "learning_tests_sac_mujoco_single_gpu",
size = "large",
srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
srcs = ["examples/algorithms/sac/mujoco_sac.py"],
args = [
"--as-test",
"--num-agents=2",
"--num-cpus=4",
"--num-cpus=8",
"--num-env-runners=5",
"--num-learners=1",
"--num-gpus-per-learner=1",
"--stop-reward=200",
],
main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
tags = [
"exclusive",
"gpu",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_multi_agent_pendulum_sac_multi_cpu",
size = "large",
srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
args = [
"--num-agents=2",
"--num-learners=2",
],
main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_continuous",
"team:rllib",
"torch_only",
],
)

py_test(
name = "learning_tests_multi_agent_pendulum_sac_multi_gpu",
size = "large",
timeout = "eternal",
srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
args = [
"--num-agents=2",
"--num-learners=2",
"--num-gpus-per-learner=1",
],
main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
main = "examples/algorithms/sac/mountaincar_sac.py",
tags = [
"exclusive",
"learning_tests",
"learning_tests_continuous",
"multi_gpu",
"team:rllib",
"torch_only",
],
)
Review comment (Contributor) on lines 1709 to 1728, severity: critical
There are a couple of issues with this py_test definition:

  1. The main attribute on line 1661 points to mountaincar_sac.py, but it should be mujoco_sac.py to match the srcs.
  2. This is a single-GPU test, but it's tagged with multi_gpu on line 1665. It should be gpu for consistency.


2 changes: 1 addition & 1 deletion rllib/core/rl_module/default_model_config.py
@@ -132,7 +132,7 @@ class DefaultModelConfig:
#: Activation function descriptor for the stack configured by `head_fcnet_hiddens`.
#: Supported values are: 'tanh', 'relu', 'swish' (or 'silu', which is the same),
#: and 'linear' (or None).
-head_fcnet_activation: str = "relu"
+head_fcnet_activation: str | None = "relu"
#: Initializer function or class descriptor for the weight/kernel matrices in the
#: stack configured by `head_fcnet_hiddens`. Supported values are the initializer
#: names (str), classes or functions listed by the frameworks (`torch`). See
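
Since the docstring already lists 'linear' (or None) as supported, the widened annotation makes None an explicitly typed value. A minimal usage sketch; the AlgorithmConfig wiring mentioned in the comment is an assumption:

    # Hedged sketch: passing None yields a linear (no-activation) head stack.
    from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

    model_config = DefaultModelConfig(
        head_fcnet_hiddens=[256, 256],
        head_fcnet_activation=None,  # same effect as "linear"
    )
    # Typically passed to an AlgorithmConfig via `.rl_module(model_config=model_config)`.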