[rllib] Add SAC testing to premerge #59581
Draft
pseudo-rnd-thoughts wants to merge 48 commits into ray-project:master from pseudo-rnd-thoughts:sac-premerge-nightly
48 commits
75bdc1d [rllib] Merge tuned-examples into examples
b89a7af Update BUILD.bazel for tuned-examples new location
5100f82 Gemini review
36d242f Merge branch 'master' into merge-tuned-examples
a0dd0c3 update offline data path
5a4d28b update tuned_example file paths
0f2d5bd Fix file paths
b75ba43 Merge branch 'master' into merge-tuned-examples
dc3be81 Update rllib release test directory and release test paths
38bf29b [rllib] Update APPO premerge
a8f5bcd Merge branch 'master' into appo-premerge
6c0dd98 Clean appo folder
5e9d21d Add stateles cartpole
101a5ee Merge branch 'master' into appo-premerge
b946c72 pre-commit
2234d96 Improve documentation
206d8fc Fix training scripts
8bfede0 Merge branch 'master' into appo-premerge
d9a559e Change to TicTacToe from Connect4
32f0e68 Updated to BUILD.bazel to tictactoe file
a6270c3 kamil code-review
f80bfa7 Merge branch 'master' into appo-premerge
617ce9f code-review (pseudo-rnd-thoughts)
2892342 update tictactoe and stop rewards
05db432 Merge branch 'master' into appo-premerge
f08cfab Add nightly tests (kamil-kaczmarek)
350c68e Add default_iters to atari
9f1bd4d Update sac examples
15b8376 Update tic tac toe implementation
2b141ac Rewrite TicTacToe and add stop rewards / iters for premerge
15d25ef Merge branch 'master' into appo-premerge
fd8a835 Fix release tests cluster_compute
c068518 Added docstrings
7b122d4 code-review
828e383 remove type: gpu from non gpu nightly
43b68f9 code review
dd8a8ce Merge branch 'appo-premerge' into sac-premerge-nightly
b98e625 Update the parameters
7bb656d Add note about GPU learners
d989c5e Fix run name
319d6b3 pre-commit + more docstring details
241fd39 Merge branch 'master' into appo-premerge
30a7b6a Reduce the number of env-runners from 5 to 4
36d7c3f Update documentation
866da95 Merge branch 'master' into sac-premerge-nightly
797a9a8 Merge branch 'appo-premerge' into sac-premerge-nightly
4f0739a Update documentation, remove appo changes
791dbca Fix the tictactoe implementation
```diff
@@ -87,12 +87,29 @@ py_library(
)

# --------------------------------------------------------------------
# Algorithms learning regression tests.
# Algorithms learning regression tests (rllib/examples/algorithm/[algo-name]).
#
# Tag: learning_tests
#
# This will test python/yaml config files
# inside rllib/examples/algorithms/[algo-name] for actual learning success.
# These tests check that the algorithm achieves above random performance within a relatively short period of time,
# not that the algorithm reaches the optimal policy.
#
# For single to multi-learner tests, the expected output should change,
# either reducing the maximum iterations or samples, or increasing the max return
# to ensure that the multi-learner is achieving something that the single shouldn’t be able to normally achieve.
#
# Compute Config
# - local (CPU) = 7 CPUs, 0 GPU: 5 Env Runners, 0 Learners on CPU, 2 Aggregator Actors per Learner on CPU
# - single (CPU) = 8 CPUs, 0 GPU: 5 Env Runner, 1 Learners on CPU, 2 Aggregator Actors per Learner on CPU
# - single (GPU) = 8 CPUs, 1 GPU: 5 Env Runner, 1 Learners on GPU, 2 Aggregator Actors per Learner on CPU
# - multi (GPU) = 16 CPUs, 2 GPUs: 10 Env Runners, 2 Learners on GPU, 2 Aggregator Actors per Learner on CPU (4 total CPUs)
#
# Legend
# - SA = Single Agent Environment
# - MA = Multi Agent Environment
# - D = Discrete actions
# - C = Continuous actions
# - LSTM = recurrent policy through lstms
# --------------------------------------------------------------------

# APPO
```
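The Compute Config comment gives a CPU total for each configuration. Below is a minimal sketch of one accounting that reproduces all four totals. It assumes that each remote learner reserves one CPU on top of its 2 CPU aggregator actors, and that the local (`num_learners=0`) setup still runs 2 aggregators. `ComputeConfig`, `required_cpus` and `AGGREGATORS_PER_LEARNER` are hypothetical names, not RLlib APIs.

```python
from dataclasses import dataclass

# Aggregator actors per learner, as stated in the Compute Config comment.
AGGREGATORS_PER_LEARNER = 2


@dataclass(frozen=True)
class ComputeConfig:
    """One row of the Compute Config comment (hypothetical container)."""

    name: str
    num_cpus: int  # value passed as --num-cpus
    num_env_runners: int
    num_learners: int  # 0 means the local (in-driver) learner setup


def required_cpus(cfg: ComputeConfig) -> int:
    """Count one CPU per env runner, one per remote learner, plus aggregator CPUs.

    Assumption: with num_learners=0 the local setup still runs
    AGGREGATORS_PER_LEARNER aggregators, so it is counted as one learner
    for aggregation but takes no extra CPU itself.
    """
    learners_for_aggregation = max(cfg.num_learners, 1)
    return (
        cfg.num_env_runners
        + cfg.num_learners
        + AGGREGATORS_PER_LEARNER * learners_for_aggregation
    )


CONFIGS = [
    ComputeConfig("local (CPU)", num_cpus=7, num_env_runners=5, num_learners=0),
    ComputeConfig("single (CPU)", num_cpus=8, num_env_runners=5, num_learners=1),
    ComputeConfig("single (GPU)", num_cpus=8, num_env_runners=5, num_learners=1),
    ComputeConfig("multi (GPU)", num_cpus=16, num_env_runners=10, num_learners=2),
]

for cfg in CONFIGS:
    status = "ok" if required_cpus(cfg) == cfg.num_cpus else "MISMATCH"
    print(f"{cfg.name}: --num-cpus={cfg.num_cpus}, accounted={required_cpus(cfg)} [{status}]")
```

The same totals appear as `--num-cpus` in the new SAC targets (7, 8, 16 and 8). A check along these lines could flag a target whose flags drift from its column.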
```diff
@@ -1622,238 +1639,91 @@ py_test(
    ],
)

# SAC
# MountainCar
py_test(
    name = "learning_tests_mountaincar_sac",
    size = "large",
    srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
    args = [
        "--as-test",
    ],
    main = "examples/algorithms/sac/mountaincar_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_discrete",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_mountaincar_sac_gpu",
    size = "large",
    srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
    args = [
        "--as-test",
        "--num-learners=1",
        "--num-gpus-per-learner=1",
    ],
    main = "examples/algorithms/sac/mountaincar_sac.py",
    tags = [
        "exclusive",
        "gpu",
        "learning_tests",
        "learning_tests_discrete",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_mountaincar_sac_multi_cpu",
    size = "large",
    srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
    args = [
        "--as-test",
        "--num-learners=2",
    ],
    main = "examples/algorithms/sac/mountaincar_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_discrete",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_mountaincar_sac_multi_gpu",
    size = "large",
    timeout = "eternal",
    srcs = ["examples/algorithms/sac/mountaincar_sac.py"],
    args = [
        "--as-test",
        "--num-learners=2",
        "--num-gpus-per-learner=1",
    ],
    main = "examples/algorithms/sac/mountaincar_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_discrete",
        "multi_gpu",
        "team:rllib",
        "torch_only",
    ],
)
# | SAC (14 total tests) | | Number of Learners (Device) | |
# | Environment | Success | Local (CPU) | Single (CPU) | Single (GPU) | Multi (GPU) |
# |--------------------------------|---------|-------------|-----------------|--------------|-------------|
# | (SA/D/LSTM) Stateless Cartpole | 150 | ✅ | ❌ | ❌ | ❌ |
# | (MA/D) TicTacToe | -2.0 | ❌ | ✅ | ❌ | ❌ |
# | (SA/D) Atari (Pong) | 5 | ❌ | ❌ | ❌ | ✅ |
# | (SA/C) MuJoCo (Humanoid) | 200 | ❌ | ❌ | ✅ | ❌ |

# Pendulum
py_test(
    name = "learning_tests_pendulum_sac",
    name = "learning_tests_sac_stateless_cartpole_local",
    size = "large",
    srcs = ["examples/algorithms/sac/pendulum_sac.py"],
    srcs = ["examples/algorithms/sac/stateless_cartpole_sac_with_lstm.py"],
    args = [
        "--as-test",
        "--num-cpus=7",
        "--num-env-runners=5",
        "--num-learners=0",
        "--stop-reward=150",
    ],
    main = "examples/algorithms/sac/pendulum_sac.py",
    main = "examples/algorithms/sac/stateless_cartpole_sac_with_lstm.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_pendulum_sac_gpu",
    name = "learning_tests_sac_tictactoe_single_cpu",
    size = "large",
    srcs = ["examples/algorithms/sac/pendulum_sac.py"],
    srcs = ["examples/algorithms/sac/tictactoe_sac.py"],
    args = [
        "--as-test",
        "--num-cpus=8",
        "--num-env-runners=5",
        "--num-learners=1",
        "--num-gpus-per-learner=1",
        "--stop-reward=-2",
    ],
    main = "examples/algorithms/sac/pendulum_sac.py",
    main = "examples/algorithms/sac/tictactoe_sac.py",
    tags = [
        "exclusive",
        "gpu",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_pendulum_sac_multi_cpu",
    size = "large",
    srcs = ["examples/algorithms/sac/pendulum_sac.py"],
    args = [
        "--as-test",
        "--num-learners=2",
    ],
    main = "examples/algorithms/sac/pendulum_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_pendulum_sac_multi_gpu",
    name = "learning_tests_sac_atari_multi_gpu",
    size = "large",
    srcs = ["examples/algorithms/sac/pendulum_sac.py"],
    srcs = ["examples/algorithms/sac/atari_sac.py"],
    args = [
        "--as-test",
        "--num-cpus=16",
        "--num-env-runners=10",
        "--num-learners=2",
        "--num-gpus-per-learner=1",
        "--stop-reward=5",
    ],
    main = "examples/algorithms/sac/pendulum_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_continuous",
        "multi_gpu",
        "team:rllib",
        "torch_only",
    ],
)

# MultiAgentPendulum
py_test(
    name = "learning_tests_multi_agent_pendulum_sac",
    size = "large",
    srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
    args = [
        "--as-test",
        "--num-agents=2",
        "--num-cpus=4",
    ],
    main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
    main = "examples/algorithms/sac/atari_sac.py",
    tags = [
        "exclusive",
        "gpu",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_multi_agent_pendulum_sac_gpu",
    name = "learning_tests_sac_mujoco_single_gpu",
    size = "large",
    srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
    srcs = ["examples/algorithms/sac/mujoco_sac.py"],
    args = [
        "--as-test",
        "--num-agents=2",
        "--num-cpus=4",
        "--num-cpus=8",
        "--num-env-runners=5",
        "--num-learners=1",
        "--num-gpus-per-learner=1",
        "--stop-reward=200",
    ],
    main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
    tags = [
        "exclusive",
        "gpu",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_multi_agent_pendulum_sac_multi_cpu",
    size = "large",
    srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
    args = [
        "--num-agents=2",
        "--num-learners=2",
    ],
    main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_continuous",
        "team:rllib",
        "torch_only",
    ],
)

py_test(
    name = "learning_tests_multi_agent_pendulum_sac_multi_gpu",
    size = "large",
    timeout = "eternal",
    srcs = ["examples/algorithms/sac/multi_agent_pendulum_sac.py"],
    args = [
        "--num-agents=2",
        "--num-learners=2",
        "--num-gpus-per-learner=1",
    ],
    main = "examples/algorithms/sac/multi_agent_pendulum_sac.py",
    main = "examples/algorithms/sac/mountaincar_sac.py",
    tags = [
        "exclusive",
        "learning_tests",
        "learning_tests_continuous",
        "multi_gpu",
        "team:rllib",
        "torch_only",
    ],
)
```
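The SAC matrix comment reads as a specification for the targets that follow. Each ✅ cell corresponds to one `py_test`, named after the algorithm, environment and compute config, with the Success column passed as `--stop-reward`. Below is a minimal sketch of that mapping. It assumes the `learning_tests_<algo>_<env>_<config>` pattern visible in the new target names. `COLUMNS`, `SAC_MATRIX` and `target_names` are hypothetical. In a real `.bzl` macro, the generated values would feed `py_test` instead of `print`.

```python
# Column keys mirror the "Number of Learners (Device)" header (hypothetical slugs).
COLUMNS = ["local", "single_cpu", "single_gpu", "multi_gpu"]

# (environment slug, success threshold, enabled columns), copied from the SAC table.
SAC_MATRIX = [
    ("stateless_cartpole", 150.0, ["local"]),
    ("tictactoe", -2.0, ["single_cpu"]),
    ("atari", 5.0, ["multi_gpu"]),
    ("mujoco", 200.0, ["single_gpu"]),
]


def target_names(algo, matrix):
    """Yield (target name, --stop-reward flag) for every enabled cell."""
    for env, stop_reward, columns in matrix:
        for column in columns:
            if column not in COLUMNS:
                raise ValueError(f"unknown compute config: {column}")
            # Assumed naming pattern: learning_tests_<algo>_<env>_<config>.
            name = f"learning_tests_{algo}_{env}_{column}"
            # ":g" drops a trailing ".0", so 150.0 becomes "150" as in the BUILD args.
            yield name, f"--stop-reward={stop_reward:g}"


for name, flag in target_names("sac", SAC_MATRIX):
    print(name, flag)
```

A table-driven helper like this keeps the matrix comment and the targets from diverging. The trade-off is that BUILD readers must look at the macro to see the concrete arguments.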
Comment on lines 1709 to 1728
Contributor
This test is configured for multi-GPU execution (`num_learners=2`, `num_gpus_per_learner=1`), but it's tagged with `gpu`. For consistency with other multi-GPU tests in this file (e.g., `learning_test_appo_tictactoe_multi_gpu`), this should be `multi_gpu`.
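The review comment's rule can be checked mechanically. A target with `--num-gpus-per-learner` of at least 1 should carry `multi_gpu` when it uses two or more learners, and `gpu` otherwise. The sketch below assumes that rule, which is inferred from the comment and the existing `_multi_gpu` targets rather than from a documented Ray convention. `parse_flags`, `expected_device_tag` and `check_device_tags` are hypothetical helpers, and the tag list is an illustrative subset.

```python
def parse_flags(args):
    """Turn ["--num-learners=2", ...] into {"num-learners": "2", ...}; bare flags map to ""."""
    flags = {}
    for arg in args:
        key, _, value = arg.lstrip("-").partition("=")
        flags[key] = value
    return flags


def expected_device_tag(args):
    """Return "multi_gpu", "gpu", or None based on the learner/GPU flags (assumed rule)."""
    flags = parse_flags(args)
    num_learners = int(flags.get("num-learners", "0"))
    gpus_per_learner = float(flags.get("num-gpus-per-learner", "0"))
    if gpus_per_learner <= 0:
        return None
    return "multi_gpu" if num_learners >= 2 else "gpu"


def check_device_tags(name, args, tags):
    """Return a list of problems; an empty list means the device tags are consistent."""
    expected = expected_device_tag(args)
    problems = []
    for tag in ("gpu", "multi_gpu"):
        if tag in tags and tag != expected:
            problems.append(f"{name}: unexpected tag {tag!r}")
    if expected is not None and expected not in tags:
        problems.append(f"{name}: missing tag {expected!r}")
    return problems


atari_args = [
    "--as-test", "--num-cpus=16", "--num-env-runners=10",
    "--num-learners=2", "--num-gpus-per-learner=1", "--stop-reward=5",
]
atari_tags = ["exclusive", "gpu", "learning_tests"]  # illustrative subset
for problem in check_device_tags("learning_tests_sac_atari_multi_gpu", atari_args, atari_tags):
    print(problem)
```

Run against the Atari target's arguments, the check reports both the unexpected `gpu` tag and the missing `multi_gpu` tag, which is the change the reviewer suggests.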