Fixes SB3's template PPO cfg by bringing it up to date with the security-safe syntax for training specification (#3688)

ooctipus · web-flow · commit 6131a573b63a · 2025-10-13T20:52:25.000-07:00
# Description

This PR fixes a bug where a project generated from the template with SB3 does not run, because its agent config still specifies `policy_kwargs` as a string that can no longer be parsed:

```
policy_kwargs: "dict(
                  activation_fn=nn.ELU,
                  net_arch=[32, 32],
                  squash_output=False,
                )"
```

We have disabled string parsing because it is not safe (arbitrary code could be evaluated). This PR makes SB3's template adopt the new secure syntax as well:

```
policy_kwargs:
  activation_fn: nn.ELU
  net_arch: [32, 32]
  squash_output: False
```

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
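With the new syntax, a YAML loader returns `policy_kwargs` as an ordinary nested mapping in which `nn.ELU` is just the string `"nn.ELU"`, so some step still has to map that string to an activation class without evaluating it. The sketch below is illustrative only and does not show how Isaac Lab actually does this. It assumes an allowlist lookup. The helper `resolve_policy_kwargs`, the `ALLOWED_ACTIVATIONS` table, and the stand-in `ELU`/`ReLU`/`Tanh` classes are all hypothetical. The stand-ins replace `torch.nn` so the sketch runs without PyTorch.

```python
from typing import Any


# Hypothetical stand-ins for torch.nn activation classes so this sketch runs
# without PyTorch installed; a real loader would reference torch.nn instead.
class ELU:
    pass


class ReLU:
    pass


class Tanh:
    pass


# Hypothetical allowlist: only these "nn.<Name>" strings may be resolved.
ALLOWED_ACTIVATIONS: dict[str, type] = {
    "nn.ELU": ELU,
    "nn.ReLU": ReLU,
    "nn.Tanh": Tanh,
}


def resolve_policy_kwargs(raw: dict[str, Any]) -> dict[str, Any]:
    """Turn an already-parsed policy_kwargs mapping into keyword arguments.

    Nothing is passed to eval(): strings starting with "nn." are looked up in
    ALLOWED_ACTIVATIONS (unknown names raise), everything else is plain data.
    """
    resolved: dict[str, Any] = {}
    for key, value in raw.items():
        if isinstance(value, dict):
            # Recurse into nested mappings.
            resolved[key] = resolve_policy_kwargs(value)
        elif isinstance(value, str) and value.startswith("nn."):
            if value not in ALLOWED_ACTIVATIONS:
                raise ValueError(f"unsupported activation {value!r} for {key!r}")
            resolved[key] = ALLOWED_ACTIVATIONS[value]
        else:
            # Numbers, booleans and lists such as net_arch pass through unchanged.
            resolved[key] = value
    return resolved


# What a YAML loader would return for the new template's policy_kwargs block.
parsed = {"activation_fn": "nn.ELU", "net_arch": [32, 32], "squash_output": False}
kwargs = resolve_policy_kwargs(parsed)
print(kwargs["activation_fn"].__name__, kwargs["net_arch"], kwargs["squash_output"])
# prints: ELU [32, 32] False
```

With this design, an injected value such as `"nn.__class__"` raises `ValueError`, and any other string stays inert data instead of being executed. The old `"dict(...)"` form had no such boundary.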
diff --git a/tools/template/templates/agents/sb3_ppo_cfg b/tools/template/templates/agents/sb3_ppo_cfg
@@ -11,11 +11,10 @@ n_epochs: 20
 ent_coef: 0.01
 learning_rate: !!float 3e-4
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-                  activation_fn=nn.ELU,
-                  net_arch=[32, 32],
-                  squash_output=False,
-                )"
+policy_kwargs:
+  activation_fn: nn.ELU
+  net_arch: [32, 32]
+  squash_output: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 device: "cuda:0"