
feat: Phase D — RL integration (training, benchmark, visualization) #1

Open
Balghanimi wants to merge 10 commits into main from feat/phase-d-rl-integration

Conversation

@Balghanimi
Owner

Summary

  • SurfaceDiscoveryEnv: Gymnasium environment where RL agent discovers sliding surfaces (sigma output, fixed switching law, 5 plants, 4 disturbance types)
  • Trainer: PPO/SAC training pipeline with VecNormalize, ISE tracking, batch training across all plant/algo/disturbance combinations
  • Benchmark: 580-simulation comparison — Table 1 (12 surfaces × 5 plants × 4 disturbances, fixed law) + Table 2 (17 controllers × 5 plants × 4 disturbances, best-matched)
  • Visualize: 7 publication-quality matplotlib functions (heatmap, contour overlay, radar, cross-plant radar, benchmark bars, training curve, time-domain)
  • RLDiscoveredSurface fixes: SAC loading fallback + 4D obs padding for SurfaceDiscoveryEnv-trained models

Stats

  • 14 files changed, +1,931 lines
  • 45 new tests (20 env + 9 trainer + 9 benchmark + 7 visualize)
  • 220/221 tests pass (the single failure is a pre-existing PID numerical edge case)

Test plan

  • All 45 new tests pass
  • Full suite regression check (220 pass)
  • Import verification: from opensmc.rl import RLDiscoveredSurface, SurfaceDiscoveryEnv, train_surface, run_benchmark, visualize
  • Run examples/train_and_fingerprint.py end-to-end (PPO training + fingerprinting + plots)
  • Run examples/full_benchmark.py end-to-end (benchmark + LaTeX tables)

🤖 Generated with Claude Code

a2z and others added 10 commits March 19, 2026 10:49
Covers SurfaceDiscoveryEnv, PPO/SAC trainer, benchmark suite
(17 controllers x 5 plants), and visualization module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from review: IntegralSlidingSurface params, Quadrotor disturbance
format, exact switching law in benchmark, pytest skip patterns, Gym
registration, unused import removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TDD: 20 tests written first (all failing), then the implementation was created.
All 20 tests pass; no regressions in the existing 193 tests.

Gymnasium environment where RL agent outputs sigma (sliding variable)
directly, and a fixed switching control law converts to control input.
Supports 5 plants (double_integrator, inverted_pendulum, crane,
quadrotor, pmsm), 4 disturbance types, RK4 integration, and
configurable control gains.
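The RK4 integration mentioned above is the standard fourth-order Runge-Kutta step; a minimal generic sketch (not the project's actual integrator — `f`, `x`, `u`, `dt` are illustrative names):

```python
def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step for xdot = f(x, u).

    x is a list of floats; f returns a list of derivatives.
    """
    def add(a, b, h):
        # element-wise x + h * b
        return [ai + h * bi for ai, bi in zip(a, b)]

    k1 = f(x, u)
    k2 = f(add(x, k1, dt / 2), u)
    k3 = f(add(x, k2, dt / 2), u)
    k4 = f(add(x, k3, dt), u)
    return [xi + dt / 6 * (k1i + 2 * k2i + 2 * k3i + k4i)
            for xi, k1i, k2i, k3i, k4i in zip(x, k1, k2, k3, k4)]

# Double-integrator plant: xdot = [v, u]
f = lambda x, u: [x[1], u]
state = rk4_step(f, [0.0, 0.0], 1.0, 0.1)
# -> [0.005, 0.1] (exact for this linear plant: 0.5*t^2 and t)
```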

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures consistency with plant dynamics for salient motors (Ld != Lq).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements train_surface() and train_all_surfaces() with VecNormalize,
ISE evaluation callback, and best-model saving. 9/9 tests pass.
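The ISE metric such a callback tracks is the discrete integral of squared tracking error; a hedged sketch (the actual callback's interface is not shown here):

```python
def ise(errors, dt):
    """Integral of squared error: ISE = ∫ e(t)^2 dt ≈ Σ e_k^2 · dt."""
    return sum(e * e for e in errors) * dt

# Constant error of 1.0 held for 1 s, sampled at 10 Hz -> ISE = 1.0
print(ise([1.0] * 10, 0.1))  # 1.0
```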

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _load_sb3 now tries PPO first then falls back to SAC on any exception
- use_4d_obs parameter pads observations to [e, edot, 0.0, 0.0] for
  models trained with 4-element observation spaces
- 2 new tests covering both behaviours (11 total, all passing)
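The padding behaviour described above can be sketched as follows; this is a guess at the shape of the logic, not the actual implementation:

```python
def pad_obs(obs, use_4d_obs=False):
    """Pad a 2-element observation [e, edot] to [e, edot, 0.0, 0.0]
    when the loaded model expects a 4-element observation space."""
    if use_4d_obs and len(obs) == 2:
        return list(obs) + [0.0, 0.0]
    return list(obs)
```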

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-table benchmark comparing RL-discovered surfaces against all 17
classical controllers across 5 plants and 4 disturbance types:
- Table 1 (Fixed Law): surfaces evaluated with u = -K*sat(s/phi) - lam*s
  using standalone RK4 loop matching training environment
- Table 2 (Matched Controller): each controller uses its own compute()
  via Simulator.run() for practical performance comparison
- BenchmarkResults dataclass with to_latex(), to_json(), summary()
- 9/9 tests passing
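The fixed law quoted above can be written out directly; a sketch with illustrative default gains (K, phi, lam values are not the benchmark's actual settings):

```python
def sat(x):
    """Saturation: clip to [-1, 1], a continuous replacement for sign()."""
    return max(-1.0, min(1.0, x))

def fixed_switching_law(s, K=5.0, phi=0.1, lam=1.0):
    """u = -K*sat(s/phi) - lam*s, the fixed law used for Table 1."""
    return -K * sat(s / phi) - lam * s

# Inside the boundary layer (|s| < phi) the law is linear in s;
# outside, the switching term saturates at ±K.
print(fixed_switching_law(0.05))  # -2.55
print(fixed_switching_law(1.0))   # -6.0
```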

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 plotting functions (training_curve, surface_heatmap, contour_overlay,
radar_chart, cross_plant_radar, benchmark_bars, time_domain) with
corresponding 7 pytest tests using Agg backend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- opensmc/rl/__init__.py: lazy __getattr__ for SurfaceDiscoveryEnv,
  trainer (train_surface/train_all_surfaces/TrainingResult),
  benchmark/BenchmarkResults, and visualize module
- pyproject.toml: add pandas>=1.5 to rl extras (required by benchmark)
- examples/train_and_fingerprint.py: PPO training + fingerprint + plots
- examples/full_benchmark.py: classical controller benchmark runner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The __getattr__ pattern caused infinite recursion when attribute names
matched submodule names (benchmark, visualize). Switched to try/except
conditional imports. Renamed benchmark() to run_benchmark() in public API.
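The try/except pattern avoids the recursion because each submodule is imported once at package-import time rather than resolved inside `__getattr__`; a generic sketch of the pattern using `importlib` (module names here are illustrative, not the package's):

```python
import importlib

def optional_import(name):
    """Import a module if it is available, else return None —
    the try/except conditional-import pattern."""
    try:
        return importlib.import_module(name)
    except ImportError:  # covers ModuleNotFoundError too
        return None

json_mod = optional_import("json")               # stdlib: present
missing = optional_import("no_such_module_xyz")  # absent -> None
```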

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>