
feat: Phase D — RL integration (training, benchmark, visualization) #1

Open
Balghanimi wants to merge 10 commits into main from feat/phase-d-rl-integration

Conversation

@Balghanimi
Owner

Summary

  • SurfaceDiscoveryEnv: Gymnasium environment where RL agent discovers sliding surfaces (sigma output, fixed switching law, 5 plants, 4 disturbance types)
  • Trainer: PPO/SAC training pipeline with VecNormalize, ISE tracking, batch training across all plant/algo/disturbance combinations
  • Benchmark: 580-simulation comparison — Table 1 (12 surfaces × 5 plants × 4 disturbances, fixed law) + Table 2 (17 controllers × 5 plants × 4 disturbances, best-matched)
  • Visualize: 7 publication-quality matplotlib functions (heatmap, contour overlay, radar, cross-plant radar, benchmark bars, training curve, time-domain)
  • RLDiscoveredSurface fixes: SAC loading fallback + 4D obs padding for SurfaceDiscoveryEnv-trained models

Stats

  • 14 files changed, +1,931 lines
  • 45 new tests (20 env + 9 trainer + 9 benchmark + 7 visualize)
  • 220/221 tests pass (the single failure is a pre-existing PID numerical edge case)

Test plan

  • All 45 new tests pass
  • Full suite regression check (220 pass)
  • Import verification: from opensmc.rl import RLDiscoveredSurface, SurfaceDiscoveryEnv, train_surface, run_benchmark, visualize
  • Run examples/train_and_fingerprint.py end-to-end (PPO training + fingerprinting + plots)
  • Run examples/full_benchmark.py end-to-end (benchmark + LaTeX tables)

🤖 Generated with Claude Code

a2z and others added 10 commits March 19, 2026 10:49
Covers SurfaceDiscoveryEnv, PPO/SAC trainer, benchmark suite
(17 controllers x 5 plants), and visualization module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from review: IntegralSlidingSurface params, Quadrotor disturbance
format, exact switching law in benchmark, pytest skip patterns, Gym
registration, unused import removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TDD: 20 tests written first (all failing), then the implementation was created.
All 20 tests pass; no regressions in the existing 193 tests.

Gymnasium environment where RL agent outputs sigma (sliding variable)
directly, and a fixed switching control law converts to control input.
Supports 5 plants (double_integrator, inverted_pendulum, crane,
quadrotor, pmsm), 4 disturbance types, RK4 integration, and
configurable control gains.
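The RK4 integration mentioned above is the standard fourth-order Runge-Kutta step; a minimal generic sketch (not the project's actual integrator — `f`, `x`, `u`, `dt` are illustrative names):

```python
def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta 4 step for xdot = f(x, u).

    x is a list of floats; f returns a list of derivatives.
    """
    def add(a, b, h):
        # element-wise x + h * b
        return [ai + h * bi for ai, bi in zip(a, b)]

    k1 = f(x, u)
    k2 = f(add(x, k1, dt / 2), u)
    k3 = f(add(x, k2, dt / 2), u)
    k4 = f(add(x, k3, dt), u)
    return [xi + dt / 6 * (k1i + 2 * k2i + 2 * k3i + k4i)
            for xi, k1i, k2i, k3i, k4i in zip(x, k1, k2, k3, k4)]

# Double-integrator plant: xdot = [v, u]
f = lambda x, u: [x[1], u]
state = rk4_step(f, [0.0, 0.0], 1.0, 0.1)
# -> [0.005, 0.1] (exact for this linear plant: 0.5*t^2 and t)
```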

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures consistency with plant dynamics for salient motors (Ld != Lq).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements train_surface() and train_all_surfaces() with VecNormalize,
ISE evaluation callback, and best-model saving. 9/9 tests pass.
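The ISE metric such a callback tracks is the discrete integral of squared tracking error; a hedged sketch (the actual callback's interface is not shown here):

```python
def ise(errors, dt):
    """Integral of squared error: ISE = ∫ e(t)^2 dt ≈ Σ e_k^2 · dt."""
    return sum(e * e for e in errors) * dt

# Constant error of 1.0 held for 1 s, sampled at 10 Hz -> ISE = 1.0
print(ise([1.0] * 10, 0.1))  # 1.0
```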

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _load_sb3 now tries PPO first then falls back to SAC on any exception
- use_4d_obs parameter pads observations to [e, edot, 0.0, 0.0] for
  models trained with 4-element observation spaces
- 2 new tests covering both behaviours (11 total, all passing)
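The padding behaviour described above can be sketched as follows; this is a guess at the shape of the logic, not the actual implementation:

```python
def pad_obs(obs, use_4d_obs=False):
    """Pad a 2-element observation [e, edot] to [e, edot, 0.0, 0.0]
    when the loaded model expects a 4-element observation space."""
    if use_4d_obs and len(obs) == 2:
        return list(obs) + [0.0, 0.0]
    return list(obs)
```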

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-table benchmark comparing RL-discovered surfaces against all 17
classical controllers across 5 plants and 4 disturbance types:
- Table 1 (Fixed Law): surfaces evaluated with u = -K*sat(s/phi) - lam*s
  using standalone RK4 loop matching training environment
- Table 2 (Matched Controller): each controller uses its own compute()
  via Simulator.run() for practical performance comparison
- BenchmarkResults dataclass with to_latex(), to_json(), summary()
- 9/9 tests passing
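The fixed law quoted above can be written out directly; a sketch with illustrative default gains (K, phi, lam values are not the benchmark's actual settings):

```python
def sat(x):
    """Saturation: clip to [-1, 1], a continuous replacement for sign()."""
    return max(-1.0, min(1.0, x))

def fixed_switching_law(s, K=5.0, phi=0.1, lam=1.0):
    """u = -K*sat(s/phi) - lam*s, the fixed law used for Table 1."""
    return -K * sat(s / phi) - lam * s

# Inside the boundary layer (|s| < phi) the law is linear in s;
# outside, the switching term saturates at ±K.
print(fixed_switching_law(0.05))  # -2.55
print(fixed_switching_law(1.0))   # -6.0
```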

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 plotting functions (training_curve, surface_heatmap, contour_overlay,
radar_chart, cross_plant_radar, benchmark_bars, time_domain) with
corresponding 7 pytest tests using Agg backend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- opensmc/rl/__init__.py: lazy __getattr__ for SurfaceDiscoveryEnv,
  trainer (train_surface/train_all_surfaces/TrainingResult),
  benchmark/BenchmarkResults, and visualize module
- pyproject.toml: add pandas>=1.5 to rl extras (required by benchmark)
- examples/train_and_fingerprint.py: PPO training + fingerprint + plots
- examples/full_benchmark.py: classical controller benchmark runner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The __getattr__ pattern caused infinite recursion when attribute names
matched submodule names (benchmark, visualize). Switched to try/except
conditional imports. Renamed benchmark() to run_benchmark() in public API.
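The try/except pattern avoids the recursion because each submodule is imported once at package-import time rather than resolved inside `__getattr__`; a generic sketch of the pattern using `importlib` (module names here are illustrative, not the package's):

```python
import importlib

def optional_import(name):
    """Import a module if it is available, else return None —
    the try/except conditional-import pattern."""
    try:
        return importlib.import_module(name)
    except ImportError:  # covers ModuleNotFoundError too
        return None

json_mod = optional_import("json")               # stdlib: present
missing = optional_import("no_such_module_xyz")  # absent -> None
```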

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>