feat: Energy consumption wrapper #65

Merged
merged 117 commits into master from dev-energy
May 3, 2024
117 commits
5dc0a29
feat: Energy consumption
iwishiwasaneagle Jun 22, 2023
23ca884
feat: Energy consumption
iwishiwasaneagle Jun 22, 2023
75c28be
feat: Energy consumption
iwishiwasaneagle Jun 22, 2023
ef6bb39
Merge branch 'master' into dev-energy
iwishiwasaneagle Nov 21, 2023
675947e
feat: DRL controlled drone going from A to hovering at B
iwishiwasaneagle Feb 22, 2024
17a6e79
chore: Add a fifth action that controls the scaling of the propeller …
iwishiwasaneagle Feb 22, 2024
48c45aa
Merge branch 'master' into dev-energy
iwishiwasaneagle Feb 22, 2024
e26e045
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 22, 2024
a5467e2
chore: Switch to using "unwrapped"
iwishiwasaneagle Feb 26, 2024
42f40dc
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Feb 26, 2024
d6f9879
chore: Import order
iwishiwasaneagle Feb 26, 2024
bc996eb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 26, 2024
5aae992
fix: Check for different env types
iwishiwasaneagle Feb 26, 2024
d24b766
fix: Check for different env types
iwishiwasaneagle Feb 26, 2024
847356e
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Feb 26, 2024
5371a38
chore: Further edits to energy example
iwishiwasaneagle Feb 26, 2024
556968c
chore: Try out sbx
iwishiwasaneagle Feb 26, 2024
e5453b6
Merge branch 'master' into dev-energy
iwishiwasaneagle Feb 26, 2024
28f728c
chore: Update types
iwishiwasaneagle Feb 26, 2024
f6f196c
fix: Legacy imports
iwishiwasaneagle Feb 26, 2024
11bc080
chore: Allow a reset state to be given within options
iwishiwasaneagle Mar 2, 2024
2e42527
feat: A working DRL example with plotting
iwishiwasaneagle Mar 2, 2024
f244822
feat: Use optuna for hyperparameter sweep in DRL example
iwishiwasaneagle Mar 11, 2024
3960e07
refactor: Example got too big, so split into sub-modules
iwishiwasaneagle Mar 11, 2024
34014e2
feat: A working DRL example with plotting
iwishiwasaneagle Mar 15, 2024
a5204d3
feat: Add wandb to example
iwishiwasaneagle Mar 15, 2024
8127e41
chore: Make sure num_timesteps is set properly
iwishiwasaneagle Mar 15, 2024
83606e4
chore: Make sure num_timesteps is set properly
iwishiwasaneagle Mar 15, 2024
6a8b455
chore: Make sure num_timesteps is set properly
iwishiwasaneagle Mar 15, 2024
3d9f508
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 15, 2024
1afdf85
perf: Improve State.normed and move into main repo
iwishiwasaneagle Mar 22, 2024
3271621
chore: Change example rewards scheme to be normalized
iwishiwasaneagle Mar 22, 2024
f6cd990
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 22, 2024
b145db8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 22, 2024
f3b724f
chore: Use subprocvecenv when n_envs > 1
iwishiwasaneagle Mar 22, 2024
6441dc2
fix: tensorflow-probability 0.24.0 not compatible with sbx 0.12.0
iwishiwasaneagle Mar 22, 2024
010ab83
chore: make Monitor log other info keywords
iwishiwasaneagle Mar 22, 2024
1de2e16
refactor: Rename to be more accurate
iwishiwasaneagle Mar 25, 2024
2be6be6
feat: LQR+DRL meta-controller framework
iwishiwasaneagle Mar 25, 2024
f82ffff
chore: Normalize reward by maximum action command T
iwishiwasaneagle Mar 25, 2024
ee96f4a
chore: Normalize reward by maximum action command T
iwishiwasaneagle Mar 25, 2024
fd66268
chore: Normalize reward by maximum action command T
iwishiwasaneagle Mar 25, 2024
5f5a6f7
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 25, 2024
e0d7690
chore: Play around with how the control is structured. Only let polic…
iwishiwasaneagle Mar 26, 2024
650996e
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
294cfbc
chore: Enable maximum time to be specified from CLI
iwishiwasaneagle Mar 26, 2024
dbb70f7
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
844d19b
chore: Enable maximum time to be specified from CLI
iwishiwasaneagle Mar 26, 2024
9e98d1c
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
04347f6
chore: Reduce observation space to x-y plane
iwishiwasaneagle Mar 26, 2024
c93781a
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
f3dcd89
chore: Include angular position and velocity in the observation
iwishiwasaneagle Mar 26, 2024
0216f3e
chore: Plot actions
iwishiwasaneagle Mar 26, 2024
94cbc0e
chore: Adjust observation shape to match observation
iwishiwasaneagle Mar 26, 2024
6490375
chore: Ensure fig.tight_layout() is called for plotting during callba…
iwishiwasaneagle Mar 26, 2024
803f256
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
769741b
chore: Set target z to env z
iwishiwasaneagle Mar 26, 2024
d8674e3
chore: use env.unwrapped pattern
iwishiwasaneagle Mar 26, 2024
019e678
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
8a8e3d5
chore: Ensure target is set before it is used within reset
iwishiwasaneagle Mar 26, 2024
80aad5f
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 26, 2024
6d7a036
chore: Adjust reward normalization
iwishiwasaneagle Mar 27, 2024
3d79c69
chore: Change step simulation time to be 3
iwishiwasaneagle Mar 27, 2024
ea93406
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 27, 2024
ab834d7
chore: Expose initial_state
iwishiwasaneagle Mar 27, 2024
5679bf1
fix: Reset to initial state z not current z
iwishiwasaneagle Mar 27, 2024
c1df545
chore: Move to relative control rather than absolute
iwishiwasaneagle Mar 27, 2024
6334782
chore: Reduce step sim time down to 0.25s
iwishiwasaneagle Mar 27, 2024
f930698
chore: Reduce eval episodes to 10
iwishiwasaneagle Mar 27, 2024
98bd549
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 27, 2024
71c75a3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2024
b07881e
chore: Log number of targets achieved
iwishiwasaneagle Mar 27, 2024
1d69852
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 27, 2024
915ec5f
chore: Increase dx magnitude
iwishiwasaneagle Mar 27, 2024
88154b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2024
3ca0bc3
chore: The reward is dependent on the target z as well, so we have to…
iwishiwasaneagle Mar 28, 2024
4b2f8a9
chore: Include clip_range in CLI inputs
iwishiwasaneagle Mar 28, 2024
12e89d5
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 28, 2024
0b6d02a
chore: Reduce sub-env sim time to 0.1s
iwishiwasaneagle Mar 28, 2024
ab4eb7b
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 28, 2024
9ce5483
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
eaa10a4
chore: Increase dx magnitude
iwishiwasaneagle Mar 27, 2024
5345f45
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 27, 2024
fd287ce
chore: Include the next target in the observation
iwishiwasaneagle Mar 29, 2024
d9b97da
chore: Reduce sub-env sim time to 0.1s
iwishiwasaneagle Mar 28, 2024
aa582f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 28, 2024
15d7f89
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 29, 2024
da18e46
feat: Additional eval logging
iwishiwasaneagle Mar 29, 2024
978d609
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 29, 2024
c0f4849
feat: Change LQR tuner to be velocity centric
iwishiwasaneagle Mar 29, 2024
f429dd1
fix: Re-added is_success logging
iwishiwasaneagle Mar 29, 2024
38b4a5e
feat: Add 2D plot to graphing callback
iwishiwasaneagle Mar 29, 2024
3afe0a2
feat: Swap to velocity-based action space
iwishiwasaneagle Mar 29, 2024
8ec4795
Merge branch 'dev-energy' of github.com:iwishiwasaneagle/jdrones into…
iwishiwasaneagle Mar 29, 2024
783177c
chore: plot targets
iwishiwasaneagle Mar 29, 2024
e8a4fb7
chore: Switch to a heading-velocity scheme and reintroduce the contro…
iwishiwasaneagle Apr 2, 2024
04ea745
fix: Switch to vector-based velocity method
iwishiwasaneagle Apr 3, 2024
14090b4
refactor: Remove old env and optuna sweeps
iwishiwasaneagle Apr 4, 2024
e8d6e71
feat: Add colors and circles to 2D position plot to show the targets,…
iwishiwasaneagle Apr 4, 2024
dcea513
fix: Enable policy net_arch for RecurrentPPO to be customizable from …
iwishiwasaneagle Apr 4, 2024
a265f65
chore: set squash_output to true when using SDE
iwishiwasaneagle Apr 4, 2024
32e96bd
fix: success distance is 1 not 1.5
iwishiwasaneagle Apr 4, 2024
d21b26d
feat: Multi-env DRL
iwishiwasaneagle Apr 5, 2024
7a2db4a
fix: Don't break the loop if a target is reached to ensure that time …
iwishiwasaneagle Apr 5, 2024
29bbe52
fix: Switch to SB3's DummyVecEnv to handle the multi-env logic
iwishiwasaneagle Apr 9, 2024
04d3e72
feat: Extend to any n env
iwishiwasaneagle Apr 9, 2024
703274e
fix: Properly handle collision reward
iwishiwasaneagle Apr 9, 2024
8bfc7a7
refactor: Handle rewards normalization in the top-level env
iwishiwasaneagle Apr 9, 2024
91209c0
feat: Number of sub-envs from CLI
iwishiwasaneagle Apr 9, 2024
db08c1a
chore: Rename env to be more representative of what it is
iwishiwasaneagle Apr 9, 2024
3a7833f
fix: Ensure eval env is the same as training env
iwishiwasaneagle Apr 9, 2024
a5c30e4
fix: Re-include time in the observation
iwishiwasaneagle Apr 9, 2024
a7ee23c
fix: Ensure observation is (1,X) in shape
iwishiwasaneagle Apr 9, 2024
370fbf1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 9, 2024
6040eca
chore: Pass total sim time to the mult DRL env
iwishiwasaneagle May 3, 2024
73bed0d
feat: tensorboard and optuna for the 3d drl example
iwishiwasaneagle May 3, 2024
e1cdc81
doc: quick readme for the drl example
iwishiwasaneagle May 3, 2024
1 change: 1 addition & 0 deletions docs/examples/drl_3d_wp/.gitignore
@@ -0,0 +1 @@
log/*
16 changes: 16 additions & 0 deletions docs/examples/drl_3d_wp/README.md
@@ -0,0 +1,16 @@
```bash
PYTHONPATH=$PWD python drl_3d_wp \
learn mlp \
-N 5000000 \
--net_arch_mlp_width 512 \
--net_arch_mlp_depth 4 \
--lr 0.0002 0.00001 .5 \
--n_envs 16 \
--wandb_project jdrones \
--batch_size 4096 \
--n_steps 1024 \
--vec_env_cls subproc \
--n_eval 10 -T 10 \
--clip_range 0.2 \
--n_sub_envs 2
```
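The three values after `--lr` (start, end, end fraction) are fed to SB3's `get_linear_fn` in `__main__.py` to build a linear learning-rate schedule. As a minimal sketch of that schedule (reimplemented locally so it runs without stable-baselines3 installed; `linear_schedule` is a hypothetical stand-in mirroring `get_linear_fn`'s semantics):

```python
def linear_schedule(start: float, end: float, end_fraction: float):
    """Return f(progress_remaining) decaying linearly from `start` to `end`.

    SB3 passes `progress_remaining`, which runs from 1.0 (training start)
    down to 0.0 (training end); the rate holds at `end` once the elapsed
    fraction (1 - progress_remaining) exceeds `end_fraction`.
    """

    def func(progress_remaining: float) -> float:
        if (1 - progress_remaining) > end_fraction:
            return end
        return start + (1 - progress_remaining) * (end - start) / end_fraction

    return func


# The README's `--lr 0.0002 0.00001 .5`: decay over the first half of training.
lr = linear_schedule(0.0002, 0.00001, 0.5)
assert abs(lr(1.0) - 0.0002) < 1e-9   # training start
assert abs(lr(0.5) - 0.00001) < 1e-9  # halfway: decay finished
assert abs(lr(0.0) - 0.00001) < 1e-9  # held constant thereafter
```

With the default `--lr 0.0003 0.0003 1`, start and end coincide, so the schedule reduces to a constant learning rate.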
2 changes: 2 additions & 0 deletions docs/examples/drl_3d_wp/__init__.py
@@ -0,0 +1,2 @@
# Copyright (c) 2024. Jan-Hendrik Ewers
# SPDX-License-Identifier: GPL-3.0-only
224 changes: 224 additions & 0 deletions docs/examples/drl_3d_wp/__main__.py
@@ -0,0 +1,224 @@
# Copyright (c) 2024. Jan-Hendrik Ewers
# SPDX-License-Identifier: GPL-3.0-only
"""
A simple example of how to use SB3's PPO algorithm to control a drone from A to B
and then to hover there until time
runs out. This uses a square error reward function.

Run
===

..code-block:: bash
PYTHONPATH=src python docs/examples/drl_hover_square_error.py

"""
import warnings

import click
import matplotlib
import torch as th
from callback import EvalCallbackWithMoreLogging
from callback import GraphingCallback
from drl_3d_wp.consts import DT
from drl_3d_wp.consts import LOG_PATH
from drl_3d_wp.consts import N_ENVS
from drl_3d_wp.consts import N_EVAL
from drl_3d_wp.consts import TENSORBOARD_PATH
from drl_3d_wp.consts import TOTAL_TIMESTEP
from drl_3d_wp.env import Multi_DRL_WP_Env_LQR
from drl_3d_wp.policies import ActorCriticDenseNetPolicy
from loguru import logger
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.utils import get_linear_fn
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.vec_env import SubprocVecEnv

warnings.filterwarnings("ignore", category=UserWarning)
matplotlib.use("Agg")
logger.info(f"Starting {__file__}")


def make_env(T: int = 10, N_envs: int = 2, sim_T: float = 10 * DT):
    env = Multi_DRL_WP_Env_LQR(dt=DT, T=T, N_envs=N_envs, sim_T=sim_T)
    env = Monitor(env)
    return env


def build_callback(
    total_timesteps: int,
    eval_callback_cls=EvalCallbackWithMoreLogging,
    eval_callback_kwargs=None,
    make_vec_env_kwargs=None,
):
    if eval_callback_kwargs is None:
        eval_callback_kwargs = {}
    if make_vec_env_kwargs is None:
        make_vec_env_kwargs = {}

    n_eval = eval_callback_kwargs.pop("n_eval", N_EVAL)
    n_envs = eval_callback_kwargs.pop("n_envs", N_ENVS)
    usual_kwargs = dict(
        eval_freq=total_timesteps // (n_eval * n_envs),
        n_eval_episodes=10,
        deterministic=True,
        verbose=1,
        callback_after_eval=GraphingCallback(),
    )

    kwargs = eval_callback_kwargs | usual_kwargs

    eval_env = make_vec_env(make_env, n_envs=10, **make_vec_env_kwargs)
    eval_callback = eval_callback_cls(eval_env, **kwargs)
    return eval_callback


def build_model(
    *,
    env,
    net_arch_name,
    lr,
    batch_size,
    clip_range,
    use_sde,
    n_steps,
    device,
    net_arch_mlp_width=None,
    net_arch_mlp_depth=None,
    net_arch_dense_layers=None,
    net_arch_lstm_layers=None,
    net_arch_lstm_hidden_size=None,
):
    match net_arch_name:
        case "mlp":
            from stable_baselines3 import PPO as PPO_SB3

            model = PPO_SB3(
                "MlpPolicy",
                device=device,
                learning_rate=lr,
                clip_range=clip_range,
                batch_size=batch_size,
                use_sde=use_sde,
                n_steps=n_steps,
                policy_kwargs=dict(
                    net_arch=[net_arch_mlp_width] * net_arch_mlp_depth,
                    squash_output=use_sde,
                ),
                env=env,
                verbose=0,
                tensorboard_log=TENSORBOARD_PATH,
            )
        case "dense":
            from stable_baselines3 import PPO as PPO_SB3

            model = PPO_SB3(
                ActorCriticDenseNetPolicy,
                device=device,
                learning_rate=lr,
                clip_range=clip_range,
                batch_size=batch_size,
                use_sde=use_sde,
                n_steps=n_steps,
                policy_kwargs=dict(
                    net_arch=net_arch_dense_layers, squash_output=use_sde
                ),
                env=env,
                verbose=0,
                tensorboard_log=TENSORBOARD_PATH,
            )
        case "recurrent":
            from sb3_contrib import RecurrentPPO

            model = RecurrentPPO(
                "MlpLstmPolicy",
                device=device,
                policy_kwargs=dict(
                    # Hidden size and layer count map to their matching CLI flags
                    lstm_hidden_size=net_arch_lstm_hidden_size,
                    n_lstm_layers=net_arch_lstm_layers,
                    net_arch=[net_arch_mlp_width] * net_arch_mlp_depth,
                    squash_output=use_sde,
                ),
                learning_rate=lr,
                clip_range=clip_range,
                batch_size=batch_size,
                use_sde=use_sde,
                n_steps=n_steps,
                env=env,
                verbose=0,
                tensorboard_log=TENSORBOARD_PATH,
            )
        case _:
            raise ValueError(f"Unknown net_arch_name: {net_arch_name!r}")
    return model


@click.group()
def main():
    logger.info(f"Is cuda available? {th.cuda.is_available()}")


@main.command("learn", context_settings={"show_default": True})
@click.option("--vec_env_cls", type=click.Choice(["dummy", "subproc"]), default="dummy")
@click.option("--batch_size", type=int, default=128)
@click.option("--n_steps", type=int, default=4096)
@click.option("--n_sub_envs", type=click.IntRange(min=1), default=2)
@click.option("--lr", nargs=3, default=(0.0003, 0.0003, 1))
@click.option("--clip_range", default=0.2, type=click.FloatRange(min=0.01))
@click.argument("net_arch_name", type=click.Choice(["mlp", "dense", "recurrent"]))
@click.option("-N", "--num_timesteps", type=int, default=TOTAL_TIMESTEP)
@click.option("--use_sde", is_flag=True, default=False)
@click.option("--net_arch_mlp_width", type=int, default=1024)
@click.option("--net_arch_mlp_depth", type=int, default=4)
@click.option("--net_arch_dense_layers", type=int, default=4)
@click.option("--net_arch_lstm_layers", type=int, default=1)
@click.option("--net_arch_lstm_hidden_size", type=int, default=256)
@click.option("--n_eval", type=int, default=N_EVAL)
@click.option("--n_envs", type=int, default=N_ENVS)
@click.option("--wandb_project", default=None, type=str)
@click.option("--device", type=click.Choice(["cpu", "cuda"]), default="cuda")
@click.option("-T", "--max_sim_time", type=click.IntRange(min=10), default=10)
def learn(
    wandb_project, vec_env_cls, max_sim_time, n_eval, n_envs, n_sub_envs, **kwargs
):
    N = kwargs.pop("num_timesteps")
    kwargs["lr"] = get_linear_fn(*kwargs.get("lr"))
    env = make_vec_env(
        make_env,
        n_envs=n_envs,
        vec_env_cls=DummyVecEnv if vec_env_cls == "dummy" else SubprocVecEnv,
        env_kwargs=dict(T=max_sim_time, N_envs=n_sub_envs),
    )
    model = build_model(env=env, **kwargs)
    callback = build_callback(
        N,
        eval_callback_kwargs=dict(n_eval=n_eval, n_envs=n_envs),
        make_vec_env_kwargs=dict(env_kwargs=dict(T=max_sim_time, N_envs=n_sub_envs)),
    )

    if wandb_project is not None:
        import wandb
        from wandb.integration.sb3 import WandbCallback

        wandb.init(
            project=wandb_project,
            dir=LOG_PATH,
            sync_tensorboard=True,
            tags=[vec_env_cls, kwargs.get("net_arch_name")],
            monitor_gym=True,
            save_code=True,
        )
        callback = CallbackList([callback, WandbCallback()])

    model.learn(total_timesteps=N, progress_bar=True, callback=callback)
    model.save(LOG_PATH)


main()
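The `eval_freq` computed in `build_callback` is worth spelling out: SB3's `EvalCallback` counts `eval_freq` in per-environment steps, so dividing the total timestep budget by both the desired number of evaluations and the number of parallel environments yields roughly `n_eval` evaluations per run. A small sketch of that arithmetic (the helper name `eval_freq` is illustrative, not part of the codebase):

```python
def eval_freq(total_timesteps: int, n_eval: int, n_envs: int) -> int:
    # Mirrors build_callback: eval_freq = total_timesteps // (n_eval * n_envs)
    return total_timesteps // (n_eval * n_envs)


# The README's settings: 5e6 total timesteps, --n_eval 10, --n_envs 16.
freq = eval_freq(5_000_000, 10, 16)
assert freq == 31_250  # evaluate every 31,250 steps per env
# Each vectorized step advances n_envs timesteps, so an evaluation fires
# every freq * n_envs = 500,000 total timesteps: 10 evaluations overall.
assert freq * 16 * 10 == 5_000_000
```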