Main roadmap
- Create toy environments
  - Create new toy environments (@timorl's repo)
  - Clean up toy environments for use with the Gym API
  - Add toy environments as a dependency (#38)
  - Debug toy environments ("There's something wrong with the Toy Gridworlds...", david-lindner/safe-grid-gym#15)
- Refactor for use with the Gym API (#32); see the wrapper sketch after this list
  - Modify ai_safety_gridworlds_gym to fit our needs (@david-lindner's fork)
  - Improve dependency management (#31)
  - Switch all code referencing envs to use the Gym env
- Improve tooling for hyperparameter tuning (e.g. Ray; see the tuning sketch after this list)
- Estimate compute costs and finalize logistics
  - First guess at an upper bound: 1 agent x 4 environments x 3 experiments = 12 sets of hyperparameters to tune; at ~30 training runs per set, that's 360 runs, and at ~2 hours each, roughly 720 compute-hours
- Do experiments (start January 11)
  - Check if hyperparameters tuned on Solver generalize to Cheater (and vice versa, though that is less important/rigorous)
- Investigate corrupt versions of harder environments
  - Maybe a bigger / more realistic boat race
  - Maybe a modified Atari env
  - Maybe a modified MuJoCo env
  - Maybe a modified BipedalWalker env
- Finish experiments by February 15
- Deadline: February 22
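For the Gym refactor, a minimal sketch of what a toy environment could look like behind the classic Gym API (`reset()` returning an observation, `step()` returning the 4-tuple). The gridworld below is a stand-in for illustration, not the actual environments from @timorl's repo; the name `ToyGridworldEnv` and the `info["agent_pos"]` field are assumptions.

```python
import numpy as np
import gym
from gym import spaces


class ToyGridworldEnv(gym.Env):
    """Stand-in toy gridworld behind the classic Gym API (sketch)."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        self.observation_space = spaces.Box(
            0.0, 1.0, shape=(size, size), dtype=np.float32)

    def _obs(self):
        # One-hot grid marking the agent's position.
        grid = np.zeros((self.size, self.size), dtype=np.float32)
        grid[self.pos] = 1.0
        return grid

    def reset(self):
        self.pos = (0, 0)
        return self._obs()

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        # "agent_pos" in info is an assumed convention, used by the
        # corruption wrapper sketched further down.
        return self._obs(), reward, done, {"agent_pos": self.pos}
```

Once everything is behind `gym.Env`, the "switch all code referencing envs" item reduces to constructing this class (or registering it in Gym's registry and calling `gym.make`) wherever the old env objects were used.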
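And for the tuning tooling, a sketch of a Ray Tune grid search using its function-trainable API (`tune.run` / `tune.report`). The search space and the fake reward curve are placeholders, not settled hyperparameter choices.

```python
import random

from ray import tune


def train_agent(config):
    # Placeholder for a real PPO training loop: fakes a score that
    # peaks around lr = 3e-4 so the search has something to find.
    for _ in range(10):
        mean_reward = -abs(config["lr"] - 3e-4) * 1e4 + random.random()
        tune.report(mean_reward=mean_reward)


analysis = tune.run(
    train_agent,
    config={
        "lr": tune.grid_search([1e-4, 3e-4, 1e-3]),
        "entropy_coef": tune.grid_search([0.0, 0.01]),
    },
)
print(analysis.get_best_config(metric="mean_reward", mode="max"))
```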
Environments:
- TomatoWateringCRMDP
- TransitionBoatRaceCRMDP
- Toy environments
  - corrupt corners (satisfies our assumptions for guaranteed learnability)
  - corrupt path to goal (does not satisfy assumptions for guaranteed learnability)
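The two toy variants above could be prototyped as a reward-corruption wrapper over any Gym env: the agent trains on a corrupted reward in designated states, while the true reward travels in `info` for evaluation only. A sketch, assuming the env reports its position under `info["agent_pos"]` as in the gridworld stand-in earlier; the set of corrupt states and the corruption value are free parameters.

```python
import gym


class CorruptRewardWrapper(gym.Wrapper):
    """CRMDP-style sketch: the observed reward is overwritten in
    designated corrupt states; the true reward is kept in `info`
    for evaluation only."""

    def __init__(self, env, corrupt_states, corrupt_reward=1.0):
        super().__init__(env)
        self.corrupt_states = set(corrupt_states)
        self.corrupt_reward = corrupt_reward

    def step(self, action):
        obs, true_reward, done, info = self.env.step(action)
        info["hidden_reward"] = true_reward  # never shown to the agent
        corrupted = info.get("agent_pos") in self.corrupt_states
        observed = self.corrupt_reward if corrupted else true_reward
        return obs, observed, done, info


# e.g. "corrupt corners": only corner cells misreport the reward
# env = CorruptRewardWrapper(ToyGridworldEnv(5), corrupt_states={(0, 4), (4, 0)})
```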
Experiments per env
- Baseline (learns corrupt reward)
- Cheater (learns with access to true reward)
- Solver (learns intended behavior from corrupt reward)
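Read this way, the three conditions differ only in which reward signal the learner is allowed to see. A sketch, assuming the wrapper above stashes the true reward under `info["hidden_reward"]`:

```python
def training_reward(experiment, observed_reward, info):
    """Pick the reward the agent trains on (sketch).

    Baseline: trains on the observed (possibly corrupt) reward.
    Cheater:  trains on the true reward, as an upper bound.
    Solver:   also sees only the corrupt reward, but is judged on
              whether it recovers the intended behavior.
    """
    if experiment == "cheater":
        return info["hidden_reward"]
    return observed_reward  # baseline and solver
```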
Optional
- Generalize PPO (#17)
- Improve test coverage (Test cases #29)