
Spiky CRMDP Roadmap #30

@jvmncs

Main road

  • Create toy environments
  • Refactor for use with Gym API (Gym #32)
  • Improved tooling for hyperparameter tuning (e.g. Ray)
  • Estimate compute costs and finalize logistics
    • First guess at an upper bound: 1 agent × 4 environments × 3 experiments = 12 hyperparameter sets to tune; × ~30 training runs each = 360 runs; × ~2 hours per run ≈ 720 hours of training
  • Run experiments (start January 11)
    • Check whether hyperparameters tuned on Solver generalize to Cheater (and vice versa, though that direction is less important/rigorous)
  • Investigate corrupt versions of harder environments
    • Maybe bigger / more realistic boat race
    • Maybe a modified Atari env
    • Maybe a modified MuJoCo env
    • Maybe a modified BipedalWalker env
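
The compute upper bound in the list above can be reproduced with a few lines of Python. This is only a restatement of the roadmap's own first-guess figures; the constant names are illustrative, and the 2 hours per run is the roadmap's estimate, not a measurement.

```python
# Upper-bound compute estimate, using the roadmap's first-guess numbers.
AGENTS = 1
ENVIRONMENTS = 4          # TomatoWateringCRMDP, TransitionBoatRaceCRMDP, two toy envs
EXPERIMENTS = 3           # Baseline, Cheater, Solver
RUNS_PER_HPARAM_SET = 30  # "~30 training runs" per hyperparameter set
HOURS_PER_RUN = 2         # estimated hours per training run

hparam_sets = AGENTS * ENVIRONMENTS * EXPERIMENTS
total_runs = hparam_sets * RUNS_PER_HPARAM_SET
total_hours = total_runs * HOURS_PER_RUN

print(hparam_sets, total_runs, total_hours)  # 12 360 720
```

Because the estimate is a product of independent factors, adding a fifth environment or a second agent scales the total linearly (e.g. one more environment adds 3 × 30 × 2 = 180 hours).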

Finish experiments February 15

Deadline February 22

Environments:

  • TomatoWateringCRMDP
  • TransitionBoatRaceCRMDP
  • Toy environments
    • corrupt corners (satisfies our assumptions for guaranteed learnability)
    • corrupt path to goal (does not satisfy assumptions for guaranteed learnability)
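
A minimal sketch of what the two toy environments could look like behind a Gym-style `reset()`/`step()` interface. Everything here is an assumption for illustration: the class name `ToyCorruptGrid`, the grid layout, the reward values, and exposing the true reward through `info["true_reward"]`. It follows the older Gym convention where `step()` returns `(obs, reward, done, info)`, but it does not import `gym` itself.

```python
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]


class ToyCorruptGrid:
    """Hypothetical toy CRMDP gridworld with a Gym-style reset()/step() interface.

    The agent starts at (0, 0) and should reach the goal in the opposite corner.
    In corrupt states the observed reward differs from the true reward.
    """

    # Action index -> (row delta, col delta): up, down, left, right.
    ACTIONS: Dict[int, Pos] = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size: int = 5, corrupt: str = "corners", max_steps: int = 50):
        self.size = size
        self.max_steps = max_steps
        self.start: Pos = (0, 0)
        self.goal: Pos = (size - 1, size - 1)
        last = size - 1
        if corrupt == "corners":
            # The two corners that are neither the start nor the goal.
            self.corrupt: Set[Pos] = {(0, last), (last, 0)}
        elif corrupt == "path":
            # Cells along one shortest path to the goal (right along row 0, then down).
            self.corrupt = {(0, c) for c in range(1, size)} | {(r, last) for r in range(1, last)}
        else:
            raise ValueError(f"unknown corruption type: {corrupt}")
        self.reset()

    def reset(self) -> Pos:
        self.pos = self.start
        self.steps = 0
        return self.pos

    def step(self, action: int):
        dr, dc = self.ACTIONS[action]
        # Clamp to the grid so moving into a wall leaves the agent in place.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        self.steps += 1
        at_goal = self.pos == self.goal
        true_reward = 1.0 if at_goal else -0.01  # small per-step cost
        # Corrupt states report a spurious reward that the true reward does not justify.
        observed_reward = 0.5 if self.pos in self.corrupt else true_reward
        done = at_goal or self.steps >= self.max_steps
        return self.pos, observed_reward, done, {"true_reward": true_reward}
```

For example, with `size=3` and `corrupt="corners"`, moving right twice from the start lands on the corrupt corner `(0, 2)`, where the observed reward is 0.5 but the true reward is -0.01.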

Experiments per env

  • Baseline (learns corrupt reward)
  • Cheater (learns with access to true reward)
  • Solver (learns intended behavior from corrupt reward)
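
A minimal sketch of how the three experiment variants might differ in the reward their learner is trained on, assuming the environment reports the true reward in a hypothetical `info["true_reward"]` entry. `TRAINING_SIGNAL` and `training_reward` are illustrative names. The roadmap does not say how Solver recovers the intended behavior, so here Solver is only marked as seeing the same observed reward as Baseline.

```python
from typing import Dict

# Hypothetical: which reward signal each experiment's learner is trained on.
# Baseline and Solver both see only the observed (possibly corrupt) reward;
# what distinguishes Solver is its learning method, which is not specified here.
TRAINING_SIGNAL: Dict[str, str] = {
    "Baseline": "observed",
    "Cheater": "true",
    "Solver": "observed",
}


def training_reward(experiment: str, observed_reward: float, info: Dict[str, float]) -> float:
    """Return the reward the learner for `experiment` is allowed to see."""
    signal = TRAINING_SIGNAL[experiment]
    return info["true_reward"] if signal == "true" else observed_reward
```

Keeping the choice of signal in one table makes it easy to run all three variants through the same training loop and compare them on the same environments.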

Optional
