I'm trying to find a good pattern for using PyTorch Lightning, so I created some dummy tasks here. Things to consider:
- Need to tune by searching hyperparameter space.
- Have more than one training run per trial (hyperparameter set), e.g. fold1, fold2.
- Log a summary for every trial with (at least) the best val loss achieved and the hyperparameter set.
- Log epoch-wise training progress (at least train/val loss).
- Have multiple training stages (e.g. warmup rounds).
- Possibility to analyze gradients.
- Run in distributed environment.
- Evaluate dataloading performance on every step.
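The first three points boil down to bookkeeping around the training runs. Here's a minimal, framework-agnostic sketch of that loop structure, assuming a grid search; `run_fold` is a hypothetical stand-in for whatever actually trains one fold (e.g. a Lightning `Trainer.fit` call) and returns its best val loss:

```python
# Sketch: for each hyperparameter set (trial), run one training job per fold
# and record the best val loss, so a summary can be logged for every trial.
from itertools import product
from statistics import mean

def run_fold(params, fold):
    # Placeholder for a real training run; pretend val loss
    # depends on the learning rate and the fold index.
    return params["lr"] * 10 + 0.01 * fold

def tune(search_space, n_folds=2):
    summaries = []
    # Grid search over the Cartesian product of the search space.
    for values in product(*search_space.values()):
        params = dict(zip(search_space.keys(), values))
        fold_losses = [run_fold(params, f) for f in range(n_folds)]
        summaries.append({
            "params": params,
            "best_val_loss": min(fold_losses),
            "mean_val_loss": mean(fold_losses),
        })
    # Rank trials by mean val loss across folds.
    return sorted(summaries, key=lambda s: s["mean_val_loss"])

results = tune({"lr": [1e-2, 1e-3], "batch_size": [32, 64]})
```

In practice a search library (e.g. Optuna) would replace the `product` loop, but the per-trial summary dict is what ends up in the logs either way.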
```shell
# run the trainer
python e5_using_logkey/trainer.py
...
# start TensorBoard
tensorboard serve --logdir e5_using_logkey/
```
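For the multi-stage training point (e.g. warmup rounds), the stage logic can be kept independent of the framework. A minimal sketch, assuming linear warmup followed by cosine decay (the function name and step counts are illustrative, not from the repo):

```python
# Sketch of a two-stage learning-rate schedule: linear warmup, then
# cosine decay. A Lightning LambdaLR or manual optimizer loop could
# call this per step.
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    if step < warmup_steps:
        # Stage 1: ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Stage 2: cosine decay from base_lr down to zero.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Logging this value alongside train/val loss each epoch makes the stage boundaries visible in TensorBoard.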