Our current default training recipe is fairly arbitrary and untuned. We should do at least some tuning (current defaults in bold; a sweep sketch follows the list):
- batch size: 32, 64, 128, ...
- epochs: 100, 200, 400, ...
- learning rate: 5e-4, 1e-3, 2e-3, ...
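For concreteness, a minimal grid-sweep sketch over these three knobs; `train_and_eval` is a hypothetical entry point standing in for whatever our recipe actually exposes, and the candidate values are just the ones listed above:

```python
from itertools import product

# Candidate values from the list above; extend or prune as needed.
BATCH_SIZES = [32, 64, 128]
EPOCH_COUNTS = [100, 200, 400]
LEARNING_RATES = [5e-4, 1e-3, 2e-3]

def sweep(train_and_eval):
    """Full grid (27 runs here); returns configs sorted best-first by score.

    train_and_eval is a hypothetical callable: it trains one model with the
    given hyperparameters and returns a scalar validation score.
    """
    results = []
    for bs, epochs, lr in product(BATCH_SIZES, EPOCH_COUNTS, LEARNING_RATES):
        score = train_and_eval(batch_size=bs, epochs=epochs, lr=lr)
        results.append((score, {"batch_size": bs, "epochs": epochs, "lr": lr}))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

A full grid is 27 training runs, which may be too expensive; a staged pass (tune learning rate first at the default batch size, then batch size, then epochs) could be a cheaper alternative.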
Other settings that could be worth taking a look at (a rough sketch of where each plugs in follows the list):
- repeated sampling rate (oversample)
- gradient clipping
- shuffle buffer size
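To pin down what each of these knobs means, a rough PyTorch-flavored sketch; the framing and every value below are illustrative assumptions, not our actual recipe:

```python
import random

import torch
from torch.utils.data import DataLoader, RandomSampler

# Illustrative values only, not our current defaults.
OVERSAMPLE = 2           # repeated sampling rate
CLIP_NORM = 1.0          # max global gradient norm
SHUFFLE_BUFFER = 10_000  # window size for approximate streaming shuffle

def make_loader(dataset, batch_size=32):
    """Oversampling: sample with replacement so one pass over the loader
    draws roughly OVERSAMPLE x len(dataset) examples."""
    sampler = RandomSampler(dataset, replacement=True,
                            num_samples=OVERSAMPLE * len(dataset))
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def shuffle_stream(stream, buffer_size=SHUFFLE_BUFFER, rng=random):
    """Shuffle buffer for streamed data: hold buffer_size items and yield a
    random one as each new item arrives. Bigger buffer = better mixing but
    more memory."""
    buf = []
    for item in stream:
        if len(buf) < buffer_size:
            buf.append(item)
            continue
        i = rng.randrange(buffer_size)
        yield buf[i]
        buf[i] = item
    rng.shuffle(buf)
    yield from buf

def train_step(model, optimizer, loss_fn, x, y):
    """One update; gradient clipping goes between backward() and step()."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
    optimizer.step()
    return loss.item()
```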