Add MLPerf logging #831
Overall looks good; left some comments for now. Will re-review once the open TODOs are addressed.
Co-authored-by: ravi-mosaicml <[email protected]>
Really close! Just a few nits, plus one comment re: binding the train dataloader and evaluator to the state after Event.INIT runs. The rationale is that these parameters will not need to be specified in Trainer.init(...) after #948 lands (and algorithms / callbacks are already OK with no dataloaders or evaluators during Event.INIT).
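The ordering the reviewer describes can be sketched in Python. This is a minimal illustration under stated assumptions, not Composer's actual Trainer: the classes Trainer, State, Callback, and RecordingCallback, and the two-member Event enum, are hypothetical stand-ins. It shows that INIT handlers run before any dataloader or evaluator is attached to the state (so they must tolerate None), while later events see the bound values.

```python
from enum import Enum, auto
from typing import Any, Dict, List, Optional


class Event(Enum):
    # Hypothetical subset of training events, for illustration only.
    INIT = auto()
    FIT_START = auto()


class State:
    """Hypothetical training state; the dataloader starts unbound."""

    def __init__(self) -> None:
        self.train_dataloader: Optional[Any] = None
        self.evaluators: List[Any] = []


class Callback:
    def run_event(self, event: Event, state: State) -> None:
        pass


class RecordingCallback(Callback):
    """Records whether a dataloader was bound when each event fired."""

    def __init__(self) -> None:
        self.seen: Dict[Event, bool] = {}

    def run_event(self, event: Event, state: State) -> None:
        # Must not assume a dataloader exists during INIT.
        self.seen[event] = state.train_dataloader is not None


class Trainer:
    def __init__(self, callbacks: List[Callback],
                 train_dataloader: Optional[Any] = None,
                 evaluators: Optional[List[Any]] = None) -> None:
        self.state = State()
        self.callbacks = list(callbacks)
        # INIT runs first, before anything is bound to the state.
        self._run(Event.INIT)
        # Binding only after INIT means these arguments could later move
        # out of the constructor without changing what INIT handlers see.
        self.state.train_dataloader = train_dataloader
        self.state.evaluators = list(evaluators or [])

    def _run(self, event: Event) -> None:
        for cb in self.callbacks:
            cb.run_event(event, self.state)

    def fit(self) -> None:
        self._run(Event.FIT_START)
```

For example, a RecordingCallback passed to Trainer([cb], train_dataloader=[1, 2, 3]) records False for Event.INIT and True for Event.FIT_START after fit() is called.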
Co-authored-by: ravi-mosaicml <[email protected]>
LGTM 💯! See comments
Adds an experimental logger to create MLPerf-compliant submission files
This PR contributes a callback, MLPerfCallback, which creates a submission directory and results file that comply with the MLPerf rules for Training v1.1 (e.g. they pass the package checker from the MLPerf logging package).
Upon usage, a submission folder structure will be created with root_folder as the base and the following directories:
For each training run, a results file will be created, e.g.
The entire directory can then be submitted to MLPerf and checked with the package_checker in https://github.com/mlcommons/logging.
Currently, this callback only supports the OPEN division benchmark.
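As a rough illustration of what a results file contains, the sketch below formats records in the ":::MLLOG {json}" line layout that, to our understanding, the mlcommons/logging tools parse. It uses only the standard library rather than the mlperf_logging package. The helper names mllog_line, results_lines, and write_results_file are hypothetical, and the key names and JSON fields are assumptions to verify against the MLPerf logging documentation. A compliant results file needs many more records than the two header lines shown.

```python
import json
import time
from typing import Any, Dict, List, Optional

MLLOG_PREFIX = ":::MLLOG "


def mllog_line(key: str, value: Any, event_type: str = "POINT_IN_TIME",
               metadata: Optional[Dict[str, Any]] = None,
               time_ms: Optional[int] = None) -> str:
    """Format one record as ':::MLLOG {json}' (assumed MLPerf log layout)."""
    record = {
        "namespace": "",
        # Wall-clock timestamp in milliseconds unless one is supplied.
        "time_ms": int(time.time() * 1000) if time_ms is None else time_ms,
        "event_type": event_type,
        "key": key,
        "value": value,
        "metadata": metadata or {},
    }
    return MLLOG_PREFIX + json.dumps(record)


def results_lines(benchmark: str, division: str = "open") -> List[str]:
    """Hypothetical header records; a real results file needs more keys."""
    return [
        mllog_line("submission_benchmark", benchmark),
        # The callback described above only supports the OPEN division.
        mllog_line("submission_division", division),
    ]


def write_results_file(path: str, benchmark: str) -> None:
    """Write the header records, one per line, to a results file."""
    with open(path, "w") as f:
        f.write("\n".join(results_lines(benchmark)) + "\n")
```

Formatting each record as a single self-describing JSON line keeps the file append-only and easy for a checker to parse line by line.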
This PR is gated by:
TODO: Hparams object