Commit
RCP update: Resnet 64K, Unet3D, BERT (#119)
* First RCP checker commit, just a small commit with a README file to make sure the github + forking flow works for me.

* Rcp_checker implementation:
  * Added 1.0.0/rcps.json file. Still in progress as RCPs have not been finalized.
  * Code is in rcp_checker.py. This currently contains a single RCP_Checker class with functions to consume the json file, construct the RCP structure, compute means, stdevs, and min allowed speedups, find RCPs based on benchmark and batch size, and generate interpolated RCPs. This is all the processing that needs to happen at startup. No support yet to process and evaluate submission runs. This is TBD.
  * __main__.py runs a couple of simple tests; this will eventually be moved to a separate test file.

* Added a few more 1.0.0 RCPs (still in progress). Added submission directory processing and comparison to RCPs in rcp_checker. Connected RCP checker to the result_summarizer. Fixed a couple of bugs.

* Added remaining RCPs (resnet, bert, rnnt, unet3d), and fixed ones already in (maskrcnn, dlrm, ssd). Made a few fixes suggested by Victor.

* Update mlperf_logging/rcp_checker/README.md

  Co-authored-by: Marek Wawrzos <[email protected]>

* One step closer to v1.0.0
  * System Description Checker:
    - Updated to 1.0.0
  * Package Checker:
    - Added support for 1.0.0
    - Added calls to the RCP checker
    - Added call to the system description checker
    - Added support for the Unet3d olympic scoring (reject top and bottom 4)
  * Results Summarizer:
    - Added support for 1.0.0
    - Refactored olympic scoring calculation to be able to accommodate unet3d (reject top and bottom 4)
    - Made a couple of fixes to RCP checker interface and disabled RCP checks for minigo.
  * RCP Checker:
    - Split monolithic RCP json file into 1 json file / benchmark. This improves readability and makes adding more RCPs easier
    - Added support for Unet3D RCP checking: reject top and bottom 4 scores instead of 1
    - Added verbose mode to assist submitters with debugging
    - Fixed a couple of bugs I found after previous PR was merged.
  * Documentation:
    - Updated README files for RCP checker, results summarizer, package checker and system description checker

* Fixes suggested by Marek.

* Fixed a bug in the RCP checker. Updated max compile time rule from 20 mins to 30 mins. Removed a print statement from the result_summarizer.

* Logging 1.0.0 fixes based on some testing and more knowledge of the submission procedure:
  1. Added --rcp_bypass command line flag in package checker. Submitter can use it to allow uploading of benchmarks that fail the RCP test. This is a package checker flag that is propagated to the RCP checker. It has no meaning on a standalone RCP checker run, as the package checker output controls whether a submission is valid.
  2. Removed RCP checker from result_summarizer. It does not need to run there as it is called by the package checker.
  3. Fixes for open submission: Do not call the seed checker, nor the RCP checker. Fixed a bug where open_common was including closed_<benchmark> rules. Since submitters in the open category can now use their own convergence rules, I removed the convergence rules used in v0.7. So now the only rules for open submissions are the number of runs and open_common compliance rules.

* Forgot to add verifier 1.0 top-level script in my previous commit.

* Fixed failures pointed out by Shang:
  - Line can start with :::MLLOG but it is legal to have anything else before :::MLLOG
  - Opened log files as latin-1, just like the compliance checker

* Added Resnet temporary RCP for B=64K. The RCP was derived from Google's 0.7 tpu-v3-8192-TF submission and the 5 runs were duplicated. Updated Unet3D RCPs.

* Fixed RCP checker bug: when there were non-converging runs, the mean epochs to converge for the submission was under-reported. Updated compliance README file to 1.0.0.

* Added final Resnet 64K RCP and updated 8K RCP.

* Updated RCPs for Bert: removed 768, added 1536 and updated 3072.

Co-authored-by: Marek Wawrzos <[email protected]>
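The "olympic scoring" mentioned above (reject the top and bottom scores before averaging, 4 on each side for Unet3D instead of 1) can be illustrated with a short sketch. This is not the code from rcp_checker.py or the result_summarizer; the function name `olympic_mean`, the `drop` parameter, and the sample values are hypothetical and only show the general idea.

```python
from statistics import mean


def olympic_mean(scores, drop=1):
    """Average the scores after discarding the `drop` lowest and `drop` highest.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    if len(scores) <= 2 * drop:
        raise ValueError("need more than 2 * drop scores to have anything left")
    # Sort, then slice off `drop` entries at each end.
    trimmed = sorted(scores)[drop:len(scores) - drop]
    return mean(trimmed)


# Made-up run results: the outlier (50.0) and the best run (9.0) are rejected.
runs = [10.0, 12.0, 11.0, 50.0, 9.0]
print(olympic_mean(runs, drop=1))  # 11.0
```

Under this sketch, a benchmark that rejects four scores on each side would call `olympic_mean(scores, drop=4)`, which requires at least nine runs.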
Showing 6 changed files with 77 additions and 55 deletions.
@@ -23,3 +23,5 @@

if not valid:
    sys.exit(1)
else:
    print('SUCCESS')