updating artifact evaluation docs (#1273)
gfursin committed Jul 22, 2024
2 parents ea75e5e + a84a27b commit 433a733
Showing 14 changed files with 33 additions and 1,012 deletions.
1 change: 0 additions & 1 deletion CONTRIBUTING.md
@@ -31,7 +31,6 @@ listed here in alphabetical order:
* Nicolas Essayan
* Justin Faust (One Stop Systems)
* Diane Feddema (Red Hat)
* Grigori Fursin (cTuning.org and cKnowledge.org)
* Leonid Fursin (United Silicon Carbide)
* Anirban Ghosh (Nvidia)
* James Goel (Qualcomm)
22 changes: 9 additions & 13 deletions README.md
@@ -16,7 +16,7 @@ across diverse and continuously changing models, data, software and hardware.

CK consists of several ongoing sub-projects:

* [Collective Mind framework (CM)](cm) (*~1MB*) - a very light-weight Python-based framework with minimal dependencies
* [Collective Mind framework (CM)](cm) - a very light-weight Python-based framework with minimal dependencies
to help users implement, share and reuse cross-platform automation recipes to
build, benchmark and optimize applications on any platform
with any software and hardware. CM attempts to extend the `cmake` concept
@@ -30,26 +30,16 @@ CK consists of several ongoing sub-projects:
and the [ACM REP'23 keynote](https://doi.org/10.5281/zenodo.8105339).


* [CM4MLOPS: CM automation recipes for MLOps, MLPerf and DevOps](https://github.com/mlcommons/cm4mlops) (*~6MB*) -
* [CM4MLOPS](https://github.com/mlcommons/cm4mlops) -
a collection of portable, extensible and technology-agnostic automation recipes
with a human-friendly interface (aka CM scripts) to unify and automate all the manual steps required to compose, run, benchmark and optimize complex ML/AI applications
on diverse platforms with any software and hardware: see [online cKnowledge catalog](https://access.cknowledge.org/playground/?action=scripts),
[online MLCommons catalog](https://docs.mlcommons.org/cm4mlops/scripts)
and [source code](https://github.com/mlcommons/cm4mlops/blob/master/script); an illustrative usage sketch appears after the Documentation note below.

* [CM automation recipes to reproduce research projects](https://github.com/ctuning/cm4research) (*~1MB*) - a unified CM interface to help researchers
and engineers access, prepare and run diverse research projects and make it easier to validate them in the real world
across rapidly evolving models, data, software and hardware
(see [our reproducibility initiatives](https://cTuning.org/ae)
and [motivation](https://www.youtube.com/watch?v=7zpeIVwICa4) behind this project).

* [CM automation recipes for ABTF](https://github.com/mlcommons/cm4abtf) (*~1MB*) - a unified CM interface and automation recipes
* [CM4ABTF](https://github.com/mlcommons/cm4abtf) - a unified CM interface and automation recipes
to run automotive benchmarks across different models, data sets, software and hardware from different vendors.

* [Modular C++ harness for MLPerf loadgen](https://github.com/mlcommons/cm4mlops/tree/main/script/app-mlperf-inference-mlcommons-cpp)

* [Modular Python harness for MLPerf loadgen](https://github.com/mlcommons/cm4mlops/tree/main/script/app-loadgen-generic-python)

* [Collective Knowledge Playground](https://access.cKnowledge.org) - an external platform being developed by [cKnowledge](https://cKnowledge.org)
to list CM scripts (similarly to PyPI), aggregate AI/ML systems benchmarking results in a reproducible format with CM workflows,
and organize [public optimization challenges and reproducibility initiatives](https://access.cknowledge.org/playground/?action=challenges)
@@ -72,6 +62,12 @@ We are preparing new projects based on user feedback:

[Apache 2.0](LICENSE.md)

### Copyright

* Copyright (c) 2021-2024 MLCommons
* Copyright (c) 2014-2021 cTuning foundation


### Documentation

**MLCommons is updating the CM documentation based on user feedback - please stay tuned for more details**.
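
For readers who want a quick feel for the CM interface and CM4MLOPS recipes described above, below is a minimal usage sketch. It assumes the `cmind` package is installed (`pip install cmind`) and that the `mlcommons@cm4mlops` repository and a script tagged `detect,os` are available as documented at the time of this commit; treat it as an illustration rather than a reference.

```python
# Illustrative sketch of the CM Python API (assumes `pip install cmind` and network access).
import cmind

# Pull the CM4MLOPS repository with portable automation recipes (CM scripts).
r = cmind.access({'action': 'pull',
                  'automation': 'repo',
                  'artifact': 'mlcommons@cm4mlops'})
if r['return'] > 0:
    raise RuntimeError(r.get('error', 'failed to pull the CM repository'))

# Run a CM script by its tags; 'out': 'con' streams its output to the console.
r = cmind.access({'action': 'run',
                  'automation': 'script',
                  'tags': 'detect,os',
                  'out': 'con'})
if r['return'] > 0:
    raise RuntimeError(r.get('error', 'failed to run the CM script'))
```

The same steps map to the command line as `cm pull repo mlcommons@cm4mlops` and `cm run script --tags=detect,os`.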
246 changes: 1 addition & 245 deletions docs/artifact-evaluation/checklist.md
@@ -1,245 +1 @@
[ [Back to index](https://cTuning.org/ae) ]

# Artifact Checklist


Here we provide a few informal suggestions to help you fill in the
[Unified Artifact Appendix with the Reproducibility Checklist](https://github.com/mlcommons/ck/blob/master/docs/artifact-evaluation/template/ae.tex)
for artifact evaluation while avoiding common pitfalls.
We've introduced this appendix to [unify the description of experimental setups and results across different conferences](https://learning.acm.org/techtalks/reproducibility).




## Abstract

Briefly and informally describe your artifacts, including minimal hardware, software and other requirements,
how they support your paper and what the key results to be reproduced are.
Note that evaluators will use artifact abstracts to bid on artifacts.
The AE chairs will also use it to finalize artifact assignments.


## Checklist


Together with the artifact abstract, this checklist will help us make sure that evaluators
have the appropriate competency and access to the technology required to evaluate your artifacts.
It can also be used as meta information to find your artifacts in Digital Libraries.

![](https://raw.githubusercontent.com/mlcommons/ck/master/docs/artifact-evaluation/image-general-workflow1.png)


Fill in whatever is applicable with some informal keywords and remove unrelated items
(please treat the questions below simply as informal hints
about what reviewers are usually concerned about):


* **Algorithm:** Are you presenting a new algorithm?
* **Program:** Which benchmarks do you use
([PARSEC](http://parsec.cs.princeton.edu "http://parsec.cs.princeton.edu"),
[NAS](http://www.nas.nasa.gov/publications/npb.html "http://www.nas.nasa.gov/publications/npb.html"),
[EEMBC](https://www.eembc.org "https://www.eembc.org"),
[SPLASH](http://www.capsl.udel.edu/splash/index.html "http://www.capsl.udel.edu/splash/index.html"),
[Rodinia](https://www.cs.virginia.edu/~skadron/wiki/rodinia "https://www.cs.virginia.edu/~skadron/wiki/rodinia"),
[LINPACK](http://www.netlib.org/linpack "http://www.netlib.org/linpack"),
[HPCG](http://hpcg-benchmark.org/ "http://hpcg-benchmark.org/"),
[MiBench](http://wwweb.eecs.umich.edu/mibench "http://wwweb.eecs.umich.edu/mibench"),
[SPEC](https://www.spec.org/cpu2006 "https://www.spec.org/cpu2006"),
[cTuning](http://github.com/ctuning/ctuning-programs "http://github.com/ctuning/ctuning-programs"), etc)?
Are they included or should they be downloaded? Which version?
Are they public or private? If they are private,
is there a public analog to evaluate your artifact?
What is the approximate size?
* **Compilation:** Do you require a specific compiler? Public/private? Is it included? Which version?
* **Transformations:** Do you require a program transformation tool (source-to-source, binary-to-binary, compiler pass, etc)?
Public/private? Is it included? Which version?
* **Binary:** Are binaries included? OS-specific? Which version?
* **Model:** Do you use specific models (GPT-J, BERT, MobileNets ...)?
Are they included? If not, how to download and install?
What is their approximate size?
* **Data set:** Do you use specific data sets?
Are they included? If not, how to download and install?
What is their approximate size?
* **Run-time environment:** Is your artifact OS-specific (Linux, Windows, MacOS, Android, etc)?
Which version? What are the main software dependencies (JIT, libs, run-time adaptation frameworks, etc)?
Do you need root access?
* **Hardware:** Do you need specific hardware (supercomputer, architecture simulator, CPU, GPU, neural network accelerator, FPGA)
or specific features (hardware counters
to measure power consumption, SUDO access to CPU/GPU frequency, etc)?
Are they publicly available?
* **Run-time state:** Is your artifact sensitive to run-time state (cold/hot cache, network/cache contentions, etc.)?
* **Execution:** Should any specific conditions be met during experiments (sole user, process pinning, profiling, adaptation, etc)? Approximately how long will it run?
* **Metrics:** Which metrics will be evaluated (execution time, inferences per second, Top-1 accuracy, power consumption, etc)?
* **Output:** What is the output of your key experiments (console, file, table, graph) and what are your key results
(exact output, numerical results, empirical characteristics, etc)?
Are expected results included?
* **Experiments:** How to prepare experiments and reproduce results
(README, scripts, [IPython/Jupyter notebook](https://jupyter.org "https://jupyter.org"),
[MLCommons CM automation language](https://doi.org/10.5281/zenodo.8105339), containers, etc)?
Do not forget to mention the maximum allowable variation of empirical results!
* **How much disk space is required (approximately)?:** This can help evaluators and end-users to find appropriate resources.
* **How much time is needed to prepare the workflow (approximately)?:** This can help evaluators and end-users to estimate the resources needed to evaluate your artifact.
* **How much time is needed to complete experiments (approximately)?:** This can help evaluators and end-users to estimate resources needed to evaluate your artifact.
* **Publicly available?:** Will your artifact be publicly available? If yes, we may spend extra effort to help you with the documentation.
* **Code licenses (if publicly available)?:** If your workflows and artifacts will be publicly available, please provide information about licenses.
This will help the community to reuse your components.
* **Workflow frameworks used?** Did you use any workflow framework that can automate and customize experiments?
* **Archived?:**
Note that the author-created artifacts relevant to this paper
will receive the ACM "artifact available" badge \*only if\*
they have been placed on a publicly
accessible archival repository such as [Zenodo](https://zenodo.org "https://zenodo.org"),
[FigShare](https://figshare.com "https://figshare.com")
or [Dryad](http://datadryad.org "http://datadryad.org").
A DOI will then be assigned to their artifacts and must be provided here!
Personal web pages, Google Drive, GitHub, GitLab and BitBucket
are not accepted for this badge.
Authors can provide the DOI for their artifacts at the end of the evaluation.




## Description



### How to access



Describe how reviewers will access your artifacts:

* Clone a repository from GitHub, GitLab or any similar service
* Download a package from a public website
* Download a package from a private website (you will need to send information on how to access your artifacts to the AE chairs)
* Access the artifact via a private machine with pre-installed software (only when access to rare or publicly unavailable hardware is required or proprietary
software is used; you will need to send the credentials to access your machine to the AE chairs)



Please describe the approximate disk space required after unpacking your artifact.


### Hardware dependencies



Describe any specific hardware and specific features required to evaluate your artifact
(vendor, CPU/GPU/FPGA, number of processors/cores, interconnect, memory,
hardware counters, etc).


### Software dependencies



Describe any specific OS and software packages required to evaluate your
artifact. This is particularly important if you share your source code
and it must be compiled, or if you rely on some proprietary software that you
cannot include in your package. In such cases, we strongly suggest that you
describe how to obtain and install all third-party software, data sets
and models.




*Note that we are trying to obtain AE licenses for some commonly used proprietary tools
and benchmarks - you will be informed in case of a positive outcome.*

### Data sets



If third-party data sets are not included in your packages (for example,
they are very large or proprietary), please provide details about how to download
and install them.

*In case of proprietary data sets, we suggest you provide reviewers
with a public alternative subset for evaluation.*
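
For instance, a short, self-contained helper can make such download instructions unambiguous; the URL and checksum below are placeholders rather than a real data set:

```python
# Sketch: fetch a (hypothetical) public data set and verify its integrity before use.
import hashlib
import urllib.request

DATASET_URL = "https://example.org/datasets/sample-val.tar.gz"  # placeholder URL
EXPECTED_SHA256 = "<sha256 of the archive>"                     # placeholder checksum

def download_and_verify(url: str, path: str, expected_sha256: str) -> None:
    urllib.request.urlretrieve(url, path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path}: got {digest}")

download_and_verify(DATASET_URL, "sample-val.tar.gz", EXPECTED_SHA256)
print("Data set downloaded and verified.")
```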


### Models



If third-party models are not included in your packages (for example,
they are very large or proprietary), please provide details about how to download
and install them.




## Installation



Describe the setup procedures for your artifact (even when containers are used).



## Experiment workflow



Describe the experimental workflow and how it is implemented
and executed, e.g. via OS scripts,
an [IPython/Jupyter notebook](https://jupyter.org "https://jupyter.org"),
the [MLCommons CM automation language](https://github.com/mlcommons/ck/tree/master/docs), etc.

Check [examples of reproduced papers](https://cknow.io/reproduced-papers "https://cknow.io/reproduced-papers").
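
As a rough sketch only (the benchmark command and file names are hypothetical), such a workflow can be a single entry-point script that runs the measurement several times and stores the results:

```python
# Sketch of a single-entry-point experiment workflow (hypothetical command and paths).
import json
import subprocess
import time

def run_experiment(num_repetitions: int = 3) -> None:
    results = []
    for i in range(num_repetitions):
        start = time.time()
        # Replace with the actual benchmark command of your artifact.
        subprocess.run(["./run_benchmark.sh"], check=True)
        results.append({"repetition": i, "elapsed_s": round(time.time() - start, 3)})
    # Store raw measurements so that evaluators can re-check them later.
    with open("results.json", "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    run_experiment()
```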





## Evaluation and expected result



Describe all the steps necessary to reproduce the key results from your paper.
Describe the expected results, including the maximum allowable variation
of empirical results.
See the [SIGPLAN Empirical Evaluation Guidelines](https://www.sigplan.org/Resources/EmpiricalEvaluation "https://www.sigplan.org/Resources/EmpiricalEvaluation"),
the [NeurIPS reproducibility checklist](https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf "https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf")
and the [AE FAQ](faq.md) for more details.
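
A small sketch of what checking the maximum allowable variation may look like in practice (the metric name, expected value, tolerance and results file are placeholders):

```python
# Sketch: compare a reproduced metric against the value reported in the paper,
# within a maximum allowable variation (all names and numbers below are placeholders).
import json

EXPECTED_THROUGHPUT = 1250.0   # value reported in the paper (samples/second)
MAX_RELATIVE_VARIATION = 0.05  # e.g. results within +/- 5% count as reproduced

with open("results.json") as f:
    measured = json.load(f)["throughput"]  # hypothetical results file and field

relative_error = abs(measured - EXPECTED_THROUGHPUT) / EXPECTED_THROUGHPUT
if relative_error <= MAX_RELATIVE_VARIATION:
    print(f"OK: {measured:.1f} is within {MAX_RELATIVE_VARIATION:.0%} of the expected value")
else:
    print(f"MISMATCH: {measured:.1f} deviates by {relative_error:.1%} from the expected value")
```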



## Experiment customization



This section is optional, but it can be useful for the community if you describe all the knobs
for customizing and tuning your experiments, and perhaps even for trying them
with different data sets, benchmarks/applications,
machine learning models, software environments (compilers, libraries,
run-time systems) and hardware.
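
One common way to expose such knobs (a sketch; the flag names and defaults are hypothetical) is a single command-line interface in front of the workflow:

```python
# Sketch: expose experiment knobs via command-line flags (hypothetical names and defaults).
import argparse

parser = argparse.ArgumentParser(description="Run the experiment with customizable knobs")
parser.add_argument("--model", default="resnet50", help="machine learning model to benchmark")
parser.add_argument("--dataset", default="imagenet-val", help="data set to use")
parser.add_argument("--batch-size", type=int, default=32, help="inference batch size")
parser.add_argument("--repetitions", type=int, default=3, help="number of repetitions to average over")
args = parser.parse_args()

print(f"Running {args.model} on {args.dataset} "
      f"(batch size {args.batch_size}, {args.repetitions} repetitions)")
# The actual workflow would be invoked here with these parameters.
```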


## Reusability

Please describe your experience if you decided to participate in our pilot project to add
the non-intrusive [MLCommons Collective Mind interface (CM)](https://doi.org/10.5281/zenodo.8105339)
to your artifacts. Note that it will be possible to prepare and run your experiments with
or without this interface!



## Notes



You can add informal notes to draw the attention of evaluators.



----

*This document was prepared by [Grigori Fursin](https://cKnowledge.org/gfursin)
with contributions from [Bruce Childers](https://people.cs.pitt.edu/~childers),
[Michael Heroux](https://www.sandia.gov/~maherou),
[Michela Taufer](https://gcl.cis.udel.edu/personal/taufer) and other great colleagues.
It is maintained by the [cTuning foundation](https://cTuning.org/ae) and the
[MLCommons taskforce on automation and reproducibility](https://github.com/mlcommons/ck/blob/master/docs/taskforce.md).*
***Moved to https://github.com/ctuning/artifact-evaluation/blob/master/docs/checklist.md***