Rename package to algoperf #833

Open · wants to merge 14 commits into `dev`
2 changes: 1 addition & 1 deletion .github/workflows/CI.yml
@@ -199,7 +199,7 @@ jobs:
pip install .[pytorch_cpu]
- name: Run pytest tests
run: |
pytest -vx tests/version_test.py
pytest -vx tests/test_version.py
pytest -vx tests/test_num_params.py
pytest -vx tests/test_param_shapes.py
pytest -vx tests/test_param_types.py
4 changes: 2 additions & 2 deletions .github/workflows/linting.yml
@@ -17,7 +17,7 @@ jobs:
pip install pylint==2.16.1
- name: Run pylint
run: |
pylint algorithmic_efficiency
pylint algoperf
pylint reference_algorithms
pylint prize_qualification_baselines
pylint submission_runner.py
@@ -50,7 +50,7 @@
- name: Install yapf
run: |
python -m pip install --upgrade pip
pip install yapf==0.32
pip install yapf==0.32 toml
- name: Run yapf
run: |
yapf . --diff --recursive
2 changes: 1 addition & 1 deletion .github/workflows/regression_tests_variants.yml
@@ -72,7 +72,7 @@
run: |
docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_${{ github.head_ref || github.ref_name }}
docker run -v $HOME/data/:/data/ -v $HOME/experiment_runs/:/experiment_runs -v $HOME/experiment_runs/logs:/logs --gpus all --ipc=host us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_${{ github.head_ref || github.ref_name }} -d criteo1tb -f pytorch -s reference_algorithms/paper_baselines/adamw/pytorch/submission.py -w criteo1tb_resnet -t reference_algorithms/paper_baselines/adamw/tuning_search_space.json -e tests/regression_tests/adamw -m 10 -c False -o True -r false
criteo_resnet_pytorch:
criteo_embed_init_pytorch:
runs-on: self-hosted
needs: build_and_push_pytorch_docker_image
steps:
8 changes: 5 additions & 3 deletions .gitignore
@@ -12,8 +12,8 @@ makefile
*.swp
*/data/
*events.out.tfevents*
algorithmic_efficiency/workloads/librispeech_conformer/data_dir
algorithmic_efficiency/workloads/librispeech_conformer/work_dir
algoperf/workloads/librispeech_conformer/data_dir
algoperf/workloads/librispeech_conformer/work_dir
*.flac
*.npy
*.csv
@@ -23,4 +23,6 @@ wandb/
scoring/plots/

!scoring/test_data/experiment_dir/study_0/mnist_jax/trial_0/eval_measurements.csv
!scoring/test_data/experiment_dir/study_0/mnist_jax/trial_1/eval_measurements.csv
!scoring/test_data/experiment_dir/study_0/mnist_jax/trial_1/eval_measurements.csv

algoperf/_version.py
20 changes: 13 additions & 7 deletions CHANGELOG.md
@@ -4,34 +4,39 @@

- Finalized variant workload targets.
- Fix in random_utils helper function.
- For conformer PyTorch Dropout layers set `inplace=True`.
- For conformer PyTorch Dropout layers set `inplace=True`.
- Clear CUDA cache at the beginning of each trial for PyTorch.

## algoperf-benchmark-0.1.4 (2024-03-26)

Upgrade CUDA version to CUDA 12.1:

- Upgrade CUDA version in Dockerfiles that will be used for scoring.
- Update Jax and PyTorch package version tags to use local CUDA installation.

Add flag for completely disabling checkpointing.
Add flag for completely disabling checkpointing.

- Note that we will run with checkpointing off at scoring time.

Update Deepspeech and Conformer variant target setting configurations.
- Note that variant targets are not final.
Update Deepspeech and Conformer variant target setting configurations.

- Note that variant targets are not final.

Fixed bug in scoring code to take best trial in a study for external-tuning ruleset.

Added instructions for submission.
Added instructions for submission.

Changed default number of workers for PyTorch data loaders to 0. Running with >0 may lead to incorrect eval results see https://github.com/mlcommons/algorithmic-efficiency/issues/732.
Changed default number of workers for PyTorch data loaders to 0. Running with >0 may lead to incorrect eval results see <https://github.com/mlcommons/algorithmic-efficiency/issues/732>.

## algoperf-benchmark-0.1.2 (2024-03-04)

Workload variant additions and fixes:

- Add Deepspeech workload variant
- Fix bugs in Imagenet ResNet, WMT and Criteo1tb variants

Add prize qualification logs for external tuning ruleset.
Note: FastMRI trials with dropout are not yet added due to https://github.com/mlcommons/algorithmic-efficiency/issues/664.
Note: FastMRI trials with dropout are not yet added due to <https://github.com/mlcommons/algorithmic-efficiency/issues/664>.

Add missing functionality to Docker startup script for self_tuning ruleset.
Add self_tuning ruleset option to script that runs all workloads for scoring.
@@ -41,6 +46,7 @@ Datasetup fixes.
Fix tests that check training differences in PyTorch and JAX on GPU.

## algoperf-benchmark-0.1.1 (2024-01-19)

Bug fixes to FastMRI metric calculation and targets.

Added workload variants and targets for ogbg, fastmri, librispeech_conformer, imagenet_resnet, imagenet_vit, criteo1tb to be used as held-out workloads.
19 changes: 16 additions & 3 deletions CONTRIBUTING.md
@@ -22,6 +22,7 @@
- [Style Testing](#style-testing)
- [Unit and Integration Tests](#unit-and-integration-tests)
- [Regression Tests](#regression-tests)
- [Versioning](#versioning)

## Contributing to MLCommons

@@ -204,7 +205,7 @@ docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
-v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
-v $HOME/algorithmic-efficiency:/algoperf \
--gpus all \
--ipc=host \
<image_path> \
@@ -228,7 +229,7 @@ To run the below commands, use the versions installed via `pip install -e '.[dev
To automatically fix formatting errors, run the following (*WARNING:* this will edit your code, so it is suggested to make a git commit first!):

```bash
yapf -i -r -vv -p algorithmic_efficiency datasets prize_qualification_baselines reference_algorithms tests *.py
yapf -i -r -vv -p algoperf datasets prize_qualification_baselines reference_algorithms tests *.py
```

To sort all import orderings, run the following:
@@ -246,7 +247,7 @@ isort . --check --diff
To print out all offending pylint issues, run the following:

```bash
pylint algorithmic_efficiency
pylint algoperf
pylint datasets
pylint prize_qualification_baselines
pylint reference_algorithms
@@ -276,3 +277,15 @@ To run a regression test:
2. Turn on the self-hosted runner.
3. Run the self-hosted runner application for the runner to accept jobs.
4. Open a pull request into main to trigger the workflow.

### Versioning

The package version is automatically determined by the `setuptools_scm` package based on the last git tag.
It follows the structure `major.minor.patch` + `devN` where `N` is the number of commits since the last tag.
It automatically increments the patch version (i.e. it guesses the next version) if there are commits after the last tag.
Additionally, if there are uncommitted changes, the version will include a suffix, separated by a `+` character, containing the last commit hash plus the date of the dirty working directory (see [setuptools_scm's documentation](https://setuptools-scm.readthedocs.io/en/latest/extending/#setuptools_scmlocal_scheme) on the default version and local scheme).
You can check what version `setuptools_scm` is creating by running `python -m setuptools_scm`.

To create a new version, create a new release (and tag) in the GitHub UI.
The package version is automatically updated to the new version.
Once the package is installed, the version can be accessed as the package attribute `algoperf.__version__`, i.e. via `python -c "import algoperf; print(algoperf.__version__)"`.
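
An illustrative way to inspect the generated version at runtime is sketched below. This is only a sketch: it assumes the package has been installed (e.g. via `pip install -e .`) so that `algoperf/_version.py` has been generated, and the exact version string depends on your git state.

```python
# Minimal sketch: read the setuptools_scm-generated version at runtime.
import algoperf

version = algoperf.__version__
print(f"Installed algoperf version: {version}")

# A tagged release typically looks like "0.1.5", while a development build
# typically looks like "0.1.5.dev7+g<hash>" (7 commits after the last tag),
# with a date suffix appended if the working directory was dirty.
if ".dev" in version or "+" in version:
  print("This is a development build, not a tagged release.")
```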
5 changes: 2 additions & 3 deletions DOCUMENTATION.md
@@ -222,7 +222,6 @@ def update_params(
- Cannot replace the model parameters with pre-trained ones.
- Batch norm should work here because the `model_fn` will return updated batch norm moving averages when it is told to with `update_batch_norm`.


###### Prepare for evaluation function

```python
@@ -278,7 +277,7 @@ def data_selection(

In general, with noisy, non-deterministic training, evaluation frequency can affect training time measurements as more "bites of the apple" potentially allows the training code to exploit instability. We also want to discourage submissions from complicated and unrealistic logic that attempts to guess when training is close to complete and increases the evaluation rate, while not producing a well-sampled training curve at the start of training. Simply allowing submissions complete freedom over evaluation frequency encourages competitors to work to minimize the number of evaluations, which distracts from the primary goal of finding better training algorithms.

Submissions are eligible for an untimed eval every `eval_period` seconds. Before proceeding to evaluation, the submission can prepare the model through a call to `prepare_for_eval`, effectively modifying the model parameters and state as well as the optimizer state. Any additional evaluations performed by the submission code count against the runtime for scoring.
Submissions are eligible for an untimed eval every `eval_period` seconds. Before proceeding to evaluation, the submission can prepare the model through a call to `prepare_for_eval`, effectively modifying the model parameters and state as well as the optimizer state. Any additional evaluations performed by the submission code count against the runtime for scoring.
The harness that runs the submission code will attempt to eval every `eval_period` seconds by checking between each submission step (call of `update_params`) whether it has been at least `eval_period` seconds since the last eval; if so, the submission is given the possibility to prepare for evaluation (through a timed call to `prepare_for_eval`). If the accumulated runtime does not exceed the maximum allowed runtime after the preparation step, the clock is paused, and the submission is evaluated. This means that if calls to `update_params` typically take a lot more than `eval_period` seconds, such submissions will not receive as many untimed evals as a submission whose `update_params` function took less time. However, for appropriate settings of `eval_period`, we expect this to be quite rare. Submissions are always free to restructure their `update_params` code to split work into two subsequent steps to regain the potential benefits of these untimed model evaluations. For each workload, the `eval_period` will be set such that the total evaluation time is roughly between 10% and 20% of the total training time for the target-setting runs.
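
To make this bookkeeping concrete, the following is a rough, non-authoritative sketch of the timing logic described above. The submission-facing calls (`update_params`, `prepare_for_eval`) follow the API described in this document, but the harness-side objects and helper names (`workload`, `submission`, `evaluate`, `max_runtime`) are hypothetical placeholders, not the actual harness code.

```python
# Hedged sketch of the untimed-eval bookkeeping; harness-side names are
# hypothetical placeholders rather than the real implementation.
import time


def run_training(workload, submission, eval_period: float, max_runtime: float):
  accumulated_runtime = 0.0
  time_of_last_eval = 0.0

  while not workload.training_complete():
    step_start = time.monotonic()
    submission.update_params(workload)  # timed submission step
    accumulated_runtime += time.monotonic() - step_start

    # Between steps, check whether enough (timed) seconds have passed
    # since the last eval to offer another untimed eval.
    if accumulated_runtime - time_of_last_eval >= eval_period:
      prep_start = time.monotonic()
      submission.prepare_for_eval(workload)  # timed preparation call
      accumulated_runtime += time.monotonic() - prep_start

      if accumulated_runtime <= max_runtime:
        workload.evaluate(submission)  # clock paused: untimed eval
      time_of_last_eval = accumulated_runtime
```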

#### Valid submissions
@@ -642,4 +641,4 @@ That said, while submitting Adam with some novel heuristic to set various hyperp
The JAX and PyTorch versions of the Criteo, FastMRI, Librispeech, OGBG, and WMT workloads use the same TensorFlow input pipelines. Due to differences in how JAX and PyTorch distribute computations across devices, the PyTorch workloads have an additional overhead for these workloads.

Since we use PyTorch's [`DistributedDataParallel`](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel) implementation, there is one Python process for each device. Depending on the hardware and the settings of the cluster, running a TensorFlow input pipeline in each Python process can lead to errors, since too many threads are created in each process. See [this PR thread](https://github.com/mlcommons/algorithmic-efficiency/pull/85) for more details.
While this issue might not affect all setups, we currently implement a different strategy: we only run the TensorFlow input pipeline in one Python process (with `rank == 0`), and [broadcast](https://pytorch.org/docs/stable/distributed.html#torch.distributed.broadcast) the batches to all other devices. This introduces an additional communication overhead for each batch. See the [implementation for the WMT workload](https://github.com/mlcommons/algorithmic-efficiency/blob/main/algorithmic_efficiency/workloads/wmt/wmt_pytorch/workload.py#L215-L288) as an example.
While this issue might not affect all setups, we currently implement a different strategy: we only run the TensorFlow input pipeline in one Python process (with `rank == 0`), and [broadcast](https://pytorch.org/docs/stable/distributed.html#torch.distributed.broadcast) the batches to all other devices. This introduces an additional communication overhead for each batch. See the [implementation for the WMT workload](https://github.com/mlcommons/algorithmic-efficiency/blob/main/algoperf/workloads/wmt/wmt_pytorch/workload.py#L215-L288) as an example.
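
For intuition, here is a simplified sketch of that rank-0 broadcast pattern. It is not the repository's exact implementation: it assumes the default process group is already initialized, that every rank agrees on the batch shape and dtype in advance, and that the pipeline repeats indefinitely; the helper name `broadcast_batches` is made up for illustration.

```python
# Simplified sketch: only rank 0 reads the input pipeline; every other rank
# receives each batch via torch.distributed.broadcast.
import torch
import torch.distributed as dist


def broadcast_batches(data_iter, batch_shape, device, rank):
  """Yields identical batches on every rank; only rank 0 touches the pipeline."""
  while True:
    if rank == 0:
      batch = torch.as_tensor(next(data_iter)).to(device)
    else:
      # Non-zero ranks allocate an empty buffer with the agreed-upon shape.
      batch = torch.empty(batch_shape, device=device)
    dist.broadcast(batch, src=0)  # per-batch communication overhead
    yield batch
```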
17 changes: 10 additions & 7 deletions GETTING_STARTED.md
@@ -18,6 +18,8 @@
- [Docker Tips](#docker-tips)
- [Score your Submission](#score-your-submission)
- [Running workloads](#running-workloads)
- [Package your Submission code](#package-your-submission-code)
- [Package Logs for Self-Reporting Submissions](#package-logs-for-self-reporting-submissions)

## Set Up and Installation

@@ -56,7 +58,7 @@ To set up a virtual enviornment and install this repository
cd algorithmic-efficiency
```

3. Run the following pip3 install commands based on your chosen framework to install `algorithmic_efficiency` and its dependencies.
3. Run the following pip3 install commands based on your chosen framework to install `algoperf` and its dependencies.

For **JAX**:

@@ -80,7 +82,6 @@ To set up a virtual enviornment and install this repository
pip3 install -e '.[full]'
```


<details>
<summary>
Per workload installations
@@ -414,22 +415,24 @@ submission_folder/
```

Specifically we require that:

1. There exist subdirectories in the submission folder named after the ruleset: `external_tuning` or `self_tuning`.
2. The ruleset subdirectories contain directories named according to
some identifier of the algorithm.
3. Each algorithm subdirectory contains a `submission.py` module. Additional helper modules are allowed if you prefer to organize your code into multiple files. If there are additional Python packages that have to be installed for the algorithm, also include a `requirements.txt` with package names and versions in the algorithm subdirectory.
2. The ruleset subdirectories contain directories named according to
some identifier of the algorithm.
3. Each algorithm subdirectory contains a `submission.py` module. Additional helper modules are allowed if you prefer to organize your code into multiple files. If there are additional Python packages that have to be installed for the algorithm, also include a `requirements.txt` with package names and versions in the algorithm subdirectory.
4. For `external_tuning` algorithms the algorithm subdirectory
should contain a `tuning_search_space.json`.

To check that your submission folder meets the above requirements you can run the `submissions/repo_checker.py` script.

## Package Logs for Self-Reporting Submissions

To prepare your submission for self reporting run:

```
```bash
python3 package_logs.py --experiment_dir <experiment_dir> --destination_dir <destination_dir>
```

The destination directory will contain the logs packed in studies and trials required for self-reporting.
The destination directory will contain the logs packed in studies and trials required for self-reporting.

**Good Luck!**
5 changes: 5 additions & 0 deletions algoperf/__init__.py
@@ -0,0 +1,5 @@
"""Algorithmic Efficiency."""

from ._version import version as __version__

__all__ = ["__version__"]
@@ -16,8 +16,8 @@
from tensorflow.io import gfile # pytype: disable=import-error
import torch

from algorithmic_efficiency import spec
from algorithmic_efficiency.pytorch_utils import pytorch_setup
from algoperf import spec
from algoperf.pytorch_utils import pytorch_setup

_, _, DEVICE, _ = pytorch_setup()
CheckpointReturn = Tuple[spec.OptimizerState,
@@ -11,7 +11,7 @@
from torch.utils.data import DistributedSampler
from torch.utils.data import Sampler

from algorithmic_efficiency import spec
from algoperf import spec


def shard_and_maybe_pad_np(
File renamed without changes.
File renamed without changes.
@@ -1,7 +1,7 @@
import jax.dlpack
import torch

from algorithmic_efficiency import spec
from algoperf import spec


def jax_to_pytorch(x: spec.Tensor, take_ownership: bool = False) -> spec.Tensor:
@@ -18,8 +18,8 @@
import psutil
import torch.distributed as dist

from algorithmic_efficiency import spec
from algorithmic_efficiency.pytorch_utils import pytorch_setup
from algoperf import spec
from algoperf.pytorch_utils import pytorch_setup

USE_PYTORCH_DDP, RANK, DEVICE, _ = pytorch_setup()

@@ -6,7 +6,7 @@
import jax
from torch import nn

from algorithmic_efficiency import spec
from algoperf import spec


def pytorch_param_shapes(model: nn.Module) -> Dict[str, spec.ShapeTuple]:
File renamed without changes.
@@ -7,11 +7,11 @@
import torch
import torch.distributed as dist

from algorithmic_efficiency import spec
from algorithmic_efficiency.profiler import Profiler
from algorithmic_efficiency.workloads.librispeech_conformer.librispeech_pytorch.models import \
from algoperf import spec
from algoperf.profiler import Profiler
from algoperf.workloads.librispeech_conformer.librispeech_pytorch.models import \
BatchNorm as ConformerBatchNorm
from algorithmic_efficiency.workloads.librispeech_deepspeech.librispeech_pytorch.models import \
from algoperf.workloads.librispeech_deepspeech.librispeech_pytorch.models import \
BatchNorm as DeepspeechBatchNorm


File renamed without changes.
File renamed without changes.
@@ -13,8 +13,8 @@
import tensorflow as tf
import tensorflow_datasets as tfds

from algorithmic_efficiency import spec
from algorithmic_efficiency.data_utils import shard_and_maybe_pad_np
from algoperf import spec
from algoperf.data_utils import shard_and_maybe_pad_np


def preprocess_for_train(image: spec.Tensor,
@@ -10,9 +10,8 @@
from flax import linen as nn
import jax.numpy as jnp

from algorithmic_efficiency import spec
from algorithmic_efficiency.workloads.imagenet_resnet.imagenet_jax.models import \
ResNetBlock
from algoperf import spec
from algoperf.workloads.imagenet_resnet.imagenet_jax.models import ResNetBlock

ModuleDef = nn.Module

@@ -11,12 +11,11 @@
import optax
import tensorflow_datasets as tfds

from algorithmic_efficiency import param_utils
from algorithmic_efficiency import spec
from algorithmic_efficiency.workloads.cifar.cifar_jax import models
from algorithmic_efficiency.workloads.cifar.cifar_jax.input_pipeline import \
create_input_iter
from algorithmic_efficiency.workloads.cifar.workload import BaseCifarWorkload
from algoperf import param_utils
from algoperf import spec
from algoperf.workloads.cifar.cifar_jax import models
from algoperf.workloads.cifar.cifar_jax.input_pipeline import create_input_iter
from algoperf.workloads.cifar.workload import BaseCifarWorkload


class CifarWorkload(BaseCifarWorkload):
@@ -10,14 +10,13 @@
import torch
from torch import nn

from algorithmic_efficiency import spec
from algorithmic_efficiency.init_utils import pytorch_default_init
from algorithmic_efficiency.workloads.imagenet_resnet.imagenet_pytorch.models import \
from algoperf import spec
from algoperf.init_utils import pytorch_default_init
from algoperf.workloads.imagenet_resnet.imagenet_pytorch.models import \
BasicBlock
from algorithmic_efficiency.workloads.imagenet_resnet.imagenet_pytorch.models import \
from algoperf.workloads.imagenet_resnet.imagenet_pytorch.models import \
Bottleneck
from algorithmic_efficiency.workloads.imagenet_resnet.imagenet_pytorch.models import \
conv1x1
from algoperf.workloads.imagenet_resnet.imagenet_pytorch.models import conv1x1


class ResNet(nn.Module):
@@ -12,13 +12,12 @@
from torchvision import transforms
from torchvision.datasets import CIFAR10

from algorithmic_efficiency import data_utils
from algorithmic_efficiency import param_utils
from algorithmic_efficiency import pytorch_utils
from algorithmic_efficiency import spec
from algorithmic_efficiency.workloads.cifar.cifar_pytorch.models import \
resnet18
from algorithmic_efficiency.workloads.cifar.workload import BaseCifarWorkload
from algoperf import data_utils
from algoperf import param_utils
from algoperf import pytorch_utils
from algoperf import spec
from algoperf.workloads.cifar.cifar_pytorch.models import resnet18
from algoperf.workloads.cifar.workload import BaseCifarWorkload

USE_PYTORCH_DDP, RANK, DEVICE, N_GPUS = pytorch_utils.pytorch_setup()
