Add Pytorch section to README
Scott Davidson committed Oct 2, 2023
1 parent 7dcf327 commit dc27d35
Showing 1 changed file with 36 additions and 0 deletions.
README.md: 36 changes (36 additions, 0 deletions)
@@ -13,6 +13,7 @@
- [RDMA Bandwidth](#rdma-bandwidth)
- [RDMA Latency](#rdma-latency)
- [fio](#fio)
- [Pytorch](#pytorch)
- [Operator development](#operator-development)

## Installation
@@ -289,6 +290,41 @@ spec:
storage: 5Gi
```

### Pytorch

Runs machine learning model training and inference micro-benchmarks from the official
PyTorch [benchmarks repo](https://github.com/pytorch/benchmark/) to compare the performance
of CPU and GPU devices on synthetic input data. Running benchmarks on CUDA-capable
devices requires the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator)
to be pre-installed on the target Kubernetes cluster.

The pre-built container image currently includes the `alexnet`, `resnet50` and
`llama` (inference only) models. Additional models from the
[upstream repo list](https://github.com/pytorch/benchmark/tree/main/torchbenchmark/models)
may be added in the future as needed. Adding a new model only requires adding it to the model list
in `images/pytorch-benchmark/Dockerfile` and updating the `PytorchModel` enum in `pytorch.py`.

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: Pytorch
metadata:
  name: pytorch-test-gpu
spec:
  # The device to run the benchmark on ('cpu' or 'cuda')
  device: cuda
  # Name of the model to benchmark
  model: alexnet
  # Either 'train' or 'eval'
  # (not all models support both)
  benchmarkType: eval
  # Batch size for the generated input data
  inputBatchSize: 32
  # Number of GPUs; defaults to 0 when device is 'cpu'
  # or 1 when device is 'cuda'
  gpuCount: 2
```
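For comparison, a CPU-only run might look like the following sketch. It reuses only the fields shown in the GPU example. The resource name `pytorch-test-cpu` and the batch size are illustrative values. `gpuCount` is omitted on the assumption that it falls back to its documented default of 0 for `device: cpu`. `train` is used on the assumption that `resnet50` supports training, unlike `llama`, which is inference only.

```yaml
apiVersion: perftest.stackhpc.com/v1alpha1
kind: Pytorch
metadata:
  # Illustrative name for a CPU-only run
  name: pytorch-test-cpu
spec:
  # Run on CPU; gpuCount is omitted and is expected to default to 0
  device: cpu
  # One of the models included in the pre-built image
  model: resnet50
  # Training benchmark (llama would only support 'eval')
  benchmarkType: train
  # Illustrative batch size for the generated input data
  inputBatchSize: 32
```

Running a CPU and a GPU variant with the same `model`, `benchmarkType` and `inputBatchSize` keeps the comparison between devices like-for-like.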


## Operator development

```