From dc27d354c8194fa3c74a75cb905f67a7bc8877f2 Mon Sep 17 00:00:00 2001
From: Scott Davidson
Date: Mon, 2 Oct 2023 16:24:37 +0100
Subject: [PATCH] Add Pytorch section to README

---
 README.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/README.md b/README.md
index 77dcadf..aa30b92 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@
 - [RDMA Bandwidth](#rdma-bandwidth)
 - [RDMA Latency](#rdma-latency)
 - [fio](#fio)
+- [PyTorch](#pytorch)
 - [Operator development](#operator-development)
 
 ## Installation
@@ -289,6 +290,41 @@ spec:
       storage: 5Gi
 ```
 
+### PyTorch
+
+Runs machine learning model training and inference micro-benchmarks from the official
+PyTorch [benchmarks repo](https://github.com/pytorch/benchmark/) to compare the
+performance of CPU and GPU devices on synthetic input data. Running benchmarks on
+CUDA-capable devices requires the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator)
+to be pre-installed on the target Kubernetes cluster.
+
+The pre-built container image currently includes the `alexnet`, `resnet50` and
+`llama` (inference only) models; additional models from the
+[upstream repo list](https://github.com/pytorch/benchmark/tree/main/torchbenchmark/models)
+may be added as needed in the future. (Adding a new model requires adding it to the
+list in `images/pytorch-benchmark/Dockerfile` and updating the `PytorchModel` enum in `pytorch.py`.)
+
+```yaml
+apiVersion: perftest.stackhpc.com/v1alpha1
+kind: Pytorch
+metadata:
+  name: pytorch-test-gpu
+spec:
+  # The device to run the benchmark on ('cpu' or 'cuda')
+  device: cuda
+  # Name of the model to benchmark
+  model: alexnet
+  # Either 'train' or 'eval'
+  # (not all models support both)
+  benchmarkType: eval
+  # Batch size for the generated input data
+  inputBatchSize: 32
+  # Defaults to 0 for device == cpu
+  # or 1 for device == cuda
+  gpuCount: 2
+```
+
+
 ## Operator development
 
 ```
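For comparison, a CPU-only training run against the same schema might look like the following. This is a sketch based on the fields shown in the patch above; the resource name is hypothetical, and `gpuCount` is omitted since the patch states it defaults to 0 when `device` is `cpu`:

```yaml
# Hypothetical CPU counterpart to the GPU example in the patch above
apiVersion: perftest.stackhpc.com/v1alpha1
kind: Pytorch
metadata:
  name: pytorch-test-cpu  # hypothetical name
spec:
  # Run on the CPU rather than a CUDA device
  device: cpu
  model: alexnet
  # 'train' exercises the backward pass as well as the forward pass
  # (note that 'llama' is inference-only, so it would need 'eval')
  benchmarkType: train
  inputBatchSize: 32
  # gpuCount omitted: defaults to 0 for device == cpu
```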