Merge pull request #53 from AI-Hypercomputer/raymondzou-collectives

raymondzouu · web-flow · commit 155beed8c678 · 2025-04-01T10:15:53.000-07:00
Add more info to the README for running multislice
diff --git a/microbenchmarks/trillium/collectives/README.md b/microbenchmarks/trillium/collectives/README.md
@@ -7,7 +7,7 @@ Please follow this [link](https://github.com/AI-Hypercomputer/tpu-recipes/blob/m
 
 ### Starting workload
 
-Launch the XPK workload:
+Launch the XPK workload. For example, to run on 1 slice of v6e-256:
 ```
 python3 ~/xpk/xpk.py workload create \
     --cluster=${CLUSTER_NAME} \
@@ -20,6 +20,11 @@ python3 ~/xpk/xpk.py workload create \
     --workload=${WORKLOAD_NAME}
 ```
 
+To run on more than 1 slice, set the `--num_slices` and `--config` flags to the target number of slices and the corresponding YAML config file, e.g.:
+```
+--num_slices=2 --config=configs/2x_v6e_256.yaml 
+```
+
 From your workload logs, you should start seeing benchmark logs:
 ```
 psum_dcn: Matrix size: 17408x17408, dtype=<class 'jax.numpy.bfloat16'>, matrix_size_gbyte=0.606076928,achieved_bandwidth_gbyte_s=4.1130934137328214
@@ -31,4 +36,4 @@ Results will be printed out and also stored at `/tmp/microbenchmarks/collectives
 gsutil cp -r /tmp/microbenchmarks/collectives gs://<your-gcs-bucket>
 ```
 
-Check out the other scripts for running on more than 1 slice.
+
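
The multislice note added in this change pairs a slice count with a matching config file. The sketch below is one way to keep the two values in sync. `multislice_flags` is a hypothetical helper that is not part of the repository, and the `configs/<N>x_v6e_256.yaml` naming is an assumption extrapolated from the single 2-slice example, so check that the file exists for your slice count before relying on it.

```shell
# Hypothetical helper (not part of the repo): print the multislice flags
# for a given slice count.
# Assumption: config files follow the configs/<N>x_v6e_256.yaml pattern
# seen in the 2-slice example; confirm the file exists before using it.
multislice_flags() {
  num_slices="$1"
  # Reject empty or non-numeric input.
  case "$num_slices" in
    ''|*[!0-9]*)
      echo "num_slices must be a positive integer" >&2
      return 1
      ;;
  esac
  # The multislice flags only apply to 2 or more slices.
  if [ "$num_slices" -lt 2 ]; then
    echo "use the single-slice command for fewer than 2 slices" >&2
    return 1
  fi
  printf '%s\n' "--num_slices=${num_slices} --config=configs/${num_slices}x_v6e_256.yaml"
}

multislice_flags 2
```

The printed flags still have to be placed into the `xpk.py workload create` command by hand, in the same positions the README uses for the single-slice run.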