diff --git a/README.md b/README.md
index 2e9f121b7..72c7afad4 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,16 @@ We also evaluated PatrickStar v0.4.3 on a single node of A100 SuperPod. It is ab
 
 Detail benchmark results on WeChat AI data center as well as NVIDIA SuperPod are posted on this [Google Doc](https://docs.google.com/spreadsheets/d/136CWc_jA_2zC4h1r-6dzD4PrOvp6aw6uCDchEyQv6sE/edit?usp=sharing).
 
+
+Scale PatrickStar to multiple machines (nodes) on SuperPod.
+We succeeded in training a GPT3-175B model on 32 GPUs. As far as we know, it is the first work
+to run GPT3 on such a small GPU cluster.
+Microsoft used 10,000 V100 GPUs to pretrain GPT3.
+Now you can finetune it, or even pretrain your own model, on 32 A100 GPUs. Amazing!
+
+![alt perf](./doc/m_node_superpod.png "performance testing result on multiple nodes of SuperPod")
+
+
 We've also trained the [CLUE-GPT2](https://huggingface.co/uer/gpt2-chinese-cluecorpussmall) model with PatrickStar, the loss and accuracy curve is shown below:
 
 ![CLUE-GPT2](./doc/clue-gpt2-loss-n-acc.png)
diff --git a/doc/m_node_superpod.png b/doc/m_node_superpod.png
new file mode 100644
index 000000000..cd516c0e9
Binary files /dev/null and b/doc/m_node_superpod.png differ
diff --git a/doc/one_node_perf_a100.png b/doc/one_node_perf_a100.png
index 7cdbd9202..ae5c3d4b3 100644
Binary files a/doc/one_node_perf_a100.png and b/doc/one_node_perf_a100.png differ
diff --git a/examples/run_transformers.sh b/examples/run_transformers.sh
index 958a1baa2..ee6e92511 100644
--- a/examples/run_transformers.sh
+++ b/examples/run_transformers.sh
@@ -28,7 +28,7 @@ export MEM_PROF=${MEM_PROF:-0}
 # asyn memory monitor for mem sampler
 export AMM=${AMM:-1}
 # mem saving comm
-export MSC=${MSC:-0}
+export MSC=${MSC:-1}
 # mem caching comm
 export CACHE=${CACHE:-1}
 # async move
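
Note: the `run_transformers.sh` change above only flips the default of the `MSC` (memory-saving communication) variable from 0 to 1 via the `${MSC:-1}` pattern, so the flag can still be overridden from the environment. A minimal usage sketch, assuming the script is launched directly and any other arguments it needs are left unchanged (the invocations below are illustrative, not taken from the patch):

```bash
# MSC now defaults to 1 (memory-saving communication enabled) when unset.
bash examples/run_transformers.sh

# Override the default to restore the previous behavior (MSC disabled).
MSC=0 bash examples/run_transformers.sh
```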