Commit 57dc813

committed Jul 11, 2024
Periodic update to overall documentation to account for latest Graviton4
release.
1 parent 7e35711 commit 57dc813

16 files changed, +102 -71 lines changed

‎CommonNativeJarsTable.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ Org | jar | Builds on Arm | Arm Artifact available | Minimum Version
55
com.github.luben | [zstd-jni](https://github.com/luben/zstd-jni) | yes | [yes](https://mvnrepository.com/artifact/com.github.luben/zstd-jni) | 1.2.0
66
org.lz4 | [lz4-java](https://github.com/lz4/lz4-java) | yes | [yes](https://mvnrepository.com/artifact/org.lz4/lz4-java) | 1.4.0
77
org.xerial.snappy | [snappy-java](https://github.com/xerial/snappy-java) | yes | [yes](https://mvnrepository.com/artifact/org.xerial.snappy/snappy-java) | 1.1.4
8-
org.rocksdb | [rocksdbjni](https://github.com/facebook/rocksdb/tree/master/java) | yes | [yes](https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni) | 5.0.1
8+
org.rocksdb | [rocksdbjni](https://github.com/facebook/rocksdb/tree/master/java) | yes | [yes](https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni) | 5.0.1 (7.4.3+ recommended)
99
com.github.jnr | [jffi](https://github.com/jnr/jffi) | yes | [yes](https://mvnrepository.com/artifact/com.github.jnr/jffi) | 1.2.13
1010
org.apache.commons | [commons-crypto](https://github.com/apache/commons-crypto) | yes | [yes](https://search.maven.org/artifact/org.apache.commons/commons-crypto/1.1.0/jar) | 1.1.0
1111
io.netty | [netty-transport-native-epoll](https://github.com/netty/netty) | yes | [yes](https://mvnrepository.com/artifact/io.netty/netty-transport-native-epoll) | 4.1.50

‎Monitoring_Tools_on_Graviton.md

+3-3
@@ -108,7 +108,7 @@ One can collect hardware events/ counters for an application, on a specific CPU,
108108
More details on how to use Linux perf utility on AWS Graviton processors is available [here](https://github.com/aws/aws-graviton-getting-started/blob/main/optimizing.md#profiling-the-code).
109109

110110
## Summary: Utilities on AWS Graviton vs. Intel x86 architectures
111-
|Processor |x86 |Graviton2,3 |
111+
|Processor |x86 |Graviton2,3, and 4 |
112112
|--- |--- |--- |
113113
|CPU frequency listing |*lscpu, /proc/cpuinfo, dmidecode* |*dmidecode* |
114114
|*turbostat* support |Yes |No |
@@ -117,12 +117,12 @@ More details on how to use Linux perf utility on AWS Graviton processors is avai
117117
|*i7z* Works |Yes |No |
118118
|*lmbench* |Yes |Yes |
119119
|Intel *MLC* |Yes |No |
120-
|Performance monitoring tools |_[VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and [PCM](https://github.com/opcm/pcm), [Linux perf](https://www.brendangregg.com/perf.html)_ |_[Linux perf](https://www.brendangregg.com/perf.html), [Arm Forge](https://developer.arm.com/Tools%20and%20Software/Arm%20Forge)_ |
120+
|Performance monitoring tools |_[VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html), [PCM](https://github.com/opcm/pcm), [Linux perf](https://www.brendangregg.com/perf.html), [APerf](https://github.com/aws/aperf)_ |_[Linux perf](https://www.brendangregg.com/perf.html), [Linaro Forge](https://www.linaroforge.com/), [Arm Streamline CLI Tools](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), [APerf](https://github.com/aws/aperf)_ |
121121

122122
Utilities such as *lmbench* are available [here](http://lmbench.sourceforge.net/) and can be built for AWS Graviton processors to obtain latency and bandwidth stats.
123123

124124
**Notes**:
125125

126126
**1.** The ARM Linux kernel community has decided not to put CPU frequency in _/proc/cpuinfo_ which can be read by tools such as _lscpu_ or directly.
127127

128-
**2.** On AWS Graviton 2/3 processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant for Arm architecture (and not on AWS Graviton processor either). Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.
128+
**2.** On AWS Graviton processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant. Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.

‎c-c++.md

+17-14
@@ -2,8 +2,8 @@
22

33
### Enabling Arm Architecture Specific Features
44

5-
TLDR: To target all current generation Graviton instances (Graviton2 and
6-
Graviton3), use `-march=arm8.2-a`.
5+
TLDR: To target all current generation Graviton instances (Graviton2,
6+
Graviton3, and Graviton4), use `-march=armv8.2-a`.
77

88
C and C++ code can be built for Graviton with a variety of flags, depending on
99
the goal. If the goal is to get the best performance for a specific generation,
@@ -21,6 +21,7 @@ CPU | Flag (performance) | Flag (balanced) | GCC version
2121
-------------|-----------------------|---------------------------|------------------|---------------
2222
Graviton2 | `-mcpu=neoverse-n1` ¹ | `-march=armv8.2-a` | GCC-9 | Clang/LLVM 10+
2323
Graviton3(E) | `-mcpu=neoverse-v1` | `-mcpu=neoverse-512tvb` ² | GCC 11 | Clang/LLVM 14+
24+
Graviton4 | `-mcpu=neoverse-v2` | `-mcpu=neoverse-512tvb` ² | GCC 13 | Clang/LLVM 16+
2425

2526
¹ Requires GCC-9 or later (or GCC-7 for Amazon Linux 2); otherwise we suggest
2627
using `-mcpu=cortex-a72`
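
As a rough illustration of the table above, the following shell sketch maps a Graviton generation to the flags listed there. The function name `graviton_cflags` and its argument names are hypothetical, introduced only for this example; the flag values are copied from the table.

```bash
# Hypothetical helper: print the compiler flag from the table above.
# Usage: graviton_cflags <graviton2|graviton3|graviton4> [performance|balanced]
graviton_cflags() {
    cpu="$1"
    goal="${2:-balanced}"
    case "$cpu:$goal" in
        graviton2:performance) echo "-mcpu=neoverse-n1" ;;
        graviton2:balanced)    echo "-march=armv8.2-a" ;;
        graviton3:performance) echo "-mcpu=neoverse-v1" ;;
        graviton4:performance) echo "-mcpu=neoverse-v2" ;;
        graviton3:balanced|graviton4:balanced) echo "-mcpu=neoverse-512tvb" ;;
        *) echo "unknown CPU or goal: $cpu $goal" >&2; return 1 ;;
    esac
}

# Example (app.c is a placeholder): build for Graviton4 only.
# Per the table this needs GCC 13+ or Clang/LLVM 16+.
# gcc -O2 "$(graviton_cflags graviton4 performance)" -o app app.c
```

Keeping the mapping in one place makes it easier to switch a build between a generation-specific binary and a balanced one.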
@@ -48,7 +49,8 @@ Distribution | GCC | Clang/LLVM
4849
----------------|----------------------|-------------
4950
Amazon Linux 2023 | 11* | 15*
5051
Amazon Linux 2 | 7*, 10 | 7, 11*
51-
Ubuntu 22.04 | 9, 10, 11*, 12 | 11, 12, 13, 14*
52+
Ubuntu 24.04 | 9, 10, 11, 12, 13*, 14 | 14, 15, 16, 17, 18*
53+
Ubuntu 22.04 | 9, 10, 11*, 12 | 11, 12, 13, 14*
5254
Ubuntu 20.04 | 7, 8, 9*, 10 | 6, 7, 8, 9, 10, 11, 12
5355
Ubuntu 18.04 | 4.8, 5, 6, 7*, 8 | 3.9, 4, 5, 6, 7, 8, 9, 10
5456
Debian10 | 7, 8* | 6, 7, 8
@@ -58,7 +60,7 @@ SUSE Linux ES15 | 7*, 9, 10 | 7
5860

5961
### Large-System Extensions (LSE)
6062

61-
The Graviton2 and Graviton3(E) processors have support for the Large-System Extensions (LSE)
63+
All Graviton processors after Graviton1 have support for the Large-System Extensions (LSE)
6264
which was first introduced in Armv8.1. LSE provides low-cost atomic operations which can
6365
improve system throughput for CPU-to-CPU communication, locks, and mutexes.
6466
The improvement can be up to an order of magnitude when using LSE instead of
@@ -67,11 +69,12 @@ load/store exclusives.
6769
POSIX threads library needs LSE atomic instructions. LSE is important for
6870
locking and thread synchronization routines. The following systems distribute
6971
a libc compiled with LSE instructions:
70-
- Amazon Linux 2,
71-
- Amazon Linux 2022,
72-
- Ubuntu 18.04 (needs `apt install libc6-lse`),
73-
- Ubuntu 20.04,
74-
- Ubuntu 22.04.
72+
- Amazon Linux 2
73+
- Amazon Linux 2023
74+
- Ubuntu 18.04 (needs `apt install libc6-lse`)
75+
- Ubuntu 20.04
76+
- Ubuntu 22.04
77+
- Ubuntu 24.04
7578

7679
The compiler needs to generate LSE instructions for applications that use atomic
7780
operations. For example, the code of databases like PostgreSQL contain atomic
@@ -87,8 +90,8 @@ To check whether the application binary contains load and store exclusives:
8790
$ objdump -d app | grep -i 'ldxr\|ldaxr\|stxr\|stlxr' | wc -l
8891
```
8992

90-
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton and
91-
Graviton2. Supporting both platforms with the same binary comes at a small
93+
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton1 and later
94+
Gravitons with LSE support. Supporting both platforms with the same binary comes at a small
9295
extra cost: one load and one branch. To check that an application
9396
has been compiled with `-moutline-atomics`, `nm` command line utility displays
9497
the name of functions and global variables in an application binary. The boolean
@@ -152,16 +155,16 @@ if (feof(stdin)) {
152155
}
153156
```
154157

155-
### Using Graviton2 Arm instructions to speed-up Machine Learning
158+
### Using Arm instructions to speed-up Machine Learning
156159

157-
Graviton2 processors been optimized for performance and power efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for Machine Learning (quantized) inference workloads, and enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second, reducing the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
160+
Graviton2 and later processors have been optimized for performance and power efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for Machine Learning (quantized) inference workloads, and enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second, reducing the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
158161

159162
### Using SVE
160163

161164
The scalable vector extensions (SVE) require both a new enough tool-chain to
162165
auto-vectorize to SVE (GCC 11+, LLVM 14+) and a 4.15+ kernel that supports SVE.
163166
One notable exception is that Amazon Linux 2 with a 4.14 kernel doesn't support SVE;
164-
please upgrade to a 5.4+ AL2 kernel.
167+
please upgrade to a 5.4+ AL2 kernel. Graviton3 and Graviton4 support SVE; earlier Gravitons do not.
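
A quick way to see whether both the CPU and the running kernel expose SVE is to look at the `Features` line that arm64 Linux prints in `/proc/cpuinfo`. The sketch below assumes that layout; the helper name `has_cpu_feature` is hypothetical. The same check with `atomics` should indicate LSE support.

```bash
# Return success if the given feature flag is listed on the "Features" line
# of a cpuinfo-style file (defaults to /proc/cpuinfo on Linux/arm64).
has_cpu_feature() {
    feature="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Split the first Features line into one token per line, then match exactly
    # so that "sve" does not match "sve2".
    grep -m1 '^Features' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -qx "$feature"
}

if has_cpu_feature sve; then
    echo "SVE is reported by the kernel"
else
    echo "SVE is not reported (pre-Graviton3 CPU, older kernel, or not arm64)"
fi
```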
165168

166169
### Using Arm instructions to speed-up common code sequences
167170
The Arm instruction set includes instructions that can be used to speedup common

‎dpdk_spdk.md

+5-5
@@ -1,18 +1,18 @@
1-
# DPDK, SPDK, ISA-L supports Graviton2
1+
# DPDK, SPDK, ISA-L support Graviton
22

3-
Graviton2 is optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton2 following the normal installation guidelines from the respective repositories linked above.
3+
Graviton2 and later CPUs are optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton following the normal installation guidelines from the respective repositories linked above.
44

55
**NOTE**: *Although precompiled DPDK packages are available from Ubuntu, we recommend building DPDK from source.*
66

7-
SPDK relies often on [ISA-L](https://github.com/intel/isa-l) which is already optimized for Arm64 and the CPU cores in Graviton2.
7+
SPDK often relies on [ISA-L](https://github.com/intel/isa-l) which is already optimized for Arm64 and the CPU cores in Graviton2 and later processors.
88

99

1010

1111
## Compile DPDK from source
1212

1313
[DPDK official guidelines](https://doc.dpdk.org/guides/linux_gsg/build_dpdk.html) requires using *meson* and *ninja* to build from source code.
1414

15-
A native compilation of DPDK on top of Graviton2 will generate optimized code that take advantage of the CRC and Crypto instructions in Graviton2 cpu cores.
15+
A native compilation of DPDK on top of Graviton will generate optimized code that takes advantage of the CRC and Crypto instructions in Graviton2 and later CPU cores.
1616

1717
**NOTE**: Some of the installation steps call "python", which may not be a valid command in modern Linux distributions; you may need to install *python-is-python3* to resolve this.
1818

@@ -35,5 +35,5 @@ Some application, written with the x86 architecture in mind, set the active dpdk
3535

3636
## Known issues
3737

38-
* **testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton2). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
38+
* **testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
3939

‎managed_services.md

+5-5
@@ -5,7 +5,7 @@ Note: You can always find the latest Graviton announcements via these [What's Ne
55
Service | Status | Resources |
66
:-: | :-: | --- |
77
[AWS App Mesh](https://aws.amazon.com/app-mesh/) | GA | What's New: [AWS App Mesh now supports ARM64-based Envoy Images](https://aws.amazon.com/about-aws/whats-new/2021/11/aws-app-mesh-arm64-envoy-images/) |
8-
[Amazon Aurora](https://aws.amazon.com/rds/aurora/) | GA | What's New: [Achieve up to 35% better price/performance with Amazon Aurora using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2021/03/achieve-up-to-35-percent-better-price-performance-with-amazon-aurora-using-new-graviton2-instances/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [Aurora DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html) |
8+
[Amazon Aurora](https://aws.amazon.com/rds/aurora/) | GA | What's New: [Amazon Aurora MySQL and PostgreSQL support for Graviton3 based R7g instance family](https://aws.amazon.com/about-aws/whats-new/2023/05/amazon-aurora-mysql-postgresql-graviton3-based-r7g-instance-family/), [Achieve up to 35% better price/performance with Amazon Aurora using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2021/03/achieve-up-to-35-percent-better-price-performance-with-amazon-aurora-using-new-graviton2-instances/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [Aurora DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html) |
99
[Amazon EC2 Auto Scaling](https://aws.amazon.com/ec2/autoscaling/) | GA | What's New: [Amazon EC2 Auto Scaling announces support for multiple launch templates for Auto Scaling groups](https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-ec2-auto-scaling-announces-support-for-multiple-launch-templates-for-auto-scaling-groups/)<br>Associated blog: [Supporting AWS Graviton2 and x86 instance types in the same Auto Scaling group](https://aws.amazon.com/blogs/compute/supporting-aws-graviton2-and-x86-instance-types-in-the-same-auto-scaling-group/)
1010
[AWS Batch](https://aws.amazon.com/batch/) | GA | Blog: [Target cross-platform Go builds with AWS CodeBuild Batch builds](https://aws.amazon.com/blogs/devops/target-cross-platform-go-builds-with-aws-codebuild-batch-builds/) |
1111
[AWS CodeBuild](https://aws.amazon.com/codebuild/) | GA | What's New: [AWS CodeBuild supports Arm-based workloads using AWS Graviton2](https://aws.amazon.com/about-aws/whats-new/2021/02/aws-codebuild-supports-arm-based-workloads-using-aws-graviton2/) |
@@ -15,8 +15,8 @@ Service | Status | Resources |
1515
[Amazon ECS](https://aws.amazon.com/ecs/) | GA | [Amazon ECS-optimized AMIs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html) |
1616
[Amazon EKS](https://aws.amazon.com/eks/) | GA | What's New: [Amazon EKS support for Arm-based instances powered by AWS Graviton is now generally available](https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-support-for-arm-based-instances-powered-by-aws-graviton-now-generally-available/)<br>Launch Blog: [Amazon EKS on AWS Graviton2 generally available: considerations on multi-architecture apps](https://aws.amazon.com/blogs/containers/eks-on-graviton-generally-available/) |
1717
[AWS Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/) | GA | What's New: [Elastic Beanstalk supports AWS Graviton-based Amazon EC2 instance types](https://aws.amazon.com/about-aws/whats-new/2021/11/elastic-beanstalk-aws-graviton-ec2/) |
18-
[Amazon ElastiCache](https://aws.amazon.com/elasticache/) | GA | What's New: [Amazon ElastiCache now supports M6g and R6g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-elasticache-now-supports-m6g-and-r6g-graviton2-based-instances/)<br> What's New: [Amazon ElastiCache now supports T4g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-elasticache-supports-t4g-graviton2-based-instances/) |
19-
[Amazon EMR](https://aws.amazon.com/emr/) | GA | What's New: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance/)<br>Launch Blog: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/blogs/big-data/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance-for-spark-workloads-on-graviton2-based-instances/) |
18+
[Amazon ElastiCache](https://aws.amazon.com/elasticache/) | GA | What's New: [Amazon ElastiCache now supports M7g and R7g Graviton3-based nodes](https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-elasticache-m7g-r7g-graviton-3-nodes/), [Amazon ElastiCache now supports M6g and R6g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-elasticache-now-supports-m6g-and-r6g-graviton2-based-instances/)<br> What's New: [Amazon ElastiCache now supports T4g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-elasticache-supports-t4g-graviton2-based-instances/) |
19+
[Amazon EMR](https://aws.amazon.com/emr/) | GA | What's New: [Amazon EMR now supports Amazon EC2 C7g (Graviton3) instances](https://aws.amazon.com/about-aws/whats-new/2023/03/amazon-emr-amazon-ec2-c7g-graviton3-instances/), [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance/)<br>Launch Blog: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/blogs/big-data/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance-for-spark-workloads-on-graviton2-based-instances/) |
2020
[Amazon EMR Serverless](https://aws.amazon.com/emr/serverless/) | GA | What's New: [Announcing AWS Graviton2 support for Amazon EMR Serverless - Get up to 35% better price-performance for your serverless Spark and Hive workload](https://aws.amazon.com/about-aws/whats-new/2022/11/aws-graviton2-emr-serverless-35-percent-price-performance-spark-hive-workloads/) |
2121
[AWS Fargate](https://aws.amazon.com/fargate/) | GA | Launch Blog: [Announcing AWS Graviton2 Support for AWS Fargate – Get up to 40% Better Price-Performance for Your Serverless Containers](https://aws.amazon.com/blogs/aws/announcing-aws-graviton2-support-for-aws-fargate-get-up-to-40-better-price-performance-for-your-serverless-containers/) |
2222
[Amazon Gamelift](https://aws.amazon.com/gamelift/) | GA | Launch Blog: [Now available: New Asia Pacific (Osaka) region and Graviton2 support for Amazon GameLift](https://aws.amazon.com/blogs/gametech/now-available-new-asia-pacific-osaka-region-and-graviton2-support-for-amazon-gamelift/)<br>Addition of Graviton3: [Announcing Amazon GameLift support for instances powered by AWS Graviton3 processors](https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-gamelift-instances-aws-graviton-3-processors/)|
@@ -25,5 +25,5 @@ Service | Status | Resources |
2525
[Amazon Managed Streaming for Apache Kafka (MSK)](https://aws.amazon.com/msk/) | GA | What's New: [Amazon MSK now supports Graviton3-based M7g instances for new provisioned clusters](https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-msk-graviton3-m7g-instances-provisioned-clusters/) |
2626
[Amazon Neptune](https://aws.amazon.com/neptune/) | GA | What's New: [Announcing AWS Graviton2-based instances for Amazon Neptune](https://aws.amazon.com/about-aws/whats-new/2021/11/aws-graviton2-based-instances-amazon-neptune/) |
2727
[Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) | GA | What's New: [Amazon Elasticsearch Service now offers AWS Graviton2 (M6g, C6g, R6g, and R6gd) instances](https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-elasticsearch-service-offers-aws-graviton2-m6g-c6g-r6g-r6gd-instances/)<br>Related blog: [Increase Amazon Elasticsearch Service performance by upgrading to Graviton2](https://aws.amazon.com/blogs/big-data/increase-amazon-elasticsearch-service-performance-by-upgrading-to-graviton2/)|
28-
[Amazon RDS](https://aws.amazon.com/rds/) | GA | What's New: [Achieve up to 52% better price/performance with Amazon RDS using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-up-to-52-percent-better-price-performance-with-amazon-rds-using-new-graviton2-instances/)<br>Launch Blog: [New – Amazon RDS on Graviton2 Processors](https://aws.amazon.com/blogs/aws/new-amazon-rds-on-graviton2-processors/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [RDS DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html) |
29-
[Amazon SageMaker](https://aws.amazon.com/pm/sagemaker/) | GA | What's New: [Amazon SageMaker adds eight new Graviton-based instances for model deployment](https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-adds-new-graviton-based-instances-model-deployment/) <br> Related blog: [Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/run-machine-learning-inference-workloads-on-aws-graviton-based-instances-with-amazon-sagemaker/)|
28+
[Amazon RDS](https://aws.amazon.com/rds/) | GA | What's New: [Amazon RDS now supports M7g and R7g database instances](https://aws.amazon.com/about-aws/whats-new/2023/04/amazon-rds-m7g-r7g-database-instances/), [Achieve up to 52% better price/performance with Amazon RDS using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-up-to-52-percent-better-price-performance-with-amazon-rds-using-new-graviton2-instances/)<br>Launch Blog: [New – Amazon RDS on Graviton2 Processors](https://aws.amazon.com/blogs/aws/new-amazon-rds-on-graviton2-processors/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [RDS DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html) |
29+
[Amazon SageMaker](https://aws.amazon.com/pm/sagemaker/) | GA | What's New: [Amazon SageMaker adds eight new Graviton-based instances for model deployment](https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-adds-new-graviton-based-instances-model-deployment/) <br> Related blog: [Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/run-machine-learning-inference-workloads-on-aws-graviton-based-instances-with-amazon-sagemaker/), [Reduce Amazon SageMaker inference cost with AWS Graviton](https://aws.amazon.com/blogs/machine-learning/reduce-amazon-sagemaker-inference-cost-with-aws-graviton/)|

‎optimizing.md

+24-7
@@ -8,7 +8,7 @@ it relies on undefined behavior in the language (e.g. assuming char is signed in
88
or the behavior of signed integer overflow), contains memory management bugs that
99
happen to be exposed by aggressive compiler optimizations, or incorrect ordering.
1010
Below are some techniques / tools we have used to find issues
11-
while migrating our internal services to newer compilers and Graviton2.
11+
while migrating our internal services to newer compilers and Graviton based instances.
1212

1313
### Using Sanitizers
1414
The compiler may generate code and layout data slightly differently on Graviton
@@ -29,7 +29,7 @@ information.
2929
Arm is weakly ordered, similar to POWER and other modern architectures, while
3030
x86 is a variant of total-store-ordering (TSO).
3131
Code that relies on TSO may lack barriers to properly order memory references.
32-
Armv8 based systems, including Graviton and Graviton2 are [weakly ordered
32+
Armv8 based systems, including all Gravitons, are [weakly ordered
3333
multi-copy-atomic](https://www.cl.cam.ac.uk/~pes20/armv8-mca/armv8-mca-draft.pdf).
3434

3535
While TSO allows reads to occur out-of-order with writes and a processor to
@@ -54,16 +54,16 @@ is corresponding Arm code there too. If not, that might be something to improve.
5454
We welcome suggestions by opening an issue in this repo.
5555

5656
### Lock/Synchronization intensive workload
57-
Graviton2 supports the Arm Large Scale Extensions (LSE). LSE based locking and synchronization
58-
is an order of magnitude faster for highly contended locks with high core counts (e.g. 64 with Graviton2).
57+
Graviton2 processors and later support the Arm Large System Extensions (LSE). LSE based locking and synchronization
58+
is an order of magnitude faster for highly contended locks with high core counts (e.g. up to 192 cores on Graviton4).
5959
For workloads that have highly contended locks, compiling with `-march=armv8.2-a` will enable LSE based atomics and can substantially increase performance. However, this will prevent the code
6060
from running on an Arm v8.0 system such as AWS Graviton-based EC2 A1 instances.
6161
With GCC 10 and newer an option `-moutline-atomics` will not inline atomics and
6262
detect at run time the correct type of atomic to use. This is slightly worse
6363
performing than `-march=armv8.2-a` but does retain backwards compatibility.
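
To sanity-check which kind of atomics a binary ended up with, one option is to count instruction mnemonics in its disassembly. The sketch below reads `objdump -d` output on stdin. The helper name `count_atomics` is hypothetical, and the LSE mnemonic list is a partial, illustrative subset rather than a complete one.

```bash
# Count load/store-exclusive and LSE atomic instructions in `objdump -d`
# output read from stdin. In that output the third whitespace-separated
# field of an instruction line is the mnemonic.
count_atomics() {
    awk '
        $3 ~ /^(ldxr|ldaxr|stxr|stlxr)/           { excl++ }
        $3 ~ /^(cas|swp|ldadd|ldclr|ldeor|ldset)/ { lse++ }
        END { printf "exclusive=%d lse=%d\n", excl, lse }
    '
}

# Example (the binary name "app" is a placeholder):
# objdump -d app | count_atomics
```

A binary built with `-march=armv8.2-a` would be expected to show mostly LSE instructions, while one built with `-moutline-atomics` will likely show both kinds, since the run-time selected helper routines carry both code paths.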
6464

6565
### Network intensive workloads
66-
In some workloads, the packet processing capability of Graviton2 is both faster and
66+
In some workloads, the packet processing capability of Graviton is both faster and
6767
lower-latency than other platforms, which reduces the natural “coalescing”
6868
capability of Linux kernel and increases the interrupt rate.
6969
Depending on the workload it might make sense to enable adaptive RX interrupts
@@ -72,12 +72,29 @@ Depending on the workload it might make sense to enable adaptive RX interrupts
7272
## Profiling the code
7373
If you aren't getting the performance you expect, one of the best ways to understand what is
7474
going on in the system is to compare profiles of execution and understand where the CPUs are
75-
spending time. This will frequently point to a hot function that could be optimized. A crutch
75+
spending time. This will frequently point to a hot function or sub-system that could be optimized. A crutch
7676
is comparing a profile between a system that is performing well and one that isn't to see the
7777
relative difference in execution time. Feel free to open an issue in this
7878
GitHub repo for advice or help.
7979

80-
Install the Linux perf tool:
80+
Using the [AWS APerf](https://github.com/aws/aperf) tool:
81+
```bash
82+
# Graviton
83+
wget -qO- https://github.com/aws/aperf/releases/download/v0.1.10-alpha/aperf-v0.1.10-alpha-aarch64.tar.gz | tar -xvz -C /target/directory
84+
85+
# x86
86+
wget -qO- https://github.com/aws/aperf/releases/download/v0.1.10-alpha/aperf-v0.1.10-alpha-x86_64.tar.gz | tar -xvz -C /target/directory
87+
88+
## Record a profile and generate a report
89+
cd /target/directory/
90+
./aperf record -r <RUN_NAME> -i <INTERVAL_NUMBER> -p <COLLECTION_PERIOD>
91+
./aperf report -r <COLLECTOR_DIRECTORY> -n <REPORT_NAME>
92+
93+
## The resulting report can be viewed with a web-browser by opening the index.html file
94+
```
95+
96+
97+
Using the Linux perf tool:
8198
```bash
8299
# Amazon Linux 2
83100
sudo yum install perf

‎os.md

+2-2
@@ -11,12 +11,12 @@ Ubuntu | 20.04 LTS | Yes | 4KB | [focal](https://cloud-images.ubuntu.com/locator
1111
Ubuntu | 18.04 LTS | Yes (*) | 4KB | [bionic](https://cloud-images.ubuntu.com/locator/ec2/) | Yes | (*) needs `apt install libc6-lse`. Free support ended 2023/05/31.
1212
SuSE | 15 SP2 or later| Planned | 4KB | [MarketPlace](https://aws.amazon.com/marketplace/pp/B07SPTXBDX) | Yes |
1313
Redhat Enterprise Linux | 8.2 or later | Yes | 64KB | [MarketPlace](https://aws.amazon.com/marketplace/pp/B07T2NH46P) | Yes |
14-
~~Redhat Enterprise Linux~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[MarketPlace](https://aws.amazon.com/marketplace/pp/B07KTFV2S8)~~ | | Supported on A1 instances but not on Graviton2 based ones
14+
~~Redhat Enterprise Linux~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[MarketPlace](https://aws.amazon.com/marketplace/pp/B07KTFV2S8)~~ | | Supported on A1 instances but not on Graviton2 and later based ones
1515
AlmaLinux | 8.4 or later | Yes | 64KB | [AMIs](https://wiki.almalinux.org/cloud/AWS.html) | Yes |
1616
Alpine Linux | 3.12.7 or later | Yes (*) | 4KB | [AMIs](https://www.alpinelinux.org/cloud/) | | (*) LSE enablement checked in version 3.14 |
1717
CentOS | 8.2.2004 or later | No | 64KB | [AMIs](https://wiki.centos.org/Cloud/AWS#Images) | Yes | |
1818
CentOS Stream | 8 | No (*) | 64KB (*) | [Downloads](https://www.centos.org/centos-stream/) | |(*) details to be confirmed once AMI's are available|
19-
~~CentOS~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[AMIs](https://wiki.centos.org/Cloud/AWS#Images)~~ | | Supported on A1 instances but not on Graviton2 based ones
19+
~~CentOS~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[AMIs](https://wiki.centos.org/Cloud/AWS#Images)~~ | | Supported on A1 instances but not on Graviton2 and later based ones
2020
Debian | 11 | Yes | 4KB | [Community](https://wiki.debian.org/Cloud/AmazonEC2Image/Bullseye) or [MarketPlace](https://aws.amazon.com/marketplace/pp/prodview-jwzxq55gno4p4) | Yes |
2121
Debian | 10 | [Planned for Debian 11](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418) | 4KB | [Community](https://wiki.debian.org/Cloud/AmazonEC2Image/Buster) or [MarketPlace](https://aws.amazon.com/marketplace/pp/B085HGTX5J) | Yes, as of Debian 10.7 (2020-12-07) |
2222
FreeBSD | 12.1 or later | No | 4KB | [Community](https://www.freebsd.org/releases/12.1R/announce.html) or [MarketPlace](https://aws.amazon.com/marketplace/pp/B081NF7BY7) | No | Device hotplug and API shutdown don't work

‎perfrunbook/README.md

+8-3
@@ -6,11 +6,11 @@ This document is a reference for software developers who want to benchmark, debu
66

77
This document covers many topics including how to benchmark, how to debug performance, and optimization recommendations. It is not meant to be read beginning-to-end. Instead view it as a collection of checklists and best known practices to apply when working with Graviton instances that go progressively deeper into analyzing the system. Please see the FAQ below to direct you towards the most relevant set of checklists and tools depending on your specific situation.
88

9-
If after following these guides there is still an issue you cannot resolve with regards to performance on Graviton2, please do not hesitate to raise an issue on the [AWS-Graviton-Getting-Started](https://github.com/aws/aws-graviton-getting-started/issues) guide or contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com). If there is something missing in this guide, please raise an issue or better, post a pull-request.
9+
If after following these guides there is still an issue you cannot resolve with regards to performance on Graviton based instances, please do not hesitate to raise an issue on the [AWS-Graviton-Getting-Started](https://github.com/aws/aws-graviton-getting-started/issues) guide or contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com). If there is something missing in this guide, please raise an issue or better, post a pull-request.
1010

1111
## Pre-requisites
1212

13-
To assist with some of the tasks listed in this runbook, we have created some helper-scripts for some of the tasks the checklists describe. The helper-scripts assume the test instances are running an up-to-date AL2 or Ubuntu 20.04LTS distribution and the user can run the scripts using `sudo`. Follow the steps below to obtain and install the utilities on your test systems:
13+
To assist with some of the tasks listed in this runbook, we have created some helper-scripts for some of the tasks the checklists describe. The helper-scripts assume the test instances are running an up-to-date AL2, AL2023 or Ubuntu 20.04LTS/22.04LTS distribution and the user can run the scripts using `sudo`. Follow the steps below to obtain and install the utilities on your test systems:
1414

1515
```bash
1616
# Clone the repository onto your systems-under-test and any load-generation instances
@@ -23,6 +23,11 @@ sudo ./install_perfrunbook_dependencies.sh
2323
# All scripts expect to run from the utilities directory
2424
```
2525

26+
## APerf for performance analysis
27+
28+
There is also a new tool aimed at helping move workloads over to Graviton called [APerf](https://github.com/aws/aperf). It bundles many of the capabilities of the individual tools present in this
29+
runbook and provides a better presentation. It is highly recommended to download this tool and use it to gather most of the same information in one test-run.
30+
2631
## Sections
2732

2833
1. [Introduction to Benchmarking](./intro_to_benchmarking.md)
@@ -48,7 +53,7 @@ sudo ./install_perfrunbook_dependencies.sh
4853
* **I benchmarked my service and performance on Graviton is slower compared to my current x86 based fleet, where do I start to root cause why?**
4954
Begin by verifying software dependencies and verifying the configuration of your Graviton and x86 testing environments to check that no major differences are present in the testing environment. Performance differences may be due to differences in environment and not due to the hardware. Refer to the below chart for a step-by-step flow through this runbook to help root cause the performance regression:
5055
![](./images/performance_debug_flowchart.png)
51-
* **What are the recommended optimizations to try with Graviton2?**
56+
* **What are the recommended optimizations to try with Graviton?**
5257
Refer to [Section 6](./optimization_recommendation.md) for our recommendations on how to make your application run faster on Graviton.
5358
* **I investigated every optimization in this guide and still cannot find the root-cause, what do I do next?**
5459
Please contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com) or talk with your AWS account team representative to get additional help.

‎perfrunbook/appendix.md

+14-8
@@ -4,13 +4,17 @@
44

55
This Appendix contains additional information for engineers that want to go deeper on a particular topic, such as using different PMU counters to understand how the code is executing on the hardware, discussion on load generators, and additional tools to help with code observability.
66

7-
## Useful Graviton2 PMU Counters and ratios
7+
## Useful Graviton PMU Events and ratios
88

9-
The following list of counter ratios has been curated to list counters useful for performance debugging. The more extensive list of counters is contained in the following references:
9+
The following list of counter ratios has been curated to list events useful for performance debugging. The more extensive list of counters is contained in the following references:
1010

1111
* [Arm ARM](https://developer.arm.com/documentation/102105/latest)
1212
* [Neoverse N1 TRM](https://developer.arm.com/documentation/100616/0400/debug-descriptions/performance-monitor-unit/pmu-events)
1313
* [Neoverse N1 PMU Guide](https://developer.arm.com/documentation/PJDOC-466751330-547673/r4p1?lang=en&rev=0)
14+
* [Neoverse V1 TRM](https://developer.arm.com/documentation/101427/latest/)
15+
* [Neoverse V1 PMU Guide](https://developer.arm.com/documentation/109708/latest/)
16+
* [Neoverse V2 TRM](https://developer.arm.com/documentation/102375/latest/)
17+
* [Neoverse V2 PMU Guide](https://developer.arm.com/documentation/109528/0100)
1418

1519
|METRIC |Counter #1 |Counter #2 |Formula |Description |
1620
|--- |--- |--- |--- |--- |
@@ -110,12 +114,13 @@ of such system level resources and if resources are used efficiently.
110114
CMN counters are only accessible on metal-type instances and certain OSes and kernels.
111115

112116

113-
|Distro |Kernel | Graviton2 (c6g) | Graviton3 (c7g) |
114-
|------------|---------|-----------------|-----------------|
115-
|Ubuntu-20.04| 5.15 | yes | no |
116-
|Ubuntu-20.04| >=5.19 | yes | yes |
117-
|Ubuntu-22.04| >=5.19 | yes | yes |
118-
|AL2023 | 6.1.2 | yes | yes |
117+
|Distro |Kernel | Graviton2 (6g) | Graviton3 (7g) | Graviton4 (8g) |
118+
|------------|---------|-----------------|-----------------|----------------|
119+
|Ubuntu-20.04| 5.15 | yes | no | no |
120+
|Ubuntu-20.04| >=5.19 | yes | yes | no |
121+
|Ubuntu-22.04| >=5.19 | yes | yes | no |
122+
|Ubuntu-24.04| >=6.8.0 | yes | yes | yes |
123+
|AL2023 | 6.1.2 | yes | yes | no |
119124

120125

121126
General procedure on Ubuntu
@@ -141,3 +146,4 @@ For further information about specific events and useful ratios, please refer to
141146

142147
[ARM documentation for Graviton3's CMN-650](https://developer.arm.com/documentation/101481/0200/?lang=en)
143148

149+
[ARM documentation for Graviton4's CMN-700](https://developer.arm.com/documentation/102308/latest/)

‎perfrunbook/configuring_your_sut.md

+9-9
@@ -35,7 +35,7 @@ If you have more than one SUT, first verify there are no major differences in se
3535
%> uname -r
3636
4.14.219-161.340.amzn2.x86_64
3737

38-
# Example output on Graviton2 SUT
38+
# Example output on Graviton SUT
3939
%> uname -r
4040
5.10.50-45.132.amzn2.aarch64
4141

@@ -76,9 +76,9 @@ If you have more than one SUT, first verify there are no major differences in se
7676

7777
## Check for missing binary dependencies
7878

79-
Libraries for Python or Java can link in binary shared objects to provide enhanced performance. The absence of these shared object dependencies does not prevent the application from running on Graviton2, but the CPU will be forced to use a slow code-path instead of the optimized paths. Use the checklist below to verify the same shared objects are available on all platforms.
79+
Libraries for Python or Java can link in binary shared objects to provide enhanced performance. The absence of these shared object dependencies does not prevent the application from running on Graviton, but the CPU will be forced to use a slow code-path instead of the optimized paths. Use the checklist below to verify the same shared objects are available on all platforms.
8080

81-
1. JVM based languages — Check for the presence of binary shared objects in the installed JARs and compare between Graviton2 and x86.
81+
1. JVM based languages — Check for the presence of binary shared objects in the installed JARs and compare between Graviton and x86.
8282
```bash
8383
%> cd ~/aws-getting-started-guide/perfrunbook/utilities
8484
%> sudo ./find_and_list_jar_with_so.sh
@@ -100,12 +100,12 @@ Libraries for Python or Java can link in binary shared objects to provide enhanc
100100
./META-INF/native/linux64/libjansi.so
101101
./META-INF/native/linux32/libjansi.so
102102
```
103-
2. Python — Check for the presence of binary shared objects in your python version’s `site-packages` locations and compare between Graviton2 and x86:
103+
2. Python — Check for the presence of binary shared objects in your python version’s `site-packages` locations and compare between Graviton and x86:
104104
```bash
105105
%> cd ~/aws-getting-started-guide/perfrunbook/utilities
106106
%> sudo ./find_and_list_pylib_with_so.sh 3.7 # takes python version as arg
107107
# Example output ...
108-
# ... Graviton2
108+
# ... Graviton
109109
./numpy/core/_multiarray_tests.cpython-37m-aarch64-linux-gnu.so
110110
./numpy/core/_struct_ufunc_tests.cpython-37m-aarch64-linux-gnu.so
111111
./numpy/core/_rational_tests.cpython-37m-aarch64-linux-gnu.so
@@ -130,14 +130,14 @@ Libraries for Python or Java can link in binary shared objects to provide enhanc
130130

131131
## Check native application build system and code
132132

133-
For native compiled components of your application, proper compile flags are essential to make sure Graviton2’s hardware features are being fully taken advantage of. Follow the below checklist:
133+
For native compiled components of your application, proper compile flags are essential to make sure Graviton’s hardware features are being fully taken advantage of. Follow the below checklist:
134134

135135
1. Verify equivalent code optimizations are being made for Graviton as well as x86. For example with C/C++ code built with GCC, make sure if builds use `-O3` for x86, that Graviton builds also use that optimization and not some basic debug setting like just `-g`.
136136
2. Confirm when building for Graviton that **one of the following flags** are added to the compile line for GCC/LLVM12+ to ensure using Large System Extension instructions when able to speed up atomic operations.
137-
1. Use `-moutline-atomics` for code that must run on Graviton1 and Graviton2
138-
2. Use `-march=armv8.2a -mcpu=neoverse-n1` for code that will run on Graviton2 and other modern Arm platforms
137+
1. Use `-moutline-atomics` for code that must run on all Graviton platforms
138+
2. Use `-march=armv8.2-a -mcpu=neoverse-n1` for code that will run on Graviton2 or later and other modern Arm platforms
139139
3. When building natively for Rust, ensure that `RUSTFLAGS` is set to **one of the following flags**
140-
1. `export RUSTFLAGS="-Ctarget-features=+lse"` for code that will run on Graviton2 and earlier platforms that support LSE (Large System Extension) instructions.
140+
1. `export RUSTFLAGS="-Ctarget-feature=+lse"` for code that will run on Graviton2 and other Arm platforms that support LSE (Large System Extension) instructions.
141141
2. `export RUSTFLAGS="-Ctarget-cpu=neoverse-n1"` for code that will only run on Graviton2 and later platforms.
142142
4. Check for the existence of assembly optimized on x86 with no optimization on Graviton. For help with porting optimized assembly routines, see [Section 6](./optimization_recommendation.md).
143143
```bash

‎perfrunbook/debug_code_perf.md

+1-2
@@ -53,7 +53,7 @@ You may see a small single-digit percent increase in overhead with pseudo-NMI en
5353

5454
## Off-cpu profiling
5555

56-
If Graviton2 is consuming less CPU-time than expected, it is useful to find call-stacks that are putting *threads* to sleep via the OS. Lock contention, IO Bottlenecks, OS scheduler issues can all lead to cases where performance is lower, but the CPU is not being fully utilized. The method to look for what might be causing more off-cpu time is the same as with looking for functions consuming more on-cpu time: generate a flamegraph and compare. In this case, the differences are more subtle to look for as small differences can mean large swings in performance as more thread sleeps can induce milli-seconds of wasted execution time.
56+
If Graviton is consuming less CPU-time than expected, it is useful to find call-stacks that are putting *threads* to sleep via the OS. Lock contention, IO bottlenecks, and OS scheduler issues can all lead to cases where performance is lower, but the CPU is not being fully utilized. The method to look for what might be causing more off-cpu time is the same as with looking for functions consuming more on-cpu time: generate a flamegraph and compare. In this case, the differences are more subtle to look for as small differences can mean large swings in performance as more thread sleeps can induce milliseconds of wasted execution time.
5757

5858
1. Verify native (i.e. C/C++/Rust) code is built with `-fno-omit-frame-pointer`
5959
2. Verify java code is started with `-XX:+PreserveFramePointer -agentpath:/path/to/libperf-jvmti.so`
@@ -109,4 +109,3 @@ In our `capture_flamegraphs.sh` helper script, we use `perf record` to gather tr
109109
1. Use `-e instructions` to generate a flame-graph of the functions that use the most instructions on average to identify a compiler or code optimization opportunity.
110110
2. Use `-e cache-misses` to generate a flame-graph of functions that miss the L1 cache the most to indicate if changing to a more efficient data-structure might be necessary.
111111
3. Use `-e branch-misses` to generate a flame-graph of functions that cause the CPU to mis-speculate. This may identify regions with heavy use of conditionals, or conditionals that are data-dependent and may be a candidate for refactoring.
112-

‎perfrunbook/debug_hw_perf.md

+2-2
@@ -16,7 +16,7 @@ There are hundreds of events available to monitor in a server CPU today which is
1616

1717
## How to Collect PMU counters
1818

19-
A limited subset of PMU events for the CPU are available on Graviton \*6g, \*7g sizes <16xl, we recommend using a 16xl for experiments needing PMU events to get access to all of them. On 5th and 6th generation x86 instances use a single socket instance is needed to have access to the CPU PMU events: >c5.9xl, >\*5.12xl, >\*6i.16xl, >c5a.12xl, and >\*6a.24xl. On 7th generation x86 instances *7a and *7i, all sizes get access to a limited number of CPU PMU events, just like on Graviton instances, and full socket or larger instances (>\*7\*.24xl) get access to all PMU events.
19+
A limited subset of PMU events for the CPU is available on Graviton \*6g and \*7g sizes <16xl; we recommend using a 16xl for experiments needing PMU events to get access to all of them. On Graviton \*8g, sizes >24xl have access to all the CPU PMU events. On 5th and 6th generation x86 instances, a single-socket instance is needed to have access to the CPU PMU events: >c5.9xl, >\*5.12xl, >\*6i.16xl, >c5a.12xl, and >\*6a.24xl. On 7th generation x86 instances (\*7a and \*7i), all sizes get access to a limited number of CPU PMU events, just like on Graviton instances, and full socket or larger instances (>\*7\*.24xl) get access to all PMU events.
2020

2121
To measure the standard CPU PMU events, do the following:
2222

@@ -121,7 +121,7 @@ To measure the standard CPU PMU events, do the following:
121121

122122
This checklist describes the top-down method to debug whether the hardware is under-performing and what part is underperforming. The checklist describes event ratios to check that are included in the helper-script. All ratios are in terms of either misses-per-1000(kilo)-instruction or per-1000(kilo)-cycles. This checklist aims to help guide whether a hardware slow down is coming from the front-end of the processor or the backend of the processor and then what particular part. The front-end of the processor is responsible for fetching and supplying the instructions. The back-end is responsible for executing the instructions provided by the front-end as fast as possible. A bottleneck in either part will cause stalls and a decrease in performance. After determining where the bottleneck may lie, you can proceed to [Section 6](./optimization_recommendation.md) to read suggested optimizations to mitigate the problem.
123123

124-
1. Start by measuring `ipc` (Instructions per cycle) on each instance-type. A higher IPC is better. A lower number for `ipc` on Graviton2 compared to x86 indicates *that* there is a performance problem. At this point, proceed to attempt to root cause where the lower IPC bottleneck is coming from by collecting frontend and backend stall metrics.
124+
1. Start by measuring `ipc` (Instructions per cycle) on each instance-type. A higher IPC is better. A lower number for `ipc` on Graviton compared to x86 indicates *that* there is a performance problem. At this point, proceed to attempt to root cause where the lower IPC bottleneck is coming from by collecting frontend and backend stall metrics.
125125
2. Next, measure `stall_frontend_pkc` and `stall_backend_pkc` (pkc = per kilo cycle) and determine which is higher. If stalls in the frontend are higher, it indicates the part of the CPU responsible for predicting and fetching the next instructions to execute is causing slow-downs. If stalls in the backend are higher, it indicates the machinery that executes the instructions and reads data from memory is causing slow-downs
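The first two checks in this list can be sketched in Python as follows. This is a minimal illustration and not the helper-script's implementation: the function names and the raw counter arguments (`instructions`, `cycles`, `frontend_stall_cycles`, `backend_stall_cycles`) are hypothetical and stand in for whatever raw event counts your collection tool reports.

```python
def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle; higher is better."""
    return instructions / cycles


def per_kilo_cycles(event_count: float, cycles: float) -> float:
    """Normalize a raw event count to events per 1000 cycles (pkc)."""
    return 1000.0 * event_count / cycles


def dominant_stall(frontend_stall_cycles: float,
                   backend_stall_cycles: float,
                   cycles: float) -> str:
    """Report which side of the core stalls more often per kilo-cycle."""
    frontend_pkc = per_kilo_cycles(frontend_stall_cycles, cycles)
    backend_pkc = per_kilo_cycles(backend_stall_cycles, cycles)
    if frontend_pkc > backend_pkc:
        return "frontend"
    if backend_pkc > frontend_pkc:
        return "backend"
    return "balanced"


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real instance.
    print(ipc(2_000_000, 1_000_000))
    print(dominant_stall(100_000, 300_000, 1_000_000))
```

Normalizing both stall counts by the same cycle count keeps the comparison fair when the two runs being compared differ in length.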
126126

127127
### Drill down front end stalls

‎perfrunbook/debug_system_perf.md

+4-5
@@ -80,7 +80,7 @@ It is also advisable to check memory consumption using `sysstat -r ALL` or `htop
8080
%> cd ~/aws-graviton-getting-started/perfrunbook/utilities
8181
%> python3 ./measure_and_plot_basic_sysstat_stats.py --stat new-connections --time 60
8282
```
83-
2. If seeing bursts, verify this is expected behavior for your load generator. Bursts can cause performance degradation on Graviton2 if each new connection has to do an RSA signing operation for TLS connection establishment.
83+
2. If seeing bursts, verify this is expected behavior for your load generator. Bursts can cause performance degradation for each new connection, especially if it has to do an RSA signing operation for TLS connection establishment.
8484
3. Check on SUT for hot connections (connections that are more heavily used than others) by running: `watch netstat -t`
8585
4. The example below shows the use of `netstat -t` to watch TCP connections with one being hot as indicated by its non-zero `Send-Q` value while all other connections have a value of 0. This can lead to one core being saturated by network processing on the SUT, bottlenecking the rest of the system.
8686
```bash
@@ -117,11 +117,10 @@ When running Java applications, monitor for differences in behavior using JFR (J
117117
3. The image below shows JMC’s GC pane, showing pause times, heap size and references remaining after each collection.
118118
![](./images/jmc_example_image.png)
119119
4. The same information can be gathered by enabling GC logging and then processing the log output. Enter `-Xlog:gc*,gc+age=trace,gc+ref=debug,gc+ergo=trace` on the Java command line and re-start your application.
120-
5. If longer GC pauses are seen, this could be happening because objects are living longer on Graviton2 and the GC has to scan them. To help debug this gather an off-cpu profile ([see Section 5.b](./debug_code_perf.md)) to look for threads that are sleeping more often and potentially causing heap objects to live longer.
120+
5. If longer GC pauses are seen, this could be happening because objects are living longer on Graviton and the GC has to scan them. To help debug this gather an off-cpu profile ([see Section 5.b](./debug_code_perf.md)) to look for threads that are sleeping more often and potentially causing heap objects to live longer.
121121
6. Check for debug flags that are still enabled but should be disabled, such as: `-XX:-OmitStackTraceInFastThrow` which logs and generates stack traces for all exceptions, even if they are not fatal exceptions.
122-
7. Check there are no major differences in JVM ergonomics between Graviton2 and x86, run:
122+
7. Check there are no major differences in JVM ergonomics between Graviton and x86, run:
123123
```bash
124124
%> java -XX:+PrintFlagsFinal -version
125-
# Capture output from x86 and Graviton2 and then diff the files
125+
# Capture output from x86 and Graviton and then diff the files
126126
```
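One way to do the comparison in the last step is sketched below. It is an illustrative sketch only, assuming each captured file contains the usual `type name = value {kind}` lines printed by `-XX:+PrintFlagsFinal`; the function names `parse_flags` and `diff_flags` are hypothetical, and the sample flag values are made up for the example.

```python
import re

# Matches lines such as "     intx CICompilerCount = 4 {product} {ergonomic}".
# ":=" appears in some JDK versions for flags changed from their default.
_FLAG_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+:?=\s*(.*?)\s*(\{.*)?$")


def parse_flags(text: str) -> dict:
    """Map each flag name to its value; lines that are not flags are skipped."""
    flags = {}
    for line in text.splitlines():
        match = _FLAG_LINE.match(line)
        if match:
            flags[match.group(2)] = match.group(3)
    return flags


def diff_flags(first: str, second: str) -> dict:
    """Return {flag: (value_in_first, value_in_second)} for differing flags.

    A flag missing from one capture is reported with None on that side.
    """
    a, b = parse_flags(first), parse_flags(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up sample captures, one per platform.
    x86 = "     intx CICompilerCount = 4 {product} {ergonomic}\n"
    arm = "     intx CICompilerCount = 3 {product} {ergonomic}\n"
    for name, (left, right) in diff_flags(x86, arm).items():
        print(f"{name}: x86={left} graviton={right}")
```

Comparing by flag name rather than line by line avoids noise from column alignment differences between the two captures.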
127-

‎perfrunbook/intro_to_benchmarking.md

+1-1
@@ -2,7 +2,7 @@
22

33
[Graviton Performance Runbook toplevel](./README.md)
44

5-
When designing an experiment to benchmark Graviton2 against another instance type, it is key to remember the below 2 guiding principles:
5+
When designing an experiment to benchmark Graviton based instances against another instance type, it is key to remember the below 2 guiding principles:
66

77
1. Always define a specific question to answer with your benchmark
88
2. Control your variables and unknowns within the benchmark environment

‎perfrunbook/optimization_recommendation.md

+5-3
@@ -2,7 +2,7 @@
22

33
[Graviton Performance Runbook toplevel](./README.md)
44

5-
This section describes multiple different optimization suggestions to try on Graviton2 instances to attain higher performance for your service. Each sub-section defines some optimization recommendations that can help improve performance if you see a particular signature after measuring the performance using the previous checklists.
5+
This section describes multiple different optimization suggestions to try on Graviton based instances to attain higher performance for your service. Each sub-section defines some optimization recommendations that can help improve performance if you see a particular signature after measuring the performance using the previous checklists.
66

77
## Optimizing for large instruction footprint
88

@@ -35,6 +35,8 @@ allocating huge-pages.
3535
2. For additional information on the vector instructions used on Graviton
3636
1. [Arm intrinsics guide](https://developer.arm.com/architectures/instruction-sets/intrinsics/)
3737
2. [Graviton2 core software optimization guide](https://developer.arm.com/documentation/pjdoc466751330-9707/2-0)
38+
3. [Graviton3 core software optimization guide](https://developer.arm.com/documentation/pjdoc466751330-9685/latest/)
39+
4. [Graviton4 core software optimization guide](https://developer.arm.com/documentation/PJDOC-466751330-593177/latest/)
3840

3941
## Optimizing synchronization heavy optimizations
4042

@@ -60,12 +62,12 @@ allocating huge-pages.
6062
done
6163
```
6264
3. Disable Receive Packet Steering (RPS) to avoid contention and extra IPIs.
63-
1. `cat /sys/class/net/ethN/queues/rx-N/rps_cpus` and verify they are set to `0`. In general RPS is not needed on Graviton2.
65+
1. `cat /sys/class/net/ethN/queues/rx-N/rps_cpus` and verify they are set to `0`. In general RPS is not needed on Graviton2 and newer.
6466
2. You can try using RPS if your situation is unique. Read the [documentation on RPS](https://www.kernel.org/doc/Documentation/networking/scaling.txt) to understand further how it might help. Also refer to [Optimizing network intensive workloads on Amazon EC2 A1 Instances](https://aws.amazon.com/blogs/compute/optimizing-network-intensive-workloads-on-amazon-ec2-a1-instances/) for concrete examples.
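The RPS check above can be automated across every interface and receive queue. The sketch below is a hedged example, assuming the standard sysfs layout `/sys/class/net/<iface>/queues/rx-<n>/rps_cpus` and the kernel's comma-separated hexadecimal bitmap format; the function names are hypothetical and not part of the runbook utilities.

```python
import glob
import os


def rps_mask_is_zero(mask_text: str) -> bool:
    """True when an rps_cpus bitmap such as '00000000,00000000' has no bits set."""
    groups = mask_text.strip().split(",")
    return all(int(group, 16) == 0 for group in groups if group)


def find_rps_enabled_queues(sysfs_net: str = "/sys/class/net") -> list:
    """List the rps_cpus files whose CPU mask is non-zero (RPS enabled)."""
    pattern = os.path.join(sysfs_net, "*", "queues", "rx-*", "rps_cpus")
    enabled = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            if not rps_mask_is_zero(handle.read()):
                enabled.append(path)
    return enabled


if __name__ == "__main__":
    queues = find_rps_enabled_queues()
    if queues:
        for queue in queues:
            print(f"RPS enabled: {queue}")
    else:
        print("RPS is disabled on all receive queues")
```

An empty result means every queue reports a zero mask, which matches the recommended default.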
6567

6668
## Metal instance IO optimizations
6769

68-
1. If on Graviton2 metal instances, try disabling the System MMU (Memory Management Unit) to speed up IO handling:
70+
1. If on Graviton2 and newer metal instances, try disabling the System MMU (Memory Management Unit) to speed up IO handling:
6971
```bash
7072
%> cd ~/aws-graviton-getting-started/perfrunbook/utilities
7173
# Configure the SMMU to be off on metal, which is the default on x86.

‎rust.md

+1-1
@@ -21,7 +21,7 @@ cargo build --release
2121
```
2222

2323
If you're running only on Graviton2 or newer hardware you can also enable other
24-
instructions by setting the cpu target as well:
24+
instructions by setting the CPU target, as in the example below:
2525

2626
```
2727
export RUSTFLAGS="-Ctarget-cpu=neoverse-n1"
