Monitoring_Tools_on_Graviton.md (+3 −3)
@@ -108,7 +108,7 @@ One can collect hardware events/ counters for an application, on a specific CPU,
More details on how to use the Linux perf utility on AWS Graviton processors are available [here](https://github.com/aws/aws-graviton-getting-started/blob/main/optimizing.md#profiling-the-code).
## Summary: Utilities on AWS Graviton vs. Intel x86 architectures
-|Processor |x86 |Graviton2,3 |
+|Processor |x86 |Graviton2, 3, and 4|
|--- |--- |--- |
|CPU frequency listing |*lscpu, /proc/cpuinfo, dmidecode*|*dmidecode*|
|*turbostat* support |Yes |No |
@@ -117,12 +117,12 @@ More details on how to use Linux perf utility on AWS Graviton processors is avai
Utilities such as *lmbench* are available [here](http://lmbench.sourceforge.net/) and can be built for AWS Graviton processors to obtain latency and bandwidth stats.

**Notes**:

**1.** The ARM Linux kernel community has decided not to put CPU frequency in _/proc/cpuinfo_, which can be read by tools such as _lscpu_ or directly.

-**2.** On AWS Graviton 2/3 processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant for Arm architecture (and not on AWS Graviton processor either). Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.
+**2.** On AWS Graviton processors, Turbo isn’t supported, so utilities such as ‘turbostat’ aren’t supported or relevant. Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets, and other information are only supported on Intel processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.
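As a quick illustration of the table and notes above, a hedged way to read the CPU speed on a Graviton instance with *dmidecode*, since the frequency is not exposed in _/proc/cpuinfo_:

```bash
# Sketch: read the advertised processor speed on a Graviton instance.
# dmidecode requires root; the exact fields reported can vary by platform.
sudo dmidecode -t processor | grep -i speed
```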
-GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton and
-Graviton2. Supporting both platforms with the same binary comes at a small
+GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton1 and later
+Gravitons with LSE support. Supporting both platforms with the same binary comes at a small
extra cost: one load and one branch. To check that an application
has been compiled with `-moutline-atomics`, the `nm` command line utility displays
the name of functions and global variables in an application binary. The boolean
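A minimal, hedged way to run that check (the binary name is illustrative; `__aarch64_have_lse_atomics` is assumed here as the run-time flag used by GCC's outline-atomics helpers):

```bash
# Sketch: confirm a binary carries the outline-atomics run-time check symbol.
nm ./my_app | grep __aarch64_have_lse_atomics
```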
@@ -152,16 +155,16 @@ if (feof(stdin)) {
}
```

-### Using Graviton2 Arm instructions to speed-up Machine Learning
+### Using Arm instructions to speed-up Machine Learning

-Graviton2 processors been optimized for performance and power efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for Machine Learning (quantized) inference workloads, and enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second, reducing the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
+Graviton2 and later processors have been optimized for performant and power-efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions), commonly used for (quantized) Machine Learning inference workloads, and by enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second and reduce the memory footprint compared to single precision floating point (\_float32), while still enjoying a large dynamic range.

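A quick, hedged check that these features are exposed on the instance you are using (the flag names are the standard Linux `/proc/cpuinfo` feature names for dot-product and half-precision support):

```bash
# Sketch: print the ML-related CPU feature flags if the processor reports them.
grep -o -E 'asimddp|asimdhp|fphp' /proc/cpuinfo | sort -u
```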
### Using SVE

The scalable vector extensions (SVE) require both a new enough tool-chain to
auto-vectorize to SVE (GCC 11+, LLVM 14+) and a 4.15+ kernel that supports SVE.
One notable exception is that Amazon Linux 2 with a 4.14 kernel doesn't support SVE;
-please upgrade to a 5.4+ AL2 kernel.
+please upgrade to a 5.4+ AL2 kernel. Graviton3 and Graviton4 support SVE; earlier Gravitons do not.

### Using Arm instructions to speed-up common code sequences
The Arm instruction set includes instructions that can be used to speed up common
dpdk_spdk.md (+5 −5)
@@ -1,18 +1,18 @@
-# DPDK, SPDK, ISA-L supports Graviton2
+# DPDK, SPDK, ISA-L support Graviton

-Graviton2 is optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton2 following the normal installation guidelines from the respective repositories linked above.
+Graviton2 and later CPUs are optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton following the normal installation guidelines from the respective repositories linked above.

**NOTE**: *Though DPDK precompiled packages are available from Ubuntu, we recommend building them from source.*

-SPDK relies often on [ISA-L](https://github.com/intel/isa-l) which is already optimized for Arm64 and the CPU cores in Graviton2.
+SPDK often relies on [ISA-L](https://github.com/intel/isa-l), which is already optimized for Arm64 and the CPU cores in Graviton2 and later processors.

## Compile DPDK from source

[DPDK official guidelines](https://doc.dpdk.org/guides/linux_gsg/build_dpdk.html) require using *meson* and *ninja* to build from source code.

-A native compilation of DPDK on top of Graviton2 will generate optimized code that take advantage of the CRC and Crypto instructions in Graviton2 cpu cores.
+A native compilation of DPDK on top of Graviton will generate optimized code that takes advantage of the CRC and Crypto instructions in Graviton2 and later CPU cores.

**NOTE**: Some of the installation steps call "python", which may not be a valid command in a modern Linux distribution; you may need to install *python-is-python3* to resolve this.

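A hedged sketch of the native build flow described above (package names follow Ubuntu conventions and the clone location is illustrative; adjust for your distribution):

```bash
# Sketch: native DPDK build on a Graviton instance with meson/ninja.
sudo apt-get install -y build-essential python3-pyelftools libnuma-dev meson ninja-build
git clone https://github.com/dpdk/dpdk.git
cd dpdk
meson setup build    # native build; the toolchain targets the host Arm cores and their CRC/crypto support
ninja -C build
sudo ninja -C build install
```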
@@ -35,5 +35,5 @@ Some application, written with the x86 architecture in mind, set the active dpdk
## Known issues

-***testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton2). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
+***testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksums, which are dropped when transmitted between AWS instances (including Graviton). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
managed_services.md (+5 −5)
@@ -5,7 +5,7 @@ Note: You can always find the latest Graviton announcements via these [What's Ne
Service | Status | Resources |
:-: | :-: | --- |
[AWS App Mesh](https://aws.amazon.com/app-mesh/) | GA | What's New: [AWS App Mesh now supports ARM64-based Envoy Images](https://aws.amazon.com/about-aws/whats-new/2021/11/aws-app-mesh-arm64-envoy-images/) |
-[Amazon Aurora](https://aws.amazon.com/rds/aurora/) | GA | What's New: [Achieve up to 35% better price/performance with Amazon Aurora using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2021/03/achieve-up-to-35-percent-better-price-performance-with-amazon-aurora-using-new-graviton2-instances/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [Aurora DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html) |
+[Amazon Aurora](https://aws.amazon.com/rds/aurora/) | GA | What's New: [Amazon Aurora MySQL and PostgreSQL support for Graviton3 based R7g instance family](https://aws.amazon.com/about-aws/whats-new/2023/05/amazon-aurora-mysql-postgresql-graviton3-based-r7g-instance-family/), [Achieve up to 35% better price/performance with Amazon Aurora using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2021/03/achieve-up-to-35-percent-better-price-performance-with-amazon-aurora-using-new-graviton2-instances/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [Aurora DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.DBInstanceClass.html) |
[Amazon EC2 Auto Scaling](https://aws.amazon.com/ec2/autoscaling/) | GA | What's New: [Amazon EC2 Auto Scaling announces support for multiple launch templates for Auto Scaling groups](https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-ec2-auto-scaling-announces-support-for-multiple-launch-templates-for-auto-scaling-groups/)<br>Associated blog: [Supporting AWS Graviton2 and x86 instance types in the same Auto Scaling group](https://aws.amazon.com/blogs/compute/supporting-aws-graviton2-and-x86-instance-types-in-the-same-auto-scaling-group/)
[AWS Batch](https://aws.amazon.com/batch/) | GA | Blog: [Target cross-platform Go builds with AWS CodeBuild Batch builds](https://aws.amazon.com/blogs/devops/target-cross-platform-go-builds-with-aws-codebuild-batch-builds/) |
[AWS CodeBuild](https://aws.amazon.com/codebuild/) | GA | What's New: [AWS CodeBuild supports Arm-based workloads using AWS Graviton2](https://aws.amazon.com/about-aws/whats-new/2021/02/aws-codebuild-supports-arm-based-workloads-using-aws-graviton2/) |
@@ -15,8 +15,8 @@ Service | Status | Resources |
[Amazon ECS](https://aws.amazon.com/ecs/) | GA | [Amazon ECS-optimized AMIs](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html) |
[Amazon EKS](https://aws.amazon.com/eks/) | GA | What's New: [Amazon EKS support for Arm-based instances powered by AWS Graviton is now generally available](https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-support-for-arm-based-instances-powered-by-aws-graviton-now-generally-available/)<br>Launch Blog: [Amazon EKS on AWS Graviton2 generally available: considerations on multi-architecture apps](https://aws.amazon.com/blogs/containers/eks-on-graviton-generally-available/) |
-[Amazon ElastiCache](https://aws.amazon.com/elasticache/) | GA | What's New: [Amazon ElastiCache now supports M6g and R6g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-elasticache-now-supports-m6g-and-r6g-graviton2-based-instances/)<br> What's New: [Amazon ElastiCache now supports T4g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-elasticache-supports-t4g-graviton2-based-instances/) |
-[Amazon EMR](https://aws.amazon.com/emr/) | GA | What's New: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance/)<br>Launch Blog: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/blogs/big-data/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance-for-spark-workloads-on-graviton2-based-instances/) |
+[Amazon ElastiCache](https://aws.amazon.com/elasticache/) | GA | What's New: [Amazon ElastiCache now supports M7g and R7g Graviton3-based nodes](https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-elasticache-m7g-r7g-graviton-3-nodes/), [Amazon ElastiCache now supports M6g and R6g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-elasticache-now-supports-m6g-and-r6g-graviton2-based-instances/)<br> What's New: [Amazon ElastiCache now supports T4g Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-elasticache-supports-t4g-graviton2-based-instances/) |
+[Amazon EMR](https://aws.amazon.com/emr/) | GA | What's New: [Amazon EMR now supports Amazon EC2 C7g (Graviton3) instances](https://aws.amazon.com/about-aws/whats-new/2023/03/amazon-emr-amazon-ec2-c7g-graviton3-instances/), [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance/)<br>Launch Blog: [Amazon EMR now provides up to 30% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances](https://aws.amazon.com/blogs/big-data/amazon-emr-now-provides-up-to-30-lower-cost-and-up-to-15-improved-performance-for-spark-workloads-on-graviton2-based-instances/) |
[Amazon EMR Serverless](https://aws.amazon.com/emr/serverless/) | GA | What's New: [Announcing AWS Graviton2 support for Amazon EMR Serverless - Get up to 35% better price-performance for your serverless Spark and Hive workload](https://aws.amazon.com/about-aws/whats-new/2022/11/aws-graviton2-emr-serverless-35-percent-price-performance-spark-hive-workloads/) |
[AWS Fargate](https://aws.amazon.com/fargate/) | GA | Launch Blog: [Announcing AWS Graviton2 Support for AWS Fargate – Get up to 40% Better Price-Performance for Your Serverless Containers](https://aws.amazon.com/blogs/aws/announcing-aws-graviton2-support-for-aws-fargate-get-up-to-40-better-price-performance-for-your-serverless-containers/) |
[Amazon Gamelift](https://aws.amazon.com/gamelift/) | GA | Launch Blog: [Now available: New Asia Pacific (Osaka) region and Graviton2 support for Amazon GameLift](https://aws.amazon.com/blogs/gametech/now-available-new-asia-pacific-osaka-region-and-graviton2-support-for-amazon-gamelift/)<br>Addition of Graviton3: [Announcing Amazon GameLift support for instances powered by AWS Graviton3 processors](https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-gamelift-instances-aws-graviton-3-processors/)|
@@ -25,5 +25,5 @@ Service | Status | Resources |
[Amazon Managed Streaming for Apache Kafka (MSK)](https://aws.amazon.com/msk/) | GA | What's New: [Amazon MSK now supports Graviton3-based M7g instances for new provisioned clusters](https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-msk-graviton3-m7g-instances-provisioned-clusters/) |
[Amazon Neptune](https://aws.amazon.com/neptune/) | GA | What's New: [Announcing AWS Graviton2-based instances for Amazon Neptune](https://aws.amazon.com/about-aws/whats-new/2021/11/aws-graviton2-based-instances-amazon-neptune/) |
[Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) | GA | What's New: [Amazon Elasticsearch Service now offers AWS Graviton2 (M6g, C6g, R6g, and R6gd) instances](https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-elasticsearch-service-offers-aws-graviton2-m6g-c6g-r6g-r6gd-instances/)<br>Related blog: [Increase Amazon Elasticsearch Service performance by upgrading to Graviton2](https://aws.amazon.com/blogs/big-data/increase-amazon-elasticsearch-service-performance-by-upgrading-to-graviton2/)|
-[Amazon RDS](https://aws.amazon.com/rds/) | GA | What's New: [Achieve up to 52% better price/performance with Amazon RDS using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-up-to-52-percent-better-price-performance-with-amazon-rds-using-new-graviton2-instances/)<br>Launch Blog: [New – Amazon RDS on Graviton2 Processors](https://aws.amazon.com/blogs/aws/new-amazon-rds-on-graviton2-processors/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [RDS DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html) |
-[Amazon SageMaker](https://aws.amazon.com/pm/sagemaker/) | GA | What's New: [Amazon SageMaker adds eight new Graviton-based instances for model deployment](https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-adds-new-graviton-based-instances-model-deployment/) <br> Related blog: [Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/run-machine-learning-inference-workloads-on-aws-graviton-based-instances-with-amazon-sagemaker/)|
+[Amazon RDS](https://aws.amazon.com/rds/) | GA | What's New: [Amazon RDS now supports M7g and R7g database instances](https://aws.amazon.com/about-aws/whats-new/2023/04/amazon-rds-m7g-r7g-database-instances/), [Achieve up to 52% better price/performance with Amazon RDS using new Graviton2 instances](https://aws.amazon.com/about-aws/whats-new/2020/10/achieve-up-to-52-percent-better-price-performance-with-amazon-rds-using-new-graviton2-instances/)<br>Launch Blog: [New – Amazon RDS on Graviton2 Processors](https://aws.amazon.com/blogs/aws/new-amazon-rds-on-graviton2-processors/)<br>Related blog: [Key considerations in moving to Graviton2 for Amazon RDS and Amazon Aurora databases](https://aws.amazon.com/blogs/database/key-considerations-in-moving-to-graviton2-for-amazon-rds-and-amazon-aurora-databases/)<br>For supported instance types and database engine versions see [RDS DB Instances](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html) |
+[Amazon SageMaker](https://aws.amazon.com/pm/sagemaker/) | GA | What's New: [Amazon SageMaker adds eight new Graviton-based instances for model deployment](https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-adds-new-graviton-based-instances-model-deployment/) <br> Related blog: [Run machine learning inference workloads on AWS Graviton-based instances with Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/run-machine-learning-inference-workloads-on-aws-graviton-based-instances-with-amazon-sagemaker/), [Reduce Amazon SageMaker inference cost with AWS Graviton](https://aws.amazon.com/blogs/machine-learning/reduce-amazon-sagemaker-inference-cost-with-aws-graviton/)|
While TSO allows reads to occur out-of-order with writes and a processor to
@@ -54,16 +54,16 @@ is corresponding Arm code there too. If not, that might be something to improve.
We welcome suggestions by opening an issue in this repo.
### Lock/Synchronization intensive workload
-Graviton2 supports the Arm Large Scale Extensions (LSE). LSE based locking and synchronization
-is an order of magnitude faster for highly contended locks with high core counts (e.g. 64 with Graviton2).
+Graviton2 processors and later support the Arm Large System Extensions (LSE). LSE-based locking and synchronization
+is an order of magnitude faster for highly contended locks with high core counts (e.g. up to 192 cores on Graviton4).
For workloads that have highly contended locks, compiling with `-march=armv8.2-a` will enable LSE based atomics and can substantially increase performance. However, this will prevent the code
from running on an Arm v8.0 system such as AWS Graviton-based EC2 A1 instances.
With GCC 10 and newer an option `-moutline-atomics` will not inline atomics and
detect at run time the correct type of atomic to use. This is slightly worse
performing than `-march=armv8.2-a` but does retain backwards compatibility.
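To make the trade-off concrete, a hedged sketch of the two build options discussed above (source and output names are illustrative):

```bash
# Sketch: portable binary, atomic implementation selected at run time (GCC 10+).
gcc -O3 -moutline-atomics -o app_portable app.c

# Sketch: LSE atomics inlined; runs on Graviton2 and later but not on Armv8.0 (e.g. A1).
gcc -O3 -march=armv8.2-a -o app_lse app.c
```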
### Network intensive workloads
-In some workloads, the packet processing capability of Graviton2 is both faster and
+In some workloads, the packet processing capability of Graviton is both faster and
lower-latency than other platforms, which reduces the natural “coalescing”
capability of Linux kernel and increases the interrupt rate.
Depending on the workload it might make sense to enable adaptive RX interrupts
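One hedged way to do that with the ENA driver (the interface name is illustrative; confirm your driver and kernel support adaptive coalescing):

```bash
# Sketch: enable adaptive RX interrupt moderation and confirm the setting took effect.
sudo ethtool -C eth0 adaptive-rx on
ethtool -c eth0
```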
@@ -72,12 +72,29 @@ Depending on the workload it might make sense to enable adaptive RX interrupts
## Profiling the code
If you aren't getting the performance you expect, one of the best ways to understand what is
going on in the system is to compare profiles of execution and understand where the CPUs are
-spending time. This will frequently point to a hot function that could be optimized. A crutch
+spending time. This will frequently point to a hot function or sub-system that could be optimized. A crutch
is comparing a profile between a system that is performing well and one that isn't to see the
relative difference in execution time. Feel free to open an issue in this
GitHub repo for advice or help.
-Install the Linux perf tool:
+Using the [AWS APerf](https://github.com/aws/aperf) tool:
+```bash
+# Graviton
+wget -qO- https://github.com/aws/aperf/releases/download/v0.1.10-alpha/aperf-v0.1.10-alpha-aarch64.tar.gz | tar -xvz -C /target/directory
+
+# x86
+wget -qO- https://github.com/aws/aperf/releases/download/v0.1.10-alpha/aperf-v0.1.10-alpha-x86_64.tar.gz | tar -xvz -C /target/directory
+
+## Record a profile and generate a report
+cd /target/directory/
+./aperf record -r <RUN_NAME> -i <INTERVAL_NUMBER> -p <COLLECTION_PERIOD>
Redhat Enterprise Linux | 8.2 or later | Yes | 64KB | [MarketPlace](https://aws.amazon.com/marketplace/pp/B07T2NH46P) | Yes |
-~~Redhat Enterprise Linux~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[MarketPlace](https://aws.amazon.com/marketplace/pp/B07KTFV2S8)~~ | | Supported on A1 instances but not on Graviton2 based ones
+~~Redhat Enterprise Linux~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[MarketPlace](https://aws.amazon.com/marketplace/pp/B07KTFV2S8)~~ | | Supported on A1 instances but not on Graviton2 and later instances
AlmaLinux | 8.4 or later | Yes | 64KB | [AMIs](https://wiki.almalinux.org/cloud/AWS.html) | Yes |
Alpine Linux | 3.12.7 or later | Yes (*) | 4KB | [AMIs](https://www.alpinelinux.org/cloud/) | | (*) LSE enablement checked in version 3.14 |
CentOS | 8.2.2004 or later | No | 64KB | [AMIs](https://wiki.centos.org/Cloud/AWS#Images) | Yes | |
CentOS Stream | 8 | No (*) | 64KB (*) | [Downloads](https://www.centos.org/centos-stream/) | |(*) details to be confirmed once AMI's are available|
-~~CentOS~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[AMIs](https://wiki.centos.org/Cloud/AWS#Images)~~ | | Supported on A1 instances but not on Graviton2 based ones
+~~CentOS~~ | ~~7.x~~ | ~~No~~ | ~~64KB~~ | ~~[AMIs](https://wiki.centos.org/Cloud/AWS#Images)~~ | | Supported on A1 instances but not on Graviton2 and later instances
Debian | 10 | [Planned for Debian 11](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956418) | 4KB | [Community](https://wiki.debian.org/Cloud/AmazonEC2Image/Buster) or [MarketPlace](https://aws.amazon.com/marketplace/pp/B085HGTX5J) | Yes, as of Debian 10.7 (2020-12-07) |
FreeBSD | 12.1 or later | No | 4KB | [Community](https://www.freebsd.org/releases/12.1R/announce.html) or [MarketPlace](https://aws.amazon.com/marketplace/pp/B081NF7BY7) | No | Device hotplug and API shutdown don't work
perfrunbook/README.md (+8 −3)
@@ -6,11 +6,11 @@ This document is a reference for software developers who want to benchmark, debu
This document covers many topics, including how to benchmark, how to debug performance, and recommended optimizations. It is not meant to be read beginning-to-end. Instead, view it as a collection of checklists and best known practices to apply when working with Graviton instances that go progressively deeper into analyzing the system. Please see the FAQ below to direct you towards the most relevant set of checklists and tools depending on your specific situation.
-If after following these guides there is still an issue you cannot resolve with regards to performance on Graviton2, please do not hesitate to raise an issue on the [AWS-Graviton-Getting-Started](https://github.com/aws/aws-graviton-getting-started/issues) guide or contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com). If there is something missing in this guide, please raise an issue or better, post a pull-request.
+If after following these guides there is still an issue you cannot resolve with regard to performance on Graviton-based instances, please do not hesitate to raise an issue on the [AWS-Graviton-Getting-Started](https://github.com/aws/aws-graviton-getting-started/issues) guide or contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com). If there is something missing in this guide, please raise an issue or, better, post a pull-request.
## Pre-requisites
-To assist with some of the tasks listed in this runbook, we have created some helper-scripts for some of the tasks the checklists describe. The helper-scripts assume the test instances are running an up-to-date AL2 or Ubuntu 20.04 LTS distribution and the user can run the scripts using `sudo`. Follow the steps below to obtain and install the utilities on your test systems:
+To assist with the tasks listed in this runbook, we have created helper-scripts for some of the tasks the checklists describe. The helper-scripts assume the test instances are running an up-to-date AL2, AL2023, or Ubuntu 20.04 LTS/22.04 LTS distribution and that the user can run the scripts using `sudo`. Follow the steps below to obtain and install the utilities on your test systems:
```bash
# Clone the repository onto your systems-under-test and any load-generation instances
# All scripts expect to run from the utilities directory
```
+## APerf for performance analysis
+
+There is also a new tool aimed at helping move workloads over to Graviton called [APerf](https://github.com/aws/aperf). It bundles many of the capabilities of the individual tools present in this
+runbook and provides a better presentation. It is highly recommended to download this tool and use it to gather most of the same information in one test-run.
+
## Sections
1. [Introduction to Benchmarking](./intro_to_benchmarking.md)
***I benchmarked my service and performance on Graviton is slower compared to my current x86 based fleet, where do I start to root cause why?**
Begin by verifying software dependencies and verifying the configuration of your Graviton and x86 testing environments to check that no major differences are present in the testing environment. Performance differences may be due to differences in environment and not due to the hardware. Refer to the below chart for a step-by-step flow through this runbook to help root cause the performance regression:

-***What are the recommended optimizations to try with Graviton2?**
+***What are the recommended optimizations to try with Graviton?**
Refer to [Section 6](./optimization_recommendation.md) for our recommendations on how to make your application run faster on Graviton.
***I investigated every optimization in this guide and still cannot find the root-cause, what do I do next?**
Please contact us at [ec2-arm-dev-feedback@amazon.com](mailto:ec2-arm-dev-feedback@amazon.com) or talk with your AWS account team representative to get additional help.
perfrunbook/appendix.md (+14 −8)
@@ -4,13 +4,17 @@
This Appendix contains additional information for engineers that want to go deeper on a particular topic, such as using different PMU counters to understand how the code is executing on the hardware, discussion on load generators, and additional tools to help with code observability.
-## Useful Graviton2 PMU Counters and ratios
+## Useful Graviton PMU Events and ratios
-The following list of counter ratios has been curated to list counters useful for performance debugging. The more extensive list of counters is contained in the following references:
+The following list of counter ratios has been curated to list events useful for performance debugging. The more extensive list of counters is contained in the following references:
perfrunbook/configuring_your_sut.md (+9 −9)
@@ -35,7 +35,7 @@ If you have more than one SUT, first verify there are no major differences in se
%> uname -r
4.14.219-161.340.amzn2.x86_64
-# Example output on Graviton2 SUT
+# Example output on Graviton SUT
%> uname -r
5.10.50-45.132.amzn2.aarch64
@@ -76,9 +76,9 @@ If you have more than one SUT, first verify there are no major differences in se
## Check for missing binary dependencies
-Libraries for Python or Java can link in binary shared objects to provide enhanced performance. The absence of these shared object dependencies does not prevent the application from running on Graviton2, but the CPU will be forced to use a slow code-path instead of the optimized paths. Use the checklist below to verify the same shared objects are available on all platforms.
+Libraries for Python or Java can link in binary shared objects to provide enhanced performance. The absence of these shared object dependencies does not prevent the application from running on Graviton, but the CPU will be forced to use a slow code-path instead of the optimized paths. Use the checklist below to verify the same shared objects are available on all platforms.
-1. JVM based languages — Check for the presence of binary shared objects in the installed JARs and compare between Graviton2 and x86.
+1. JVM based languages — Check for the presence of binary shared objects in the installed JARs and compare between Graviton and x86 (see the sketch below).
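A hedged example of that check (the JAR name is illustrative):

```bash
# Sketch: list native shared objects bundled in a JAR and look for aarch64 variants.
unzip -l mylib.jar | grep -E '\.so'
```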
@@ -130,14 +130,14 @@ Libraries for Python or Java can link in binary shared objects to provide enhanc
## Check native application build system and code
-For native compiled components of your application, proper compile flags are essential to make sure Graviton2’s hardware features are being fully taken advantage of. Follow the below checklist:
+For natively compiled components of your application, proper compile flags are essential to make sure Graviton’s hardware features are fully taken advantage of. Follow the checklist below:
1. Verify equivalent code optimizations are being made for Graviton as well as x86. For example with C/C++ code built with GCC, make sure if builds use `-O3` for x86, that Graviton builds also use that optimization and not some basic debug setting like just `-g`.
2. Confirm when building for Graviton that **one of the following flags** is added to the compile line for GCC/LLVM 12+ to ensure Large System Extension instructions are used, when possible, to speed up atomic operations.
-1. Use `-moutline-atomics` for code that must run on Graviton1 and Graviton2
-2. Use `-march=armv8.2a -mcpu=neoverse-n1` for code that will run on Graviton2 and other modern Arm platforms
+1. Use `-moutline-atomics` for code that must run on all Graviton platforms
+2. Use `-march=armv8.2a -mcpu=neoverse-n1` for code that will run on Graviton2 or later and other modern Arm platforms
3. When building natively for Rust, ensure that `RUSTFLAGS` is set to **one of the following flags** (see the sketch after this list):
-1. `export RUSTFLAGS="-Ctarget-features=+lse"` for code that will run on Graviton2 and earlier platforms that support LSE (Large System Extension) instructions.
+1. `export RUSTFLAGS="-Ctarget-features=+lse"` for code that will run on Graviton2 and later, and on other Arm platforms that support LSE (Large System Extension) instructions.
2. `export RUSTFLAGS="-Ctarget-cpu=neoverse-n1"` for code that will only run on Graviton2 and later platforms.
4. Check for the existence of assembly that is optimized for x86 but has no optimized equivalent for Graviton. For help with porting optimized assembly routines, see [Section 6](./optimization_recommendation.md).
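A hedged sketch of the Rust option from item 3 (the crate layout is illustrative; pick the `RUSTFLAGS` value that matches the oldest platform you must support):

```bash
# Sketch: build a Rust crate with LSE atomics enabled for Graviton2 and later.
export RUSTFLAGS="-Ctarget-cpu=neoverse-n1"
cargo build --release
```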
perfrunbook/debug_code_perf.md (+1 −2)
@@ -53,7 +53,7 @@ You may see a small single-digit percent increase in overhead with pseudo-NMI en
## Off-cpu profiling
-If Graviton2 is consuming less CPU-time than expected, it is useful to find call-stacks that are putting *threads* to sleep via the OS. Lock contention, IO Bottlenecks, OS scheduler issues can all lead to cases where performance is lower, but the CPU is not being fully utilized. The method to look for what might be causing more off-cpu time is the same as with looking for functions consuming more on-cpu time: generate a flamegraph and compare. In this case, the differences are more subtle to look for as small differences can mean large swings in performance as more thread sleeps can induce milli-seconds of wasted execution time.
+If Graviton is consuming less CPU-time than expected, it is useful to find call-stacks that are putting *threads* to sleep via the OS. Lock contention, IO bottlenecks, and OS scheduler issues can all lead to cases where performance is lower but the CPU is not fully utilized. The method for finding what might be causing more off-cpu time is the same as looking for functions consuming more on-cpu time: generate a flamegraph and compare. In this case, the differences are more subtle to spot, as small differences can mean large swings in performance: more thread sleeps can induce milliseconds of wasted execution time.
1. Verify native (i.e. C/C++/Rust) code is built with `-fno-omit-frame-pointer`
2. Verify java code is started with `-XX:+PreserveFramePointer -agentpath:/path/to/libperf-jvmti.so`
@@ -109,4 +109,3 @@ In our `capture_flamegraphs.sh` helper script, we use `perf record` to gather tr
1. Use `-e instructions` to generate a flame-graph of the functions that use the most instructions on average to identify a compiler or code optimization opportunity.
2. Use `-e cache-misses` to generate a flame-graph of functions that miss the L1 cache the most to indicate if changing to a more efficient data-structure might be necessary.
3. Use `-e branch-misses` to generate a flame-graph of functions that cause the CPU to mis-speculate. This may identify regions with heavy use of conditionals, or conditionals that are data-dependent and may be a candidate for refactoring.
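As a hedged illustration of capturing one of these profiles outside the helper script (assumes Brendan Gregg's FlameGraph scripts are cloned locally; the event and duration are illustrative):

```bash
# Sketch: system-wide profile on one event, folded into a flamegraph SVG.
sudo perf record -a -g -e branch-misses -- sleep 60
sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > branch-misses.svg
```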
perfrunbook/debug_hw_perf.md (+2 −2)
@@ -16,7 +16,7 @@ There are hundreds of events available to monitor in a server CPU today which is
## How to Collect PMU counters
-A limited subset of PMU events for the CPU are available on Graviton \*6g, \*7g sizes <16xl, we recommend using a 16xl for experiments needing PMU events to get access to all of them. On 5th and 6th generation x86 instances use a single socket instance is needed to have access to the CPU PMU events: >c5.9xl, >\*5.12xl, >\*6i.16xl, >c5a.12xl, and >\*6a.24xl. On 7th generation x86 instances *7a and *7i, all sizes get access to a limited number of CPU PMU events, just like on Graviton instances, and full socket or larger instances (>\*7\*.24xl) get access to all PMU events.
+A limited subset of PMU events for the CPU is available on Graviton \*6g and \*7g sizes below 16xl; we recommend using a 16xl for experiments that need access to all PMU events. On Graviton \*8g, sizes >24xl have access to all the CPU PMU events. On 5th and 6th generation x86 instances, a single-socket instance is needed to have access to the CPU PMU events: >c5.9xl, >\*5.12xl, >\*6i.16xl, >c5a.12xl, and >\*6a.24xl. On 7th generation x86 instances *7a and *7i, all sizes get access to a limited number of CPU PMU events, just like on Graviton instances, and full socket or larger instances (>\*7\*.24xl) get access to all PMU events.
To measure the standard CPU PMU events, do the following:
@@ -121,7 +121,7 @@ To measure the standard CPU PMU events, do the following:
This checklist describes the top-down method to debug whether the hardware is under-performing and what part is underperforming. The checklist describes event ratios to check that are included in the helper-script. All ratios are in terms of either misses-per-1000(kilo)-instruction or per-1000(kilo)-cycles. This checklist aims to help guide whether a hardware slow down is coming from the front-end of the processor or the backend of the processor and then what particular part. The front-end of the processor is responsible for fetching and supplying the instructions. The back-end is responsible for executing the instructions provided by the front-end as fast as possible. A bottleneck in either part will cause stalls and a decrease in performance. After determining where the bottleneck may lie, you can proceed to [Section 6](./optimization_recommendation.md) to read suggested optimizations to mitigate the problem.
-1. Start by measuring `ipc` (Instructions per cycle) on each instance-type. A higher IPC is better. A lower number for `ipc` on Graviton2 compared to x86 indicates *that* there is a performance problem. At this point, proceed to attempt to root cause where the lower IPC bottleneck is coming from by collecting frontend and backend stall metrics.
+1. Start by measuring `ipc` (Instructions per cycle) on each instance-type. A higher IPC is better. A lower number for `ipc` on Graviton compared to x86 indicates *that* there is a performance problem. At this point, proceed to root cause where the lower IPC bottleneck is coming from by collecting frontend and backend stall metrics.
2. Next, measure `stall_frontend_pkc` and `stall_backend_pkc` (pkc = per kilo cycle) and determine which is higher. If stalls in the frontend are higher, it indicates the part of the CPU responsible for predicting and fetching the next instructions to execute is causing slow-downs. If stalls in the backend are higher, it indicates the machinery that executes the instructions and reads data from memory is causing slow-downs.
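A hedged starting point for steps 1 and 2 without the helper scripts, using the generic perf event aliases (the duration is illustrative):

```bash
# Sketch: sample IPC plus frontend/backend stall cycles system-wide for 30 seconds.
sudo perf stat -a -e instructions,cycles,stalled-cycles-frontend,stalled-cycles-backend -- sleep 30
```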
-2. If seeing bursts, verify this is expected behavior for your load generator. Bursts can cause performance degradation on Graviton2 if each new connection has to do an RSA signing operation for TLS connection establishment.
+2. If seeing bursts, verify this is expected behavior for your load generator. Bursts can cause performance degradation for each new connection, especially if it has to do an RSA signing operation for TLS connection establishment.
3. Check on SUT for hot connections (connections that are more heavily used than others) by running: `watch netstat -t`
4. The example below shows the use of `netstat -t` to watch TCP connections with one being hot as indicated by its non-zero `Send-Q` value while all other connections have a value of 0. This can lead to one core being saturated by network processing on the SUT, bottlenecking the rest of the system.
```bash
@@ -117,11 +117,10 @@ When running Java applications, monitor for differences in behavior using JFR (J
3. The image below shows JMC’s GC pane, showing pause times, heap size and references remaining after each collection.

4. The same information can be gathered by enabling GC logging and then processing the log output. Enter `-Xlog:gc*,gc+age=trace,gc+ref=debug,gc+ergo=trace` on the Java command line and re-start your application.
-5. If longer GC pauses are seen, this could be happening because objects are living longer on Graviton2 and the GC has to scan them. To help debug this gather an off-cpu profile ([see Section 5.b](./debug_code_perf.md)) to look for threads that are sleeping more often and potentially causing heap objects to live longer.
+5. If longer GC pauses are seen, this could be happening because objects are living longer on Graviton and the GC has to scan them. To help debug this, gather an off-cpu profile ([see Section 5.b](./debug_code_perf.md)) to look for threads that are sleeping more often and potentially causing heap objects to live longer.
6. Check for debug flags that are still enabled but should be disabled, such as: `-XX:-OmitStackTraceInFastThrow` which logs and generates stack traces for all exceptions, even if they are not fatal exceptions.
-7. Check there are no major differences in JVM ergonomics between Graviton2 and x86, run:
+7. Check there are no major differences in JVM ergonomics between Graviton and x86, run:
```bash
%> java -XX:+PrintFlagsFinal -version
-# Capture output from x86 and Graviton2 and then diff the files
+# Capture output from x86 and Graviton and then diff the files
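# A hedged sketch of that comparison (file names are illustrative):
%> java -XX:+PrintFlagsFinal -version > graviton_flags.txt   # repeat on the x86 SUT to produce x86_flags.txt
%> diff x86_flags.txt graviton_flags.txt                     # look for heap, GC, and thread ergonomics differences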
-When designing an experiment to benchmark Graviton2 against another instance type, it is key to remember the below 2 guiding principles:
+When designing an experiment to benchmark Graviton-based instances against another instance type, it is key to remember the two guiding principles below:
1. Always define a specific question to answer with your benchmark
2. Control your variables and unknowns within the benchmark environment
-This section describes multiple different optimization suggestions to try on Graviton2 instances to attain higher performance for your service. Each sub-section defines some optimization recommendations that can help improve performance if you see a particular signature after measuring the performance using the previous checklists.
+This section describes optimization suggestions to try on Graviton-based instances to attain higher performance for your service. Each sub-section defines optimization recommendations that can help improve performance if you see a particular signature after measuring performance using the previous checklists.
## Optimizing for large instruction footprint
@@ -35,6 +35,8 @@ allocating huge-pages.
2. For additional information on the vector instructions used on Graviton
3. Disable Receive Packet Steering (RPS) to avoid contention and extra IPIs.
-1. `cat /sys/class/net/ethN/queues/rx-N/rps_cpus` and verify they are set to `0`. In general RPS is not needed on Graviton2.
+1. `cat /sys/class/net/ethN/queues/rx-N/rps_cpus` and verify they are set to `0`. In general RPS is not needed on Graviton2 and newer.
2. You can try using RPS if your situation is unique. Read the [documentation on RPS](https://www.kernel.org/doc/Documentation/networking/scaling.txt) to understand further how it might help. Also refer to [Optimizing network intensive workloads on Amazon EC2 A1 Instances](https://aws.amazon.com/blogs/compute/optimizing-network-intensive-workloads-on-amazon-ec2-a1-instances/) for concrete examples.
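A hedged one-liner to run that check across all RX queues (the interface name is illustrative):

```bash
# Sketch: print the RPS CPU mask for every RX queue of eth0; each should be 0.
for f in /sys/class/net/eth0/queues/rx-*/rps_cpus; do echo "$f: $(cat "$f")"; done
```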
## Metal instance IO optimizations
-1. If on Graviton2 metal instances, try disabling the System MMU (Memory Management Unit) to speed up IO handling:
+1. If on Graviton2 and newer metal instances, try disabling the System MMU (Memory Management Unit) to speed up IO handling: