
Commit 57dc813

Periodic update to overall documentation to account for latest Graviton4
release.
1 parent 7e35711 commit 57dc813

16 files changed: 102 additions, 71 deletions

CommonNativeJarsTable.md

+1-1
@@ -5,7 +5,7 @@ Org | jar | Builds on Arm | Arm Artifact available | Minimum Version
55
com.github.luben | [zstd-jni](https://github.com/luben/zstd-jni) | yes | [yes](https://mvnrepository.com/artifact/com.github.luben/zstd-jni) | 1.2.0
66
org.lz4 | [lz4-java](https://github.com/lz4/lz4-java) | yes | [yes](https://mvnrepository.com/artifact/org.lz4/lz4-java) | 1.4.0
77
org.xerial.snappy | [snappy-java](https://github.com/xerial/snappy-java) | yes | [yes](https://mvnrepository.com/artifact/org.xerial.snappy/snappy-java) | 1.1.4
8-
org.rocksdb | [rocksdbjni](https://github.com/facebook/rocksdb/tree/master/java) | yes | [yes](https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni) | 5.0.1
8+
org.rocksdb | [rocksdbjni](https://github.com/facebook/rocksdb/tree/master/java) | yes | [yes](https://mvnrepository.com/artifact/org.rocksdb/rocksdbjni) | 5.0.1 (7.4.3+ recommended)
99
com.github.jnr | [jffi](https://github.com/jnr/jffi) | yes | [yes](https://mvnrepository.com/artifact/com.github.jnr/jffi) | 1.2.13
1010
org.apache.commons | [commons-crypto](https://github.com/apache/commons-crypto) | yes | [yes](https://search.maven.org/artifact/org.apache.commons/commons-crypto/1.1.0/jar) | 1.1.0
1111
io.netty | [netty-transport-native-epoll](https://github.com/netty/netty) | yes | [yes](https://mvnrepository.com/artifact/io.netty/netty-transport-native-epoll) | 4.1.50

Monitoring_Tools_on_Graviton.md

+3-3
@@ -108,7 +108,7 @@ One can collect hardware events/ counters for an application, on a specific CPU,
108108
More details on how to use the Linux perf utility on AWS Graviton processors are available [here](https://github.com/aws/aws-graviton-getting-started/blob/main/optimizing.md#profiling-the-code).
109109

110110
## Summary: Utilities on AWS Graviton vs. Intel x86 architectures
111-
|Processor |x86 |Graviton2,3 |
111+
|Processor |x86 |Graviton2,3, and 4 |
112112
|--- |--- |--- |
113113
|CPU frequency listing |*lscpu, /proc/cpuinfo, dmidecode* |*dmidecode* |
114114
|*turbostat* support |Yes |No |
@@ -117,12 +117,12 @@ More details on how to use Linux perf utility on AWS Graviton processors is avai
117117
|*i7z* Works |Yes |No |
118118
|*lmbench* |Yes |Yes |
119119
|Intel *MLC* |Yes |No |
120-
|Performance monitoring tools |_[VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and [PCM](https://github.com/opcm/pcm), [Linux perf](https://www.brendangregg.com/perf.html)_ |_[Linux perf](https://www.brendangregg.com/perf.html), [Arm Forge](https://developer.arm.com/Tools%20and%20Software/Arm%20Forge)_ |
120+
|Performance monitoring tools |_[VTune Profiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html), [PCM](https://github.com/opcm/pcm), [Linux perf](https://www.brendangregg.com/perf.html), [APerf](https://github.com/aws/aperf)_ |_[Linux perf](https://www.brendangregg.com/perf.html), [Linaro Forge](https://www.linaroforge.com/), [Arm Streamline CLI Tools](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer), [APerf](https://github.com/aws/aperf)_ |
121121

122122
Utilities such as *lmbench* are available [here](http://lmbench.sourceforge.net/) and can be built for AWS Graviton processors to obtain latency and bandwidth stats.
123123

124124
**Notes**:
125125

126126
**1.** The ARM Linux kernel community has decided not to put CPU frequency in _/proc/cpuinfo_ which can be read by tools such as _lscpu_ or directly.
127127

128-
**2.** On AWS Graviton 2/3 processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant for Arm architecture (and not on AWS Graviton processor either). Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.
128+
**2.** On AWS Graviton processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant. Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.

c-c++.md

+17-14
@@ -2,8 +2,8 @@
22

33
### Enabling Arm Architecture Specific Features
44

5-
TLDR: To target all current generation Graviton instances (Graviton2 and
6-
Graviton3), use `-march=arm8.2-a`.
5+
TLDR: To target all current generation Graviton instances (Graviton2,
6+
Graviton3, and Graviton4), use `-march=armv8.2-a`.
77

88
C and C++ code can be built for Graviton with a variety of flags, depending on
99
the goal. If the goal is to get the best performance for a specific generation,
@@ -21,6 +21,7 @@ CPU | Flag (performance) | Flag (balanced) | GCC version
2121
-------------|-----------------------|---------------------------|------------------|---------------
2222
Graviton2 | `-mcpu=neoverse-n1` ¹ | `-march=armv8.2-a` | GCC-9 | Clang/LLVM 10+
2323
Graviton3(E) | `-mcpu=neoverse-v1` | `-mcpu=neoverse-512tvb` ² | GCC 11 | Clang/LLVM 14+
24+
Graviton4 | `-mcpu=neoverse-v2` | `-mcpu=neoverse-512tvb` ² | GCC 13 | Clang/LLVM 16+
2425

2526
¹ Requires GCC-9 or later (or GCC-7 for Amazon Linux 2); otherwise we suggest
2627
using `-mcpu=cortex-a72`
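
To see what a given flag from the table actually enabled, one option is to inspect the ACLE feature macros the compiler predefines. The sketch below is only an illustration: `build_features` and `append_feature` are hypothetical helper names, not part of any Graviton tooling, and the set of macros checked is a small sample.

```c
#include <stddef.h>
#include <string.h>

/* Append s to buf without overflowing a buffer of len bytes. */
void append_feature(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < len)
        strncat(buf, s, len - used - 1);
}

/* Hypothetical helper: summarizes which Arm features the compiler was
 * allowed to use, based on ACLE predefined macros. These depend on the
 * -march/-mcpu flag passed at build time, not on the CPU running the code. */
char *build_features(char *buf, size_t len)
{
    if (len == 0)
        return buf;
    buf[0] = '\0';
#if defined(__aarch64__)
    append_feature(buf, len, "aarch64");
#else
    append_feature(buf, len, "other");
#endif
#if defined(__ARM_FEATURE_ATOMICS)
    append_feature(buf, len, " lse");      /* Large-System Extensions */
#endif
#if defined(__ARM_FEATURE_DOTPROD)
    append_feature(buf, len, " dotprod");  /* int8 dot-product instructions */
#endif
#if defined(__ARM_FEATURE_SVE)
    append_feature(buf, len, " sve");      /* Scalable Vector Extension */
#endif
    return buf;
}
```

Printing the returned string from builds made with different flags in the table gives a quick way to compare what each flag turns on.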
@@ -48,7 +49,8 @@ Distribution | GCC | Clang/LLVM
4849
----------------|----------------------|-------------
4950
Amazon Linux 2023 | 11* | 15*
5051
Amazon Linux 2 | 7*, 10 | 7, 11*
51-
Ubuntu 22.04 | 9, 10, 11*, 12 | 11, 12, 13, 14*
52+
Ubuntu 24.04 | 9, 10, 11, 12, 13*, 14 | 14, 15, 16, 17, 18*
53+
Ubuntu 22.04 | 9, 10, 11*, 12 | 11, 12, 13, 14*
5254
Ubuntu 20.04 | 7, 8, 9*, 10 | 6, 7, 8, 9, 10, 11, 12
5355
Ubuntu 18.04 | 4.8, 5, 6, 7*, 8 | 3.9, 4, 5, 6, 7, 8, 9, 10
5456
Debian10 | 7, 8* | 6, 7, 8
@@ -58,7 +60,7 @@ SUSE Linux ES15 | 7*, 9, 10 | 7
5860

5961
### Large-System Extensions (LSE)
6062

61-
The Graviton2 and Graviton3(E) processors have support for the Large-System Extensions (LSE)
63+
All Graviton processors after Graviton1 have support for the Large-System Extensions (LSE)
6264
which were first introduced in Armv8.1. LSE provides low-cost atomic operations which can
6365
improve system throughput for CPU-to-CPU communication, locks, and mutexes.
6466
The improvement can be up to an order of magnitude when using LSE instead of
@@ -67,11 +69,12 @@ load/store exclusives.
6769
POSIX threads library needs LSE atomic instructions. LSE is important for
6870
locking and thread synchronization routines. The following systems distribute
6971
a libc compiled with LSE instructions:
70-
- Amazon Linux 2,
71-
- Amazon Linux 2022,
72-
- Ubuntu 18.04 (needs `apt install libc6-lse`),
73-
- Ubuntu 20.04,
74-
- Ubuntu 22.04.
72+
- Amazon Linux 2
73+
- Amazon Linux 2023
74+
- Ubuntu 18.04 (needs `apt install libc6-lse`)
75+
- Ubuntu 20.04
76+
- Ubuntu 22.04
77+
- Ubuntu 24.04
7578

7679
The compiler needs to generate LSE instructions for applications that use atomic
7780
operations. For example, the code of databases like PostgreSQL contain atomic
@@ -87,8 +90,8 @@ To check whether the application binary contains load and store exclusives:
8790
$ objdump -d app | grep -i 'ldxr\|ldaxr\|stxr\|stlxr' | wc -l
8891
```
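
As a small test case for the check above, the following sketch uses C11 atomics for a shared counter. The names `request_count`, `count_request` and `requests_seen` are made up for illustration. Under the assumption that it is built with GCC and a flag such as `-march=armv8.2-a`, the increment is expected to compile to an LSE instruction such as `ldadd`; without LSE enabled, the compiler typically falls back to a load/store-exclusive loop that the `objdump` command would count.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared counter; hypothetical name used for illustration only. */
static atomic_uint_fast64_t request_count;

/* Atomically increment the counter and return the new value.
 * Relaxed ordering is enough for a pure statistics counter. */
uint64_t count_request(void)
{
    return atomic_fetch_add_explicit(&request_count, 1,
                                     memory_order_relaxed) + 1;
}

/* Read the current counter value. */
uint64_t requests_seen(void)
{
    return atomic_load_explicit(&request_count, memory_order_relaxed);
}
```

Compiling this file once with and once without LSE enabled, then disassembling both objects, is a quick way to confirm which form the toolchain generates.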
8992

90-
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton and
91-
Graviton2. Supporting both platforms with the same binary comes at a small
93+
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton1 and later
94+
Gravitons with LSE support. Supporting both platforms with the same binary comes at a small
9295
extra cost: one load and one branch. To check that an application
9396
has been compiled with `-moutline-atomics`, `nm` command line utility displays
9497
the name of functions and global variables in an application binary. The boolean
@@ -152,16 +155,16 @@ if (feof(stdin)) {
152155
}
153156
```
154157

155-
### Using Graviton2 Arm instructions to speed-up Machine Learning
158+
### Using Arm instructions to speed-up Machine Learning
156159

157-
Graviton2 processors been optimized for performance and power efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for Machine Learning (quantized) inference workloads, and enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second, reducing the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
160+
Graviton2 and later processors have been optimized for performance and power-efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for quantized Machine Learning inference workloads, and by enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics), which doubles the number of operations per second and reduces the memory footprint compared to single precision floating point (\_float32), while still offering a large dynamic range.
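
A minimal sketch of how the dot-product instructions can be reached from C follows. It assumes a build where the compiler defines `__ARM_FEATURE_DOTPROD` (for example `-mcpu=neoverse-n1` or `-march=armv8.2-a+dotprod`); on any other target it falls back to a plain scalar loop. The function name `dot_i8` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Signed 8-bit dot product accumulated into 32 bits. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    /* Each vdotq_s32 multiplies 16 int8 pairs and adds them, four at a
     * time, into the four 32-bit lanes of the accumulator. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16)
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    sum = vaddvq_s32(acc);  /* horizontal add of the four lanes */
#endif
    /* Scalar tail, and the whole loop on targets without dot-product. */
    for (; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

In practice, quantized inference kernels usually come from an optimized library rather than hand-written intrinsics; the sketch only shows what the instruction computes.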
158161

159162
### Using SVE
160163

161164
The scalable vector extensions (SVE) require both a new enough tool-chain to
162165
auto-vectorize to SVE (GCC 11+, LLVM 14+) and a 4.15+ kernel that supports SVE.
163166
One notable exception is that Amazon Linux 2 with a 4.14 kernel doesn't support SVE;
164-
please upgrade to a 5.4+ AL2 kernel.
167+
please upgrade to a 5.4+ AL2 kernel. Graviton3 and Graviton4 support SVE; earlier Gravitons do not.
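
Because SVE availability depends on both the processor and the kernel, code that ships a single binary may want to check for it at run time. The following is a sketch under the assumption of a Linux AArch64 system with glibc or musl: it reads the hardware capability bits with `getauxval`. `cpu_has_sve` and `EXAMPLE_HWCAP_SVE` are names chosen for this example; the bit position mirrors `HWCAP_SVE` in the kernel's arm64 `hwcap.h`.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Assumed to match HWCAP_SVE (bit 22) from the arm64 kernel headers. */
#define EXAMPLE_HWCAP_SVE (1UL << 22)

/* Returns true when the kernel reports SVE support for this process.
 * On non-Linux or non-AArch64 builds it always returns false. */
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & EXAMPLE_HWCAP_SVE) != 0;
#else
    return false;
#endif
}
```

With an SVE-capable kernel this is expected to return true on Graviton3 and Graviton4 and false on Graviton2.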
165168

166169
### Using Arm instructions to speed-up common code sequences
167170
The Arm instruction set includes instructions that can be used to speedup common

dpdk_spdk.md

+5-5
@@ -1,18 +1,18 @@
1-
# DPDK, SPDK, ISA-L supports Graviton2
1+
# DPDK, SPDK, ISA-L supports Graviton
22

3-
Graviton2 is optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton2 following the normal installation guidelines from the respective repositories linked above.
3+
Graviton2 and later CPUs are optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton following the normal installation guidelines from the respective repositories linked above.
44

55
**NOTE**: *Though DPDK precompiled packages are available from Ubuntu, we recommend building them from source.*
66

7-
SPDK relies often on [ISA-L](https://github.com/intel/isa-l) which is already optimized for Arm64 and the CPU cores in Graviton2.
7+
SPDK often relies on [ISA-L](https://github.com/intel/isa-l), which is already optimized for Arm64 and the CPU cores in Graviton2 and later processors.
88

99

1010

1111
## Compile DPDK from source
1212

1313
[DPDK official guidelines](https://doc.dpdk.org/guides/linux_gsg/build_dpdk.html) require using *meson* and *ninja* to build from source code.
1414

15-
A native compilation of DPDK on top of Graviton2 will generate optimized code that take advantage of the CRC and Crypto instructions in Graviton2 cpu cores.
15+
A native compilation of DPDK on top of Graviton will generate optimized code that takes advantage of the CRC and Crypto instructions in Graviton2 and later CPU cores.
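
To illustrate what "taking advantage of the CRC instructions" means at the source level, here is a small sketch of a CRC32C routine. It is not DPDK's implementation; `crc32c` is a hypothetical function written for this example. When the compiler defines `__ARM_FEATURE_CRC32` (as it typically does for Graviton2 and later targets), the loop uses the ACLE `__crc32cb` intrinsic; otherwise it uses a bitwise software fallback that yields the same result.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* CRC32C (Castagnoli) over a byte buffer, one byte per step. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;  /* standard initial value */
    for (size_t i = 0; i < len; i++) {
#if defined(__ARM_FEATURE_CRC32)
        /* Single hardware instruction per byte. */
        crc = __crc32cb(crc, data[i]);
#else
        /* Bitwise fallback using the reflected polynomial 0x82F63B78. */
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
#endif
    }
    return ~crc;  /* standard final inversion */
}
```

A production routine would process 8 bytes per step with `__crc32cd`; the byte-wise form is kept here so both paths are easy to compare.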
1616

1717
**NOTE**: Some of the installation steps call "python", which may not be a valid command in modern Linux distributions; you may need to install *python-is-python3* to resolve this.
1818

@@ -35,5 +35,5 @@ Some application, written with the x86 architecture in mind, set the active dpdk
3535

3636
## Known issues
3737

38-
* **testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton2). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
38+
* **testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
3939
