Monitoring_Tools_on_Graviton.md
+3-3
@@ -108,7 +108,7 @@ One can collect hardware events/ counters for an application, on a specific CPU,
108
108
More details on how to use the Linux perf utility on AWS Graviton processors are available [here](https://github.com/aws/aws-graviton-getting-started/blob/main/optimizing.md#profiling-the-code).
109
109
110
110
## Summary: Utilities on AWS Graviton vs. Intel x86 architectures
111
-
|Processor |x86 |Graviton2,3 |
111
+
|Processor |x86 |Graviton2, 3, and 4|
112
112
|--- |--- |--- |
113
113
|CPU frequency listing |*lscpu, /proc/cpuinfo, dmidecode*|*dmidecode*|
114
114
|*turbostat* support |Yes |No |
@@ -117,12 +117,12 @@ More details on how to use Linux perf utility on AWS Graviton processors is avai
Utilities such as *lmbench* are available [here](http://lmbench.sourceforge.net/) and can be built for AWS Graviton processors to obtain latency and bandwidth stats.
123
123
124
124
**Notes**:
125
125
126
126
**1.** The Arm Linux kernel community has decided not to put CPU frequency in _/proc/cpuinfo_, the file that tools such as _lscpu_ read (and that can also be read directly).
127
127
128
-
**2.** On AWS Graviton 2/3 processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant for Arm architecture (and not on AWS Graviton processor either). Also, tools such as *[i7z](https://code.google.com/archive/p/i7z/)* for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel *MLC* is a memory latency checker utility that is only supported on Intel processors.
128
+
**2.** AWS Graviton processors do not support Turbo, so utilities such as *turbostat* are neither supported nor relevant. Tools such as *[i7z](https://code.google.com/archive/p/i7z/)*, which discover CPU frequency, turbo, sockets and other information, are supported only on Intel processors. Intel *MLC*, a memory latency checker utility, is likewise supported only on Intel processors.
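
To illustrate note 1, the sketch below pulls processor speeds out of `dmidecode -t processor` output instead of _/proc/cpuinfo_. The helper name `parse_dmidecode_speeds` and the sample text are assumptions made for this example; the sample is illustrative and was not captured from a real instance.

```python
import re


def parse_dmidecode_speeds(text):
    """Return the first 'Max Speed' and 'Current Speed' values (in MHz) found in dmidecode text."""
    speeds = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Max Speed|Current Speed):\s*(\d+)\s*MHz", line)
        # Keep only the first occurrence of each field; skip non-numeric values such as "Unknown".
        if match and match.group(1) not in speeds:
            speeds[match.group(1)] = int(match.group(2))
    return speeds


# Illustrative sample only (assumed layout of a "Processor Information" record).
SAMPLE = """\
Processor Information
\tSocket Designation: CPU00
\tMax Speed: 2600 MHz
\tCurrent Speed: 2600 MHz
"""

if __name__ == "__main__":
    print(parse_dmidecode_speeds(SAMPLE))
```

On a live instance, the text would come from running `dmidecode -t processor` (which typically requires root), for example via `subprocess.run(["dmidecode", "-t", "processor"], capture_output=True, text=True).stdout`.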
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton and
91
-
Graviton2. Supporting both platforms with the same binary comes at a small
93
+
GCC's `-moutline-atomics` flag produces a binary that runs on both Graviton1 and later
94
+
Gravitons with LSE support. Supporting both platforms with the same binary comes at a small
92
95
extra cost: one load and one branch. To check that an application
93
96
has been compiled with `-moutline-atomics`, use the `nm` command line utility, which displays
94
97
the names of functions and global variables in an application binary. The boolean
@@ -152,16 +155,16 @@ if (feof(stdin)) {
152
155
}
153
156
```
154
157
155
-
### Using Graviton2 Arm instructions to speed-up Machine Learning
158
+
### Using Arm instructions to speed up Machine Learning
156
159
157
-
Graviton2 processors been optimized for performance and power efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions) commonly used for Machine Learning (quantized) inference workloads, and enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics) to double the number of operations per second, reducing the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
160
+
Graviton2 and later processors have been optimized for performant and power-efficient machine learning by enabling [Arm dot-product instructions](https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/exploring-the-arm-dot-product-instructions), commonly used for Machine Learning (quantized) inference workloads, and by enabling [Half precision floating point - \_float16](https://developer.arm.com/documentation/100067/0612/Other-Compiler-specific-Features/Half-precision-floating-point-intrinsics), which doubles the number of operations per second and reduces the memory footprint compared to single precision floating point (\_float32), while still enjoying large dynamic range.
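
The memory-footprint point can be illustrated with a small sketch. It only compares storage sizes using Python's standard `struct` module (format `e` is IEEE 754 half precision, `f` is single precision); it does not exercise any Arm instructions. The helper name `packed_size` and the sample values are assumptions for this example.

```python
import struct


def packed_size(values, fmt):
    """Bytes needed to store `values` with struct format `fmt` ('e' = half, 'f' = single)."""
    # "<" selects little-endian standard sizes with no padding.
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


# Sample values chosen to be exactly representable in half precision.
weights = [0.5, -1.25, 3.0, 1024.0]

half_bytes = packed_size(weights, "e")
single_bytes = packed_size(weights, "f")

if __name__ == "__main__":
    print(f"half: {half_bytes} bytes, single: {single_bytes} bytes")
```

For the same number of elements, the half-precision buffer is half the size of the single-precision one.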
158
161
159
162
### Using SVE
160
163
161
164
The scalable vector extensions (SVE) require both a new enough tool-chain to
162
165
auto-vectorize to SVE (GCC 11+, LLVM 14+) and a 4.15+ kernel that supports SVE.
163
166
One notable exception is that Amazon Linux 2 with a 4.14 kernel doesn't support SVE;
164
-
please upgrade to a 5.4+ AL2 kernel.
167
+
please upgrade to a 5.4+ AL2 kernel. Graviton3 and Graviton4 support SVE; earlier Graviton processors do not.
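
One way to check whether the running kernel and CPU expose SVE is to look for the `sve` flag on the `Features` line of _/proc/cpuinfo_. The sketch below is a minimal example of that check; the helper names `cpu_features` and `supports_sve` are assumptions, and the sample strings are abbreviated illustrations, not real dumps.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags listed on every 'Features' line of arm64 /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def supports_sve(cpuinfo_text):
    """True when the kernel reports the 'sve' flag for at least one CPU."""
    return "sve" in cpu_features(cpuinfo_text)


# Abbreviated, illustrative samples (assumed, not captured from real instances).
WITH_SVE = "processor\t: 0\nFeatures\t: fp asimd crc32 atomics asimddp sve\n"
WITHOUT_SVE = "processor\t: 0\nFeatures\t: fp asimd crc32 atomics asimddp\n"

if __name__ == "__main__":
    print(supports_sve(WITH_SVE), supports_sve(WITHOUT_SVE))
```

On an instance, the input would be the contents of the file itself, for example `supports_sve(open("/proc/cpuinfo").read())`.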
165
168
166
169
### Using Arm instructions to speed up common code sequences
167
170
The Arm instruction set includes instructions that can be used to speed up common
dpdk_spdk.md
+5-5
@@ -1,18 +1,18 @@
1
-
# DPDK, SPDK, ISA-L supports Graviton2
1
+
# DPDK, SPDK, and ISA-L support Graviton
2
2
3
-
Graviton2 is optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton2 following the normal installation guidelines from the respective repositories linked above.
3
+
Graviton2 and later CPUs are optimized for data path functions like networking and storage. Users of [DPDK](https://github.com/dpdk/dpdk) and [SPDK](https://github.com/spdk/spdk) can download and compile natively on Graviton following the normal installation guidelines from the respective repositories linked above.
4
4
5
5
**NOTE**: *Although precompiled DPDK packages are available from Ubuntu, we recommend building DPDK from source.*
6
6
7
-
SPDK relies often on [ISA-L](https://github.com/intel/isa-l) which is already optimized for Arm64 and the CPU cores in Graviton2.
7
+
SPDK often relies on [ISA-L](https://github.com/intel/isa-l), which is already optimized for Arm64 and the CPU cores in Graviton2 and later processors.
8
8
9
9
10
10
11
11
## Compile DPDK from source
12
12
13
13
The [official DPDK guidelines](https://doc.dpdk.org/guides/linux_gsg/build_dpdk.html) require using *meson* and *ninja* to build from source code.
14
14
15
-
A native compilation of DPDK on top of Graviton2 will generate optimized code that take advantage of the CRC and Crypto instructions in Graviton2 cpu cores.
15
+
A native compilation of DPDK on Graviton generates optimized code that takes advantage of the CRC and Crypto instructions in Graviton2 and later CPU cores.
16
16
17
17
**NOTE**: Some of the installation steps call "python", which may not be a valid command in modern Linux distributions; you may need to install *python-is-python3* to resolve this.
18
18
@@ -35,5 +35,5 @@ Some application, written with the x86 architecture in mind, set the active dpdk
35
35
36
36
## Known issues
37
37
38
-
***testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with wrong checksum which are dropped when transmitted between AWS instances (including Graviton2). This is a known issue and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.
38
+
***testpmd:** The flowgen function of testpmd does not work correctly when compiled with GCC 9 and above. It generates IP packets with incorrect checksums, which are dropped when transmitted between AWS instances (including Graviton). This is a known issue, and there is a [patch](https://patches.dpdk.org/patch/84772/) that fixes it.