Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #70

zamazan4ik · 2024-01-08T02:46:50Z

Hi!

I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (along with a lot of other PGO-related information). Since PGO improves performance with many compilers (Rustc, GCC, Clang, etc.), I think optimizing Sage with PGO is worth trying. I ran some benchmarks and want to share the results.

Test environment

Fedora 39
Linux kernel 6.6.9
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.75
Sage version: the latest main branch at the time of testing, commit 72b536f61ebb6332c57cf57fab9fe53b1e878c1d
Disabled Turbo boost (for more stable results across runs)

Benchmarks

As a benchmark, I use the project's built-in benchmarks, run with the cargo bench command. For the PGO training phase, I run the same benchmarks through cargo-pgo with cargo pgo bench. For the PGO optimization phase, I use cargo pgo optimize bench.
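The commands above can be strung together roughly as follows. This is only a sketch, not the exact script used for the reported numbers: the run helper and the DRY_RUN variable are hypothetical names introduced here so the sequence can be previewed without building anything, and the setup step assumes a rustup-managed toolchain.

```shell
#!/bin/sh
# Sketch of the PGO workflow with cargo-pgo. Run from the Sage repository root.
# DRY_RUN and run() are illustrative helpers, not part of cargo or cargo-pgo.
set -eu

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-time setup: cargo-pgo relies on the LLVM tools shipped via rustup.
run rustup component add llvm-tools-preview
run cargo install cargo-pgo

# Baseline: regular Release-mode benchmarks.
run cargo bench

# Training phase: run the benchmarks with an instrumented build to collect profiles.
run cargo pgo bench

# Optimization phase: rebuild using the collected profiles and benchmark again.
run cargo pgo optimize bench
```

Setting DRY_RUN=0 executes the commands instead of printing them. Comparing the output of the last step against the baseline gives the PGO-optimized vs. Release comparison.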

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/09666344f7cb0ee92a69d4a14a8b50e6
PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/2adb489319886015c98e393cac5e2e57
(just for reference) PGO-instrumented compared to Release: https://gist.github.com/zamazan4ik/ae10d0fa65fb4be599876735b7ef15a6

According to these benchmarks, PGO improves Sage's performance; the PGO-optimized vs. Release comparison linked above has the per-benchmark numbers.

I should note that enabling Link-Time Optimization (LTO) is generally a good idea too - this optimization works well together with PGO. I also ran some benchmarks comparing LTO to Release: https://gist.github.com/zamazan4ik/6be63330d2c97b510fdfc6b7aa7988c5 . However, some benchmarks show performance regressions with LTO, and these need to be investigated. LTO was enabled with codegen-units = 1 and lto = "fat" for the corresponding profiles in the Cargo.toml file.
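For anyone who wants to try the LTO settings without editing Cargo.toml, Cargo also accepts profile settings through environment variables. The sketch below assumes the bench profile is the relevant one (the report only says "the corresponding profiles"); the run helper and DRY_RUN variable are hypothetical names used here for previewing the command.

```shell
#!/bin/sh
# Sketch: enable fat LTO and a single codegen unit for one benchmark run.
# Assumed Cargo.toml equivalent (the profile name is an assumption):
#   [profile.bench]
#   lto = "fat"
#   codegen-units = 1
set -eu

# Print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# CARGO_PROFILE_<name>_LTO and CARGO_PROFILE_<name>_CODEGEN_UNITS override
# the matching profile keys for this invocation only.
run env CARGO_PROFILE_BENCH_LTO=fat CARGO_PROFILE_BENCH_CODEGEN_UNITS=1 cargo bench
```

Using environment variables keeps the repository untouched, which is convenient when comparing LTO and non-LTO runs back to back.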

Further steps

I can suggest the following action points:

Perform more PGO benchmarks on Sage. If they show improvements, add a note to the documentation about the possible performance gains from building Sage with PGO.
Provide an easier way (e.g. a build option) to build Sage with PGO. This would help end users and maintainers, since they could optimize Sage for their own workloads.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.

Here are some examples of how PGO optimization is integrated into other projects:

Rustc: a CI script for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script
Clang: Docs
Python:
- CPython: README
- Pyston: README
Go: Bash script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP: Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag

Please treat the issue just as a benchmark report - it's not an actual error, crash, or something like that. I don't know how much you care about performance in Sage so I don't know how important these improvements are for the project. I hope we can use these benchmarks at least as an additional data point about PGO efficiency for compilers.


adam-mcdaniel · 2024-01-08T07:14:19Z

Hello, thank you for the highly detailed writeup, I really appreciate the effort you put into writing this! These are really interesting results -- I'm very curious why there seem to be performance regressions with link-time optimizations. This is a great collection of resources for PGO in other projects as well! I will investigate adding PGO to Sage and also research why LTO might cause it to suffer. Thanks again, fantastic issue!

adam-mcdaniel added the Benchmarks🧪 label Jan 8, 2024
