Revert to default fullLTO on Clang #130048

Closed
Fidget-Spinner opened this issue Feb 12, 2025 · 9 comments
Labels
build (The build process and cross-build), type-feature (A feature request or enhancement)

Comments

@Fidget-Spinner
Member

Fidget-Spinner commented Feb 12, 2025

Feature or enhancement

Proposal:

Python 3.12 changed the default for Clang to ThinLTO. However, many people were unaware of this change and did not update their build scripts. This leaves a lot of perf on the table for macOS and possibly some other platforms.

  1. CPython was bitten by this: Python3.13 performance Issue with python.org macOS installers on ARM Macs #122580
  2. Faster CPython was bitten by this: clang builds not using full LTO faster-cpython/bench_runner#342
  3. python-build-standalone (Astral) also seems to not have the fix: On Clang, LTO is ThinLTO, which leaves a lot of performance out for macOS astral-sh/python-build-standalone#528
  4. msys2/mingw also seems to have not noticed: python: build with full lto msys2/MINGW-packages#23384

This seems to be confusing and tripping up a lot of people. I propose we change the --with-lto default back to full.
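To check which LTO mode an existing build actually requested, one option is to inspect the configure arguments recorded by the interpreter. The following is a minimal sketch, not part of the proposal: the helper name `lto_mode` is hypothetical, and it assumes a POSIX build where `sysconfig.get_config_var("CONFIG_ARGS")` returns the original `./configure` arguments (it may be `None` on other platforms).

```python
import shlex
import sysconfig


def lto_mode(config_args: str) -> str:
    """Classify the LTO request found in a configure argument string.

    Returns "full" or "thin" when the mode was pinned explicitly,
    "default" when only a bare --with-lto (or --with-lto=yes) was given,
    so the compiler-dependent default applies, and "none" otherwise.
    """
    mode = "none"
    for arg in shlex.split(config_args):
        if arg in ("--with-lto", "--with-lto=yes"):
            mode = "default"
        elif arg in ("--without-lto", "--with-lto=no"):
            mode = "none"
        elif arg.startswith("--with-lto="):
            mode = arg.split("=", 1)[1]
    return mode


if __name__ == "__main__":
    # CONFIG_ARGS is an assumption about the build; fall back to "".
    recorded = sysconfig.get_config_var("CONFIG_ARGS") or ""
    print(lto_mode(recorded))
```

A result of "default" on a Clang build is the case described above: the build script asked for LTO without saying which kind.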

cc @brandtbucher @corona10

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@Fidget-Spinner Fidget-Spinner added the type-feature A feature request or enhancement label Feb 12, 2025
@brandtbucher
Member

Heh, this was on my backlog of issues to open. I fully support this... to me, --with-lto has always meant "take a long time to compile the fastest executable".

@corona10
Member

Okay, but when would people be ready for the change if we change the policy? Why not suggest that people use lto=full if they want it? (Yeah, it's a grumpy comment, but I'm just asking.)

@corona10
Member

Or would the criterion be that no platform (at least tier 1) should slow down if we change the default build policy?
That would mean applying the same criterion to other build configurations too.

@Fidget-Spinner
Member Author

> Okay, but when would people be ready for the change if we change the policy? Why not suggest that people use lto=full if they want it? (Yeah, it's a grumpy comment, but I'm just asking.)

I think the problem is that people assume lto means lto full, and they forget that it's lto thin now on clang.

> Or would the criterion be that no platform (at least tier 1) should slow down if we change the default build policy?

Yeah that seems reasonable.
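For build scripts that pass a bare `--with-lto`, the ambiguity discussed here can be avoided by pinning the mode explicitly. This is a rough sketch under stated assumptions: `pin_lto` is a hypothetical helper, the argument list stands in for whatever a build script passes to `./configure`, and "full" and "thin" are the explicit values that `--with-lto` accepts.

```python
from __future__ import annotations


def pin_lto(configure_args: list[str], mode: str = "full") -> list[str]:
    """Return a copy of configure_args with any bare LTO request pinned.

    A bare --with-lto (or --with-lto=yes) leaves the choice between full
    and thin LTO to the compiler-dependent default; this rewrites it to an
    explicit --with-lto=<mode>. Already explicit arguments are untouched.
    """
    if mode not in {"full", "thin"}:
        raise ValueError(f"unsupported LTO mode: {mode!r}")
    return [
        f"--with-lto={mode}" if arg in ("--with-lto", "--with-lto=yes") else arg
        for arg in configure_args
    ]


if __name__ == "__main__":
    # Hypothetical build-script arguments.
    print(pin_lto(["--enable-optimizations", "--with-lto"]))
```

Pinning the value in the script means the build no longer depends on which default a given Python version or compiler happens to choose.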

@encukou encukou added the build The build process and cross-build label Feb 13, 2025
@Fidget-Spinner
Member Author

For the record, @zanieb tested this on Clang 19.1 (not Apple Clang) and couldn't reproduce the slowdowns seen on macOS, so I suspect the latest Clang versions have fixed this bug: astral-sh/python-build-standalone#529

Fidget-Spinner added a commit to Fidget-Spinner/cpython that referenced this issue Feb 13, 2025
@Fidget-Spinner
Member Author

I deem it too disruptive to change a configure option twice and flip flop between the defaults just based on what compiler is available. I am thus reverting the change. Sorry!

@brandtbucher
Member

Wait, why was this reverted? Full LTO should always be the fastest option, right?

@corona10
Member

corona10 commented Feb 13, 2025

http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html

"In a few cases ThinLTO even outperforms full LTO, most likely because the higher scalability of ThinLTO allows using a more aggressive backend optimization pipeline (similar to that of a non-LTO build)."

So that's not always true. It's a question of where to bet, but I believe the compiler people are improving parallel LTO for runtime performance too. Basically, the difference is about the execution mechanism, not the optimization policy.

@zanieb
Contributor

zanieb commented Feb 13, 2025

My benchmarks, though not rigorous, showed full LTO to be a little slower: astral-sh/python-build-standalone#529 (comment) 🤷‍♀ I could do more benchmarks if people want? Or explore the difference here instead of downstream in python-build-standalone, where there may be confounding factors?

5 participants