PR #522 Compiler Optimization Issue #567

Closed
program-- opened this issue Jul 12, 2023 · 4 comments · Fixed by #670
Labels
optimization runtime performance optimization considerations

Comments

@program--
Contributor

program-- commented Jul 12, 2023

This issue tracks an ongoing problem in PR #522: GCC and Clang optimization passes struggle with boost::geometry SRS transformation objects. Clang/LLVM is being used primarily to debug this issue.

Current Behavior

When compiling wkb.cpp with -O0, compilation is successful.

However, when compiling with -O1 or higher, compilation fails. Verbose output from clang shows that recursive calls to llvm::ScalarEvolution::getRangeRef are causing a stack overflow.

By increasing the process stack size (using ulimit -s 15036), we can successfully compile with -O1 and higher. This, however, doesn't alleviate the issue of the recursive calls during optimization.
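As a rough sketch of the workaround above, the raised limit can be scoped to a single compiler invocation instead of the whole shell session. The helper name with_stack_kb is made up for illustration, and the clang++ command in the comment leaves out whatever include paths and flags the real build passes.

```shell
# Hypothetical helper: run one command with a different soft stack limit (in KiB).
# The body runs in a subshell, so the calling shell's own limit is untouched.
with_stack_kb() (
  kb=$1
  shift
  # -S changes only the soft limit; this fails if kb exceeds the hard limit.
  ulimit -S -s "$kb" || exit 1
  "$@"
)

# Example (not executed here; real build flags omitted):
#   with_stack_kb 15036 clang++ -O1 -c wkb.cpp
```

This only hides the symptom for one command; the deep recursion in the optimizer still happens.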

Bisect Checking

To figure out which optimization is causing the overflow, we use the LLVM option -opt-bisect-limit=N to find both the last limit at which compilation succeeds and the first limit at which it fails. In my case, these were passes 97 and 98. Saving the bisect output of both runs and comparing them with diff returned:

98c98
< BISECT: running pass (98) SROAPass on _ZN5boost8geometry3srs4dpar10parametersIdE3addINS2_10value_projEEERS4_T_
---
> BISECT: NOT running pass (98) SROAPass on _ZN5boost8geometry3srs4dpar10parametersIdE3addINS2_10value_projEEERS4_T_

This (I think) tells us the SROA Optimization is causing the error.
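The manual search for the last passing and first failing limit could be automated with a small binary search. This is only a sketch: first_failing_limit and compile_with_limit are hypothetical names, the upper bound in the example is a guess, the clang++ line omits the real include paths and build flags, and it assumes that once a limit fails, every larger limit fails too.

```shell
# Hypothetical helper: binary-search the smallest limit N at which CMD fails.
# Usage: first_failing_limit LO HI CMD [ARGS...]
# CMD is run as `CMD ARGS... N` and must exit 0 when compilation succeeds.
# Assumes LO succeeds, HI fails, and failures are monotonic in N.
first_failing_limit() (
  lo=$1
  hi=$2
  shift 2
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if "$@" "$mid" >/dev/null 2>&1; then
      lo=$mid   # still compiles: the failing pass comes later
    else
      hi=$mid   # fails: the failing pass is at or before mid
    fi
  done
  echo "$hi"
)

# Assumed predicate: compile with only the first N optimization steps enabled.
# Real include paths and build flags are omitted.
compile_with_limit() {
  clang++ -O1 -c wkb.cpp -o /dev/null -mllvm -opt-bisect-limit="$1"
}

# Example (not executed here): first_failing_limit 0 1000 compile_with_limit
```

The function prints the first failing limit; the last passing limit is one less. The bisect logs of those two runs are what gets compared with diff.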

I believe this is the SROA optimization applied to boost::geometry::srs::dpar::parameters::add(value_proj), which is used twice in the projection code for defining epsg5070 and epsg3857, though I'm not sure.

Important
As a correction, the optimization is applied to boost::geometry::projections::detail::epsg_to_parameters(), in particular the EPSG table defined by boost.geometry. So the true root cause is SROA applied to the 4k-5k EPSG definitions that Boost has, since the definitions are chained expressions that need to be flattened before SROA is fully applied (I think).

Behavior in GCC vs Clang

Currently, GCC does not segfault, even with the default stack size. However, compilation is very slow (upwards of 30 minutes). I'm guessing that the same or a similar optimization is causing the issue, since outputting and inspecting the optimization call graph shows the optimizer hanging at the boost::geometry::srs::dpar::parameters explicit constructor, which calls ::add(value_proj) in our case.

System Information

I am using the following versions to debug this:

  • gcc version 13.1.1 20230429
  • clang version 15.0.7
  • Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)

With Linux kernel: 6.4.2-arch1-1

The LLVM IR output for this issue is attached here: wkb.ll.tar.gz. I am running the following on it to verify optimization issues:

opt -O2 -S -verify-each

Potentially Related Issues

llvm/llvm-project#49579

boostorg/geometry#1006

@program-- program-- mentioned this issue Jul 14, 2023
@mattw-nws mattw-nws added the optimization runtime performance optimization considerations label Jul 17, 2023
@mattw-nws
Contributor

If this doesn't sort itself out before operationalization, then yes it is worth looking at--ATM we intend to use the Intel LLVM compilers in an operational setting. But if this is in fact an LLVM issue then it may "fix itself" before then.

@PhilMiller
Contributor

I just confirmed that clang from Homebrew llvm 16.0.6 with -O2 does not have any long compilation time or crash.

@PhilMiller
Contributor

Huh, even with clang 15, compilation of wkb.cpp finished in under a minute, though maybe my Mac's default stack size allowed for that.

@PhilMiller
Contributor

Ugh, Homebrew for some weird reason has clang-16 in my llvm@15 tree. So, my testing of clang 15 is not meaningful.
