
[EPIC] Move forward to C++20 #3223

Open
untereiner opened this issue Jul 12, 2024 · 18 comments
Labels
type: feature New feature or request type: new A new issue has been created and requires attention

Comments

@untereiner
Contributor

What is the requested feature?
I want to open a discussion about moving forward to C++20.

Is your request related to a specific problem?
Below is a list of desirable features that are standardized in C++20.
I also list the minimum compiler/stdlib versions in which they are implemented:

| Feature | clang | gcc |
|---|---|---|
| standardized text formatting (`std::format`, replacing fmt) | 17 | 13 |
| ranges library | 15 | 10 |
| concepts library | 13 | 10 |
| `std::span` | 7 | 10 |
| string literals as template parameters | 12 | 9 |

I would like to discuss the feasibility of this change on the different architectures that GEOS is compiled on.

Describe the solution you'd like
n.a.

Describe alternatives you've considered
n.a.

Additional context
n.a.

@untereiner untereiner added type: feature New feature or request type: new A new issue has been created and requires attention labels Jul 12, 2024
@CusiniM
Collaborator

CusiniM commented Jul 12, 2024

We are mostly ready for this. It's part of the reason why we started compiling with cuda12. The only outstanding issues/questions are:

  • do all target platforms have cuda12 and compilers that are new enough for the novel C++20 features? I am not sure how to get this answer.
  • do the target AMD systems support C++20? I think the answer is yes, but @wrtobin can probably confirm this.
  • the GEOS cuda12 builds on Rocky Linux have a compilation error that we have not been able to solve (a compiler error in ElasticFirstOrderWaveEquationSEMKernel::StressComputation::launch in the call to FE_TYPE::computeFirstOrderStiffnessTermX). This should be resolved before fully moving to C++20.
  • we should update our CI images to make sure they all have adequate compilers and cuda versions (this is the easiest part).

@untereiner
Contributor Author

Thanks @CusiniM !
The question came up in the team today while implementing with string literals and templates.

For TTE maybe @Bubusch can also answer ?

@paveltomin
Contributor

  • do all target platforms have cuda12 and compilers that are new enough for novel c++20 features?

CC @drmichaeltcvx
I think we do; I've been compiling with gcc 13 recently and it works fine, but there are tons of warnings which would be nice to fix.

@Bubusch
Contributor

Bubusch commented Jul 15, 2024

@CusiniM @untereiner

Not yet on Pangea 3; I have just asked to install cuda12 and gcc12 so we can make the switch.
Pangea 4 is OK.

FYI @sframba @herve-gross

@wrtobin
Collaborator

wrtobin commented Jul 15, 2024

@CusiniM

BLT and its ability to set the C++ standard for HIP compilation have been updated to include C++20 support. The HIP/ROCM compilers we use are all clang-based and the intent is that they support the same standards the underlying clang compiler does. Naturally we might encounter issues/failures on that end, but if we do we can report them as bugs to AMD and they should be resolved eventually.

@rrsettgast
Member

@untereiner @wrtobin @CusiniM
Other than Pangea3 and El Capitan, I don't see any potential problems. Perhaps we should make a c++20 branch and start testing some of the features we want to use in the code, so that we can ensure portability.

@CusiniM
Collaborator

CusiniM commented Jul 15, 2024

@untereiner @wrtobin @CusiniM Other than Pangea3 and El Capitan, I don't see any potential problems. Perhaps we make a c++20 branch, and start testing some of the features we want to use in the code so that we can ensure portability.

sounds like a plan. I think we should do it once #3218 is merged.

@drmichaeltcvx

For CPU GEOS, we have been compiling using GCC 13.2.0 and running it w/o any issues on Zen4 and Zen4 AMD platforms.
For GPU GEOS, we have been successfully building with CUDA 12.2 and 11.8 using GCC 11.4.0.

I believe that CUDA 12.4 started supporting GCC 13.X (https://developer.nvidia.com/blog/cuda-toolkit-12-4-enhances-support-for-nvidia-grace-hopper-and-confidential-computing/#:~:text=A100%20performance%20improvements-,Compiler%20updates,objects%20supporting%20multiple%20GPU%20architectures.)

I need to report that we have started getting some run-time errors with the CUDA 12.2 builds. I believe that @victorapm will be looking into these issues.

@Bubusch
Contributor

Bubusch commented Jul 31, 2024

@CusiniM @untereiner

We received feedback from IBM that Pangea3's current drivers are not compatible with cuda12.
They plan to update the driver in January 2025 during a maintenance operation. I'll keep you updated on the progress.

FYI @sframba @herve-gross

@untereiner untereiner changed the title Move forward to C++20 [EPIC] Move forward to C++20 Oct 2, 2024
@untereiner
Contributor Author

If I understand correctly, the issue in #3314 is that gcc 9.4 has a bug when compiling modern C++.
I am wondering whether it is necessary for one partner to keep this version of gcc.

I gathered all compiler versions from the CI below:

  • CPU builds

| Compiler | Version | Build type | Platform |
|---|---|---|---|
| clang | 15.0.7 | Release | Ubuntu 22.04 |
| gcc | 9.4.0 | Release | Ubuntu 20.04 |
| gcc | 10.5.0 | Debug | Ubuntu 20.04 |
| gcc | 10.5.0 | Release | Ubuntu 20.04 |
| gcc | 11.4.0 | Release | Ubuntu 22.04 |
| gcc | 12.3.0 | Release | Ubuntu 22.04 |

  • GPU builds

| Compiler | Version | Build type | Platform |
|---|---|---|---|
| clang | 10.0.0 | Debug | Ubuntu 20.04 |
| gcc | 9.4.0 | Debug | Ubuntu 20.04 |
| cuda | 11.8.89 | Debug | Ubuntu 20.04 |
| clang | 10.0.0 | Release | Ubuntu 20.04 |
| gcc | 9.4.0 | Release | Ubuntu 20.04 |
| cuda | 11.8.89 | Release | Ubuntu 20.04 |
| clang | 17.0.6 | Release | Rocky Linux 8 |
| cuda | 12.5.1 | Release | Rocky Linux 8 |
| gcc | 8.5 | Release | Rocky Linux 8 |
| cuda | 12.5.1 | Release | Rocky Linux 8 |
| gcc | 9.4 | Release | AlmaLinux 8.8 |
| cuda | 11.5 | Release | AlmaLinux 8.8 |
  • Is it necessary for one partner to keep a gcc version below 10?
  • Does it make sense to compile the same target in both Release and Debug (e.g. CPU gcc 10.5)? Would Debug alone not be enough?
  • I am also compiling GEOS with gcc 14.2.1 (CPU only) without any issue.
  • More generally, can we maintain a table with this information somewhere and update the minimum versions from time to time, so that we do not lose time on old compiler issues?

@wrtobin
Collaborator

wrtobin commented Oct 2, 2024

@untereiner @rrsettgast and I had a discussion about dropping 9.X support. The only reason we haven't committed to making the move is the Pangea-3 CI using 9.4 as well. If we need to continue to support that platform+gcc version, we want to discover issues in the cheaper CPU CI prior to the expensive GPU CI whenever possible.

@sframba
Contributor

sframba commented Oct 2, 2024

@untereiner What is the issue with gcc 9.4? Can you point me to the compiler bug?
Pangea3 (the machine, not the emulator) compiles GEOS today with gcc 8.4.1. We may be able to downgrade the AlmaLinux CI configuration to this version, but we need to check that everything is OK with Qemu.
@Bubusch Did you have any feedback on the deployment of higher gcc versions on Pangea3?

@wrtobin
Collaborator

wrtobin commented Oct 2, 2024

It's a bug I fixed yesterday in a branch of LvArray. It only happens when we create an ArrayOfArrays or ArrayOfSets with the number of arrays and default array sizes known at compile time:

https://github.com/GEOS-DEV/LvArray/pull/335/files

Basically this valid placement-new:

```cpp
for( std::size_t i = size; i < std::size_t( newSize ); ++i )
{
  new ( ptr + i ) T( std::forward< ARGS >( args )... );
}
```

caused a compiler error:

```
cc1plus: error: 'void* __builtin_memset(void*, int, long unsigned int)' specified bound between 18446744065119617040 and 18446744073709551612 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
In member function 'virtual void testBufferOps_test_pack_unpack_data_arrayofsets_Test::TestBody()':
cc1plus: error: 'void* __builtin_memset(void*, int, long unsigned int)' specified bound between 18446744065119617040 and 18446744073709551612 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
```

in the gcc-9.4 CI.

changing it to this:

```cpp
for( T * iptr = ptr + size; iptr < ptr + newSize; ++iptr )
{
  new ( iptr ) T( std::forward< ARGS >( args )... );
}
```

which is just a slight refactoring of the operation, caused the compilation to succeed.

@wrtobin
Collaborator

wrtobin commented Oct 2, 2024

The code that caused the bug to be found only exists in a new unit test in #3314, so it wouldn't and couldn't have caused actual failures on any platforms geos develop is currently deployed to / used on.

@sframba
Contributor

sframba commented Oct 2, 2024

Thanks for the explanation. So if I understand correctly, a move away from gcc 9.4 is no longer urgent?
@Bubusch mentioned that the upgrade to CUDA 12 on Pangea3 might be possible starting in January 2025; at that time we could check which gcc and ompi version updates are possible and upgrade the whole configuration on P3. Then, if all goes well, we could update the corresponding CI configuration. @Bubusch any thoughts?

@untereiner
Contributor Author

untereiner commented Oct 2, 2024

There is no rush. I just wanted to point out that the 9.x branch of gcc is no longer supported, and I am not sure it is good practice to rely on old, unsupported compiler branches.
It is also worth mentioning that this bug had the side effect of putting some PRs on hold.

I think it would be good practice to periodically update the minimum compiler versions (currently: gcc >= 11.x).

@wrtobin
Collaborator

wrtobin commented Oct 2, 2024

The bug has been mitigated, so it isn't critical.

It is however very unlikely this will ever be addressed in the gcc-9.X release line, and sunsetting older compiler versions is something we occasionally want to do regardless so legacy bugs like this don't impact developer time and/or require unpleasant workarounds (notwithstanding that this workaround was essentially trivial once identified).

If/when we can move Pangea 3 to a newer GCC, it would be a good idea to sunset support for gcc versions before 10.X generally.

@CusiniM
Collaborator

CusiniM commented Oct 3, 2024

The bug has been mitigated, so it isn't critical.

It is however very unlikely this will ever be addressed in the gcc-9.X release line, and sunsetting older compiler versions is something we occasionally want to do regardless so legacy bugs like this don't impact developer time and/or require unpleasant workarounds (notwithstanding that this workaround was essentially trivial once identified).

If / when we can move pangea 3 to a newer GCC it would be a good idea to sunset support for gcc pre 10.X. generally.

I have just had a chat with @Bubusch. Moving to a higher version of gcc on Pangea3 should be possible. The only constraint is that it has to be compatible with the current version of the cuda drivers installed on Pangea3. He will ask and let us know as soon as it's done. Moving to cuda12 is a separate issue and there is no guarantee that it will happen quickly.

I already have a PR (GEOS-DEV/thirdPartyLibs#282) where I am upgrading some of our CI images. For example, I have removed the ubuntu 20 with gcc 9 one in favor of a later ubuntu version with gcc14.
