Skip to content

Enable GPU exection of summarize_timestep via OpenACC #1294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

gdicker1
Copy link
Collaborator

This PR modifies the code and adds OpenACC directives to allow the loops in the summarize_timestep routine can execute on GPUs.

Timing information for the temporary OpenACC data transfers in this routine is captured in the log file by a new timer: 'summarize_timestep [ACC_data_xfer]'.

This PR includes a re-write of the loops associated with config_print_detailed_minmax_vel to ensure consistent locations are reported for the minimum and maximum values for the wind speeds.

…mestep

These 'integer, pointer' variables are used as loop bounds and can cause
issues with OpenACC. Since OpenACC will directly copy the pointers and
may not preserve the association (may not also copy the value pointed
to), odd behavior can result.
This prevents having to add OpenACC routine information to
mpas_log_write and routine(s) it calls.
Only for loops executed when 'config_print_detailed_minmax_vel = .true.'

When running with OpenACC, it is difficult to get values to "come with"
during a reduction. This means it's difficult to get both the minimum
value of an array and the position (the array indices) it occurs at in
one pass.  Using 2 loops like this ensures the correct positions are
found.

Using a "linear index" to record the position of the minimum/maximum
values helps in cases where the min/max occurs at multiple points in an
array. The detailed prints will always use the first occurence of the
min/max in the array.
Adds timing information for the OpenACC data transfers in the
'summarize_timestep [ACC_data_xfer]' timer. This timer will be removed
as data regions in the MPAS-A dycore are fused.
@mgduda mgduda requested review from mgduda and abishekg7 March 21, 2025 02:09
@mgduda mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels Mar 21, 2025
@@ -7262,11 +7263,19 @@ subroutine summarize_timestep(domain)
real (kind=RKIND) :: lonMax, lonMax_global
real (kind=RKIND), dimension(5) :: localVals, globalVals

real (kind=RKIND) :: spd
real (kind=RKIND), dimension(:,:), allocatable :: spd
!$acc declare create(spd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good, but I had a question about why the declare create is required here. Wasn't fully clear about that from our previous meetings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this just because it seems to be what is preferred by the OpenACC standard for Fortran allocatable arrays.

acc declare create(spd) allows the compiled code to "just do the right thing" whenever it encounters the allocate(spd(...)) or deallocate(spd) statements later on - in this case, allocate this array on both device and host.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since spd won't be fused into any data region (and thus isn't a type of temporary data movement) it doesn't need to be tracked in any MPAS_ACC_TIMER_... region.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the explanation!

@abishekg7
Copy link
Collaborator

The code changes look reasonable and I've only compared the log file to the reference run, but that seems to match.

Perhaps the commit history could be squashed into fewer commits, but also fine as it is.

Adds timing information for the OpenACC data transfers in the temporary
'summarize_timestep [ACC_data_xfer]' timer.
Adds timing information for the OpenACC data transfers in the temporary
'summarize_timestep [ACC_data_xfer]' timer.
@gdicker1 gdicker1 force-pushed the atmosphere/acc_summarize_timestep branch from 256463e to a21db2c Compare March 25, 2025 22:42
@gdicker1
Copy link
Collaborator Author

gdicker1 commented Mar 25, 2025

Force-push from 256463e to a21db2c has no functional changes. I just corrected to having separate commits for adding OpenACC for config_print_global_minmax_vel section and for config_print_detailed_minmax_vel section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Atmosphere OpenACC Work related to OpenACC acceleration of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants