Enable GPU execution of mpas_reconstruct_2d via OpenACC #1289

Open
wants to merge 4 commits into
base: develop

gdicker1
Collaborator

This PR makes minor modifications to mpas_reconstruct_2d and adds OpenACC directives so that it can execute on GPU(s).

Timing for the OpenACC data transfers in this routine is captured in the log file by a new timer: mpas_reconstruct_2d [ACC_data_xfer].

NOTE two things about this PR:

Add nVertLevels and dereference integer pointers to loop bounds so they
transfer to the GPU correctly. Also make loops in the vertical dimension
explicit for OpenACC parallel loop directives.

Since this routine is called before mpas_atm_dynamics_init during
atm_core_init, these fields must also be transferred within the
mpas_reconstruct_2d routine. After mpas_atm_dynamics_init, these fields
are not transferred during subsequent uses of mpas_reconstruct_2d due to
OpenACC present_or_copyin behavior.

This change allows data needed for the mpas_reconstruct_2d routine to be
moved between the host and the device (GPU) at the beginning and end of the routine.
The time for these transfers is captured in a new timer,
'mpas_reconstruct_2d [ACC_data_xfer]'.
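
To illustrate the present_or_copyin point above, here is a toy Python model of reference-counted device presence. It is only a sketch of the general OpenACC rule (a copyin clause transfers data only when the data is not already present on the device); the class, the function names, and the field names `field_a` and `field_b` are hypothetical and are not taken from MPAS or from any OpenACC runtime API.

```python
class DevicePresentTable:
    """Toy model of OpenACC present-or-copyin bookkeeping (not a real API)."""

    def __init__(self):
        self.refcount = {}   # field name -> number of active data regions
        self.transfers = []  # log of host-to-device copies actually performed

    def copyin(self, name):
        """Enter a data region: copy to the device only if not already present."""
        if self.refcount.get(name, 0) == 0:
            self.transfers.append(name)
        self.refcount[name] = self.refcount.get(name, 0) + 1

    def release(self, name):
        """Leave a data region: the device copy goes away when the count hits zero."""
        self.refcount[name] -= 1


def reconstruct_2d(table, fields):
    """Stand-in for the routine: returns how many fields were really transferred."""
    before = len(table.transfers)
    for name in fields:
        table.copyin(name)
    # ... the GPU loops would run here ...
    for name in fields:
        table.release(name)
    return len(table.transfers) - before


table = DevicePresentTable()
fields = ["field_a", "field_b"]  # hypothetical placeholder names

# Call made before the (hypothetical) dynamics-init step: nothing is present yet.
early = reconstruct_2d(table, fields)

# Dynamics-init step: fields are copied in and stay resident on the device.
for name in fields:
    table.copyin(name)

# Later call: the fields are already present, so no transfer happens.
later = reconstruct_2d(table, fields)

print(early, later)  # 2 0
```

In this model the early call pays for two transfers and the later call pays for none, which mirrors why the transfers inside the routine only cost time before the fields become resident.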
@gdicker1
Collaborator Author

To characterize the answer differences, I used the compare_netcdf.py script and compared the log.atmosphere.0000.out files from the 6-timestep regional test case.

Running the loop near mpas_vector_reconstruct.F L273-283 as unmodified code on the CPU produced no answer differences relative to a GPU run of the commit from which I started this PR branch.

When running this code on the GPU, I observed differences in the u, w, and scalars [1-8] values reported in the log file. Unlike some other answer differences, the locations of the mins and maxes did not change for u and w. Selected output from compare_netcdf.py baseline_acc/restart.2019-09-01_00.06.00.nc port_att/restart.2019-09-01_00.06.00.nc [1]:

            Variable         Min         Max
============================================
# ... omitting lines ...
                   u   -0.041306    0.041719
                   w   -0.040645    0.030554
              rho_zz   -0.000086    0.000116
             theta_m   -0.054626    0.081970
          pressure_p  -16.194092   11.606445
# ... omitting lines ...
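
For readers unfamiliar with this kind of table, here is a minimal Python sketch of how one such row could be produced. It assumes the Min and Max columns are the smallest and largest element-wise differences (test minus baseline) for a variable; that is an assumption on my part, not a description of how compare_netcdf.py is actually implemented. The function name `diff_min_max` and the sample values are made up for illustration.

```python
def diff_min_max(baseline, test):
    """Return (min, max) of the element-wise difference test - baseline.

    Assumption: this is what the Min/Max columns represent.
    """
    diffs = [t - b for b, t in zip(baseline, test)]
    return min(diffs), max(diffs)


# Made-up sample values, not data from the runs discussed here.
baseline_u = [1.00, -2.50, 3.25]
test_u = [1.25, -2.50, 3.00]

lo, hi = diff_min_max(baseline_u, test_u)
print(f"{'u':>20} {lo:10.6f} {hi:10.6f}")
```

A row of all zeros under this convention would mean the two files agree exactly for that variable.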

Footnotes

  1. Full output saved on Derecho within "/glade/work/gdicker/mpas-work/2025Feb06_PortMPASReconstruct/port_att1739582865/diff.baselineVtest.restart.txt"

@gdicker1
Collaborator Author

More on the answer differences: they do seem to be due to differences between the default CPU and GPU math implementations. I get no answer differences if I add the flags described in #1287 to my baseline_acc and PR branch builds (namely -gpu=math_uniform; -Mnofma is already in GPU builds).
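
As a general illustration of how two valid floating-point evaluation strategies can disagree, the Python sketch below compares a multiply-add with two roundings against one with a single rounding (the behavior of a fused multiply-add). Since -Mnofma is already set in the GPU builds, this is not a claim about the cause of the differences in this PR; it only shows the class of effect that such flags control. The helper names are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def mul_add_separate(a, b, c):
    """a*b is rounded to double, then the sum is rounded again (two roundings)."""
    return (a * b) + c


def mul_add_fused(a, b, c):
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that a*b = 1 - 2**-60 exactly, which rounds to 1.0 in double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

separate = mul_add_separate(a, b, c)
fused = mul_add_fused(a, b, c)

print(separate)  # 0.0
print(fused == -2.0**-60)  # True
```

Both answers are correct under their own rounding rules, which is why builds must agree on these choices before bit-for-bit comparisons are meaningful.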

@mgduda added the OpenACC label (Work related to OpenACC acceleration of code) on Feb 19, 2025