Array4: __cuda_array_interface__ v3 #30
Conversation
Update: a patch that unlocks that broken compiler range is in pybind/pybind11#4220:

```shell
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA \
  -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git \
  -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8
```
@RemiLehe the build logic from README.md is this. So concretely:

```shell
# Python packages if not already installed as described
python3 -m pip install -U pip setuptools wheel
python3 -m pip install -U cmake pytest
python3 -m pip install -U -r requirements.txt

# depending on what you try
python3 -m pip install cupy-cuda11x
python3 -m pip install numba
python3 -m pip install torch

# configure once (unless changing backend or versions heavily)
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA \
  -DpyAMReX_pybind11_repo=https://github.com/ax3l/pybind11.git \
  -DpyAMReX_pybind11_branch=fix-nvcc-11.4-11.8

# rinse & repeat: builds, packages & runs pip install
cmake --build build --target pip_install -j 8
```

and tests:

```shell
# Run all tests
python3 -m pytest tests/

# Run tests from a single file
python3 -m pytest tests/test_array4.py

# Run a single test (useful during debugging)
python3 -m pytest tests/test_array4.py::test_array4_cupy
python3 -m pytest tests/test_multifab.py::test_mfab_ops_cuda_cupy

# Run all tests, do not capture "print" output and be verbose
python3 -m pytest -s -vvvv tests/test_array4.py
```

and with Nsight (GUI):
Start implementing the `__cuda_array_interface__` for zero-copy data exchange on Nvidia CUDA GPUs.
Since `for` loops create no scope in Python, we need to trigger finalize logic, including stream syncs, before the destructors of `MultiFab` iterators are called.
incl. 3D kernel launch
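The commit message above refers to triggering finalize logic explicitly. A minimal sketch of that pattern, using a stand-in iterator class (`FakeMFIter` and its members are hypothetical, not the real pyAMReX binding): because the loop variable survives the loop in Python, the iterator runs its own finalize step when iteration is exhausted instead of relying on destructor timing.

```python
# Sketch with stand-in types: the iterator calls finalize() itself when
# it raises StopIteration, since Python will not destroy it at loop end.

class FakeMFIter:
    """Hypothetical stand-in for a MultiFab iterator."""
    def __init__(self, n):
        self.n = n
        self.i = -1
        self.finalized = False

    def __iter__(self):
        return self

    def __next__(self):
        self.i += 1
        if self.i >= self.n:
            self.finalize()  # e.g. stream sync in a real GPU binding
            raise StopIteration
        return self.i

    def finalize(self):
        self.finalized = True

it = FakeMFIter(3)
boxes = [b for b in it]
# finalize ran when iteration ended, not whenever GC collects `it`
assert it.finalized
```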
A bit tricky to implement this caster as a new constructor. Not currently needed, but comments were added where to do this.

Wuup, wuup. First part done.
Start implementing the `__cuda_array_interface__` for zero-copy data exchange on Nvidia CUDA GPUs.

Optional: accessing an external `__cuda_array_interface__` object in a non-owning manner as an AMReX `Array4`: https://github.com/cupy/cupy/blob/a5b24f91d4d77fa03e6a4dd2ac954ff9a04e21f4/cupy/core/core.pyx#L2478-L2514
`mfab` and `mfab_device` need to become functions, not fixtures. Otherwise they will be cached and outlive `amrex.finalize()`: AMReX Initialize/Finalize as Context Manager #81, MultiFab: Fix Fixture Lifetime #84

pyamrex/src/Base/MultiFab.cpp, lines 72 to 75 in 78bbbc7

depends on MFIter::Finalize amrex#2983 and MFIter: Make Finalize Public amrex#2985
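The caching problem can be sketched generically (all names here are illustrative stand-ins, not the real fixtures): a cached provider hands back the same object after teardown has invalidated it, whereas a plain function returns a fresh object each call.

```python
# Sketch: why a cached fixture outlives finalize(), illustrated with a
# stand-in class instead of a real MultiFab.

class FakeMultiFab:
    def __init__(self):
        self.valid = True

    def finalize(self):
        self.valid = False

_cache = {}

def cached_mfab():
    # behaves like a cached pytest fixture: one object, reused forever
    if "mf" not in _cache:
        _cache["mf"] = FakeMultiFab()
    return _cache["mf"]

def fresh_mfab():
    # plain function: a new object per test, nothing survives teardown
    return FakeMultiFab()

a = cached_mfab()
a.finalize()                      # teardown runs...
assert not cached_mfab().valid    # ...but the cached object is stale
assert fresh_mfab().valid         # a fresh object is always valid
```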
`if` and `for` do not create a scope in Python (they do in C++):
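A short demonstration of that scoping difference:

```python
# In Python, names bound inside `for` (or `if`) remain visible, and
# alive, after the block ends. In C++, a range-for loop variable is
# destroyed at the end of each iteration.
for i in range(3):
    last = i * i
print(i, last)  # -> 2 4  (both still exist after the loop)

# Consequence for MultiFab iteration: the iterator object stays alive
# after the loop body, so stream syncs / finalize logic must be
# triggered explicitly rather than left to destructor timing.
```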