Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong result for miniMDock #726

Open
mathialakan opened this issue Dec 11, 2023 · 5 comments
Open

Wrong result for miniMDock #726

mathialakan opened this issue Dec 11, 2023 · 5 comments
Milestone

Comments

@mathialakan
Copy link

mathialakan commented Dec 11, 2023

Hi

The HIP code of miniMDock (https://github.com/ORNL-PE/miniMDock/tree/sycl_dev ) is working perfectly in AMD systems, but on Intel-PVC systems, it builds successfully but giving wrong results. The results obtained for the 7cpa test case ( -lfile ./input/7cpa/7cpa_ligand.pdbqt -nrun 10) on Arognne's sunspot using iprof is attached here,
chipstar_iprof_sunspot.pdf

The correct result should be look like this, 7cpa_results.pdf

The code is using warp-level shuffle intrinsics for reduction (device/hip/kernels.cpp) and the code is able to setup warp/wavefront/subgroup size based on the running-system.
Would you please help me to resolve the issues related to this build.

Thanks
Mathi

@pjaaskel
Copy link
Collaborator

My guess is the masked __shfl() which, I think, is currently not supported by chipStar.

@linehill
Copy link
Collaborator

chipStar’s warp size is 32 by default. It does not seem that miniMDock’s HIP code accounts this size. For example, the WARPMINIMUMEXCHANGE call here will definitely compute out-of-bounds warp indices for __shfl() calls here which is probably undefined behavior.

If the Intel-PVC supports 64 wide subgroups in Level Zero or OpenCL, you could try changing chipStar’s warp size to it with -DDEFAULT_WARP_SIZE=64 configure option.

FYI, I had trouble compiling miniMDock for chipStar because of __any() calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our __any() implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with 1 helped me to get past the issue and the changes do not seem to affect the result on ROCm.

@mathialakan
Copy link
Author

Thanks @pjaaskel - so any alternative way to replace __shfl()?

Thanks @linehill for pointing-out the issue with WARPMINIMUMEXCHANGE, I changed the WARPMINIMUM2 micro to dynamically work for the warp size based on the building system.

@pjaaskel
Copy link
Collaborator

FYI, I had trouble compiling miniMDock for chipStar because of __any() calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our __any() implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with 1 helped me to get past the issue and the changes do not seem to affect the result on ROCm.

I think it calls ballot which is not supported in v1.2, but is supported by Intel's driver regardless. I've patched that in the chipStar LLVM-SPIRV branch so it generates it regardless and hopes for the best: CHIP-SPV/SPIRV-LLVM-Translator@0d66986

@mathialakan for the masked shfl, currently there is no easy workaround, but just try to reorganize the client code to use uniform shuffles.

@pvelesko
Copy link
Collaborator

@pjaaskel should we close this and/or open a new issue with for implementing masked shfl support?

@pvelesko pvelesko added this to the Release 1.3 milestone May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants