-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a generic OpenCL ICD loader. #417
Conversation
The consistent new failure in oneMKL tests is very suspicious. One potential reason for the crash on free is that
i.e. both the version from OpenCL_jll.jl is loaded (which Julia However, building a version of oneAPI_Support_jll without
|
SONAMEs are identical:
I really don't get why MKL (or rather pi_opencl) would want to load a second copy of libOpenCL, resulting in crashes when it does, or in failures to load OpenCL if it doesn't. SYCL_PI_TRACE also isn't helpful, e.g., in the case where oneAPI_Support_jll is shipped without libopencl.so and thus only the generic one can be loaded:
How does that make sense; libOpenCL is loaded, SYCL_PI_TRACE shows that it's detected, yet MKL complains? |
Okay, so apparently MKL's loading of libOpenCL.so ignores the fact that we've already loaded a copy, but does work when first setting LD_LIBRARY_PATH to OpenCL_jll's artifact dir. That's not workable, however, it may not be terrible, because the USM-related crash at exit also happens when only having a single libOpenCL.so loaded (by setting LD_LIBRARY_PATH manually). So, for the purpose of getting PVC working, we could:
|
From
Yet I can't replicate this by loading
It won't be possible to make this an actual dummy library, as symbols are looked up in the loaded
|
The reason I can't replicate this is probably because the lazy load actually only happens during
|
I guess my mental model of how libdl works was wrong:
i.e. loading the same library, once via a full path, once via a path that can be looked up in LD_LIBRARY_PATH, loads the library multiple times. This does actually work fine if MKL would Potential solutions:
|
Last time it was the oneMKL's gpu_buffer was dangling after Julia has released its oneMKL handle. https://github.com/JuliaGPU/oneAPI.jl/blob/master/deps/src/onemkl.cpp#L4334 This time, it looks like that we need to add: |
e6d25d9
to
0cd3ec0
Compare
Filed a bug upstream: uxlfoundation/oneMath#472 |
1241b1d
to
aa9fda3
Compare
MWE of the oneMKL failure: using oneAPI, LinearAlgebra, Test
function main(p)
println("Running test with $p batches...")
m = 15
n = 10
elty = Float32
A = [rand(elty,n,n) for i = 1:p]
A = [A[i]' * A[i] + I for i = 1:p]
B = [rand(elty,n,p) for i = 1:p]
d_A = oneMatrix{elty}[]
d_B = oneMatrix{elty}[]
for i in 1:p
push!(d_A, oneMatrix(A[i]))
push!(d_B, oneMatrix(B[i]))
end
oneMKL.potrf_batched!(d_A)
oneMKL.potrs_batched!(d_A, d_B)
for i = 1:p
LAPACK.potrf!('L', A[i])
LAPACK.potrs!('L', A[i], B[i])
if !(B[i] ≈ collect(d_B[i]))
println("Error in batch $i")
return false
end
end
return true
end
main(5)
main(10) This often fails during both the
It's disturbing that switching ICD loaders causes oneMKL to fail like that, both in terms of correctness and these segfaults. FWIW, it's not related to
@pengtu Back to you, I guess... |
@maleadt: Looks like that we cannot call mkl_sycl_destructor in syclQueueDestroy because it also invoke ~queue, so we are calling ~queue twice. I will back out the mkl_sycl_destructor call. Please rerun both the MKL matmul and the new p(5), p(10) tests and let me know the results. I will ask the MKL team depending on the results. |
aa9fda3
to
6d05f6c
Compare
Rebased, but crashes locally with the same error:
Did it not do that on your system? I also came across the following:
|
I verified in GDB that Using a debug/asserts build of NEO/IGC doesn't reveal anything. |
6d05f6c
to
a454010
Compare
a454010
to
9e42bce
Compare
The stack trace in GDB shows that SYCL runtime dies in the SYCL queue destructor trying to execute all the pending open commands.
|
The program passes with SYCL_PI_LEVEL_ZERO_BATCH_SIZE=1 julia --project test.jl, so batching seems to be the problem. Our current way of forcing an oneMKL task into the execution apparently does not work in this case:
|
@maleadt: Can oneAPI.jl set the environment variable SYCL_PI_LEVEL_ZERO_BATCH_SIZE=1 for this release? It only affects the SYCL kernel submissions. The only SYCL functions that oneAPI.jl calls are oneMKL functions that require eager dispatch so performance impact will be mostly positive. We can also remove the current workaround, which also reduces the oneMKL call overhead. |
Seems to work here too, great! Let's see what CI thinks. |
Note that on PVC there looks to be another MKL issue lurking:
I haven't investigated this yet. |
Fixes #406. The ICD loader we ship as part of oneAPI_Support_jll, required by oneMKL, apparently doesn't support PVC. Here, I switch to Khronos' generic ICD loader: