Multiplication of StridedMaybeAdjOrTransMat broken for certain matrix sizes #442

Open
leios opened this issue Jun 5, 2024 · 3 comments

@leios
Contributor

leios commented Jun 5, 2024

If the array has about 10 rows, then a' * a works fine:

julia> using oneAPI

julia> rand_array = rand(Float32, 10, 2);

julia> one_array = oneArray(rand_array);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 3.73734  2.68277
 2.68277  3.32426

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 3.73734  2.68277
 2.68277  3.32426

With 100 rows, it fails: the GPU result is all zeros.

julia> rand_array = rand(Float32, 100, 2);

julia> rand_array' * rand_array
2×2 Matrix{Float32}:
 32.107   24.3659
 24.3659  32.234

julia> one_array = oneArray(rand_array);

julia> one_array' * one_array
2×2 oneArray{Float32, 2, oneAPI.oneL0.DeviceBuffer}:
 0.0  0.0
 0.0  0.0
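
To narrow down the size at which the result starts to diverge, the comparison above can be wrapped in a small helper and run over several row counts. This is only a sketch: `gram_matches` and its `to_device` keyword are hypothetical names introduced here, and the default `identity` keeps everything on the CPU so the helper runs without a GPU.

```julia
using LinearAlgebra

# Compare the CPU product a' * a with the same product computed on
# device arrays. `to_device` is a hypothetical parameter for this sketch:
# pass `oneArray` (after `using oneAPI`) to exercise the GPU path.
function gram_matches(n; to_device = identity, rtol = 1f-4)
    a = rand(Float32, n, 2)
    d = to_device(a)
    cpu = a' * a
    dev = Array(d' * d)   # copy the result back to host memory
    return isapprox(cpu, dev; rtol = rtol)
end

# On a machine with a supported Intel GPU:
# using oneAPI
# for n in (10, 50, 100, 1000)
#     println("n = ", n, ": ", gram_matches(n; to_device = oneArray))
# end
```

A `false` for some `n` would mark a size at which the device result no longer agrees with the CPU result.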

It seems to be calling this function in LinearAlgebra/matmul.jl:

function (*)(A::StridedMaybeAdjOrTransMat{<:BlasReal}, B::StridedMaybeAdjOrTransMat{<:BlasReal})
    TS = promote_type(eltype(A), eltype(B))
    mul!(similar(B, TS, (size(A, 1), size(B, 2))),
         wrapperop(A)(convert(AbstractArray{TS}, _unwrap(A))),
         wrapperop(B)(convert(AbstractArray{TS}, _unwrap(B))))
end
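
One way to check which method a call dispatches to is `@which` from InteractiveUtils (loaded automatically in the REPL). The sketch below uses a plain CPU matrix; substituting the `oneArray` should show whether the GPU call lands on the same method. The printed signature and line number depend on the Julia version.

```julia
using InteractiveUtils, LinearAlgebra

a = rand(Float32, 100, 2)

# @which returns the Method object selected for this call
m = @which a' * a
println(m)
```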

Julia also segfaults on exit, while running finalizers:

[982661] signal (11.128): Segmentation fault
in expression starting at none:0
_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE16processResidencyERKSt6vectorIPNS_18GraphicsAllocationESaIS5_EEj at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE13flushInternalERKNS_11BatchBufferERKSt6vectorIPNS_18GraphicsAllocationESaIS8_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_13Gen12LpFamilyEE5flushERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS7_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO21CommandStreamReceiver17submitBatchBufferERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS5_EE at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L015CommandQueueImp17submitBatchBufferEmRSt6vectorIPN3NEO18GraphicsAllocationESaIS4_EEPvb at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE26executeCommandListsRegularERNS2_27CommandListExecutionContextEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tP18_ze_event_handle_tjPSB_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY18EE19executeCommandListsEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tbP18_ze_event_handle_tjPS9_ at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN2L033zeCommandQueueExecuteCommandListsEP26_ze_command_queue_handle_tjPP25_ze_command_list_handle_tP18_ze_fence_handle_t at /home/u222842/.julia/artifacts/f6b6f7783395fabf32b0337c23e95719f94b00fd/lib/libze_intel_gpu.so.1 (unknown line)
_ZN18ur_queue_handle_t_18executeCommandListENSt3__119__hash_map_iteratorINS0_15__hash_iteratorIPNS0_11__hash_nodeINS0_17__hash_value_typeIP25_ze_command_list_handle_t22ur_command_list_info_tEEPvEEEEEEbb at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZN18ur_queue_handle_t_26executeAllOpenCommandListsEv at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
urQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
piQueueRelease at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libpi_level_zero.so (unknown line)
_ZNK4sycl3_V16detail6plugin12call_nocheckILNS1_9PiApiKindE26EJP9_pi_queueEEE10_pi_resultDpT0_ at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_ZN4sycl3_V16detail10queue_implD2Ev at /glob/development-tools/versions/oneapi/2024.1/oneapi/compiler/2024.1/lib/libsycl.so.7 (unknown line)
_M_release at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:161 [inlined]
~__shared_count at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:712 [inlined]
~__shared_ptr at /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/8.1.0/bits/shared_ptr_base.h:1151 [inlined]
~queue at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/sycl/queue.hpp:119 [inlined]
~syclQueue_st at /workspace/srcdir/oneAPI.jl/deps/src/sycl.hpp:19 [inlined]
syclQueueDestroy at /workspace/srcdir/oneAPI.jl/deps/src/sycl.cpp:60
syclQueueDestroy at /home/u222842/projects/oneAPI.jl/lib/support/liboneapi_support.jl:58 [inlined]
#7 at /home/u222842/projects/oneAPI.jl/lib/sycl/SYCL.jl:74
unknown function (ip: 0x7faf4714b085)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
run_finalizer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:454
ijl_atexit_hook at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/init.c:299
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:732
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7faf5ed83d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 14926321 (Pool: 14908017; Big: 18304); GC: 23
Segmentation fault (core dumped)

I am using the Intel DevCloud for this, and the GPU node reports:

pbsnodes | grep -B4 gpu
---
s019-n010
     state = job-exclusive
     power_state = Running
     np = 2
     properties = core,tgl,i9-11900kb,ram32gb,netgbe,gpu,gen11
---

This seems related to the issue I have been having with JuliaGPU/GPUArrays.jl#525.

@maleadt
Member

maleadt commented Jun 5, 2024

Which GPU? Are you using oneAPI.jl#master?

cc @pengtu

@leios
Contributor Author

leios commented Jun 5, 2024

Yes, I was on master. I was using an i9-11900kb, with:

julia> device()
ZeDevice(GPU, vendor 0x8086, device 0x9a60): Intel(R) UHD Graphics

As an interesting note, this error did not occur on another node with:

julia> device()
ZeDevice(GPU, vendor 0x8086, device 0x3e96): Intel(R) UHD Graphics P630

I'll investigate it some more tomorrow.

@maleadt
Member

maleadt commented Jun 5, 2024

Unless it's a simple double-free from the Julia side, I think this may be hard to investigate. There have been several related issues caused by the MKL/Level0 interop, see e.g. #417 (comment).
