Memory leak when running GPU broadcast in a loop #327

Open
tiemvanderdeure opened this issue Jun 1, 2023 · 6 comments
Labels: bug (Something isn't working), libraries (Things about libraries and how we use them), performance (Gotta go fast)

Comments

@tiemvanderdeure

I need to run GPU operations inside a loop, where the output of one iteration is used in the next one.

However, even very simple GPU broadcasts result in a memory leak and eventually I get OutOfMemoryError().

I am using oneAPI v1.2.2 on WSL2 (Ubuntu) on Windows 10.

A very simple example that reproduces this is:

using oneAPI

gpu_array = oneAPI.zeros(Float32, 10_000_000)

for j in 1:5_000
    gpu_array .+= 1
end

Is there something I am missing here?

I can see the GPU memory fill up in the Task Manager:
[image]

@maleadt
Member

maleadt commented Jun 5, 2023

Works for me, I never get an OOM and it doesn't exceed 3GB of memory use. Do note that Julia uses a GC, so it's expected to see memory usage rise quite a bit until it falls again.

Can you post a backtrace of an OOM?

@tiemvanderdeure
Author

tiemvanderdeure commented Jun 5, 2023

When I run this, memory usage barely falls after the loop finishes; it stays high at around 7 GB.

The loop actually completes without an error, but the OOM error arises when doing any other GPU operation afterwards.

So the above example code should have been (sorry!):

using oneAPI

gpu_array = oneAPI.zeros(Float32, 10_000_000)

for j in 1:5_000
    gpu_array .+= 1
end

gpu_array

Where the final line triggers the following error:

ERROR: OutOfMemoryError()
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/libze.jl:16 [inlined]
  [2] zeCommandListAppendMemoryCopy(hCommandList::oneAPI.oneL0.ZeCommandList, dstptr::Ptr{Float32}, srcptr::oneAPI.oneL0.ZePtr{Float32}, size::Int64, hSignalEvent::Ptr{Nothing}, numWaitEvents::Int64, phWaitEvents::Vector{Any})
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/Ogykr/lib/utils/call.jl:24
  [3] append_copy! (repeats 2 times)
    @ ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/copy.jl:7 [inlined]
  [4] #76
    @ ~/.julia/packages/oneAPI/Ogykr/src/memory.jl:8 [inlined]
  [5] oneAPI.oneL0.ZeCommandList(::oneAPI.var"#76#77"{Ptr{Float32}, oneAPI.oneL0.ZePtr{Float32}, Int64}, ::oneAPI.oneL0.ZeContext, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/cmdlist.jl:45
  [6] ZeCommandList
    @ ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/cmdlist.jl:43 [inlined]
  [7] #execute!#666
    @ ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/cmdlist.jl:62 [inlined]
  [8] execute! (repeats 2 times)
    @ ~/.julia/packages/oneAPI/Ogykr/lib/level-zero/cmdlist.jl:61 [inlined]
  [9] unsafe_copyto!(ctx::oneAPI.oneL0.ZeContext, dev::oneAPI.oneL0.ZeDevice, dst::Ptr{Float32}, src::oneAPI.oneL0.ZePtr{Float32}, N::Int64)
    @ oneAPI ~/.julia/packages/oneAPI/Ogykr/src/memory.jl:7
 [10] unsafe_copyto!(ctx::oneAPI.oneL0.ZeContext, dev::oneAPI.oneL0.ZeDevice, dest::Vector{Float32}, doffs::Int64, src::oneVector{Float32, oneAPI.oneL0.DeviceBuffer}, soffs::Int64, n::Int64)
    @ oneAPI ~/.julia/packages/oneAPI/Ogykr/src/array.jl:315
 [11] copyto!
    @ ~/.julia/packages/oneAPI/Ogykr/src/array.jl:281 [inlined]
 [12] copyto!
    @ ~/.julia/packages/oneAPI/Ogykr/src/array.jl:285 [inlined]
 [13] copyto_axcheck!(dest::Vector{Float32}, src::oneVector{Float32, oneAPI.oneL0.DeviceBuffer})
    @ Base ./abstractarray.jl:1127
 [14] Array
    @ ./array.jl:626 [inlined]
 [15] Array
    @ ./boot.jl:483 [inlined]
 [16] convert
    @ ./array.jl:617 [inlined]
 [17] adapt_storage
    @ ~/.julia/packages/GPUArrays/TnEpb/src/host/abstractarray.jl:23 [inlined]
 [18] adapt_structure
    @ ~/.julia/packages/Adapt/UtItS/src/Adapt.jl:57 [inlined]
 [19] adapt
    @ ~/.julia/packages/Adapt/UtItS/src/Adapt.jl:40 [inlined]
 [20] print_array
    @ ~/.julia/packages/GPUArrays/TnEpb/src/host/abstractarray.jl:26 [inlined]
 [21] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, X::oneVector{Float32, oneAPI.oneL0.DeviceBuffer})
    @ Base ./arrayshow.jl:399
 [22] (::REPL.var"#43#44"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL /opt/julia-1.8.5/share/julia/stdlib/v1.8/REPL/src/REPL.jl:267
 [23] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /opt/julia-1.8.5/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [24] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL /opt/julia-1.8.5/share/julia/stdlib/v1.8/REPL/src/REPL.jl:260
 [25] display(d::REPL.REPLDisplay, x::Any)
    @ REPL /opt/julia-1.8.5/share/julia/stdlib/v1.8/REPL/src/REPL.jl:272
 [26] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [27] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [28] invokelatest
    @ ./essentials.jl:726 [inlined]
 [29] (::VSCodeServer.var"#66#70"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/eval.jl:199
 [30] withpath(f::VSCodeServer.var"#66#70"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams}, path::String)
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/repl.jl:249
 [31] (::VSCodeServer.var"#65#69"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/eval.jl:155
 [32] hideprompt(f::VSCodeServer.var"#65#69"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/repl.jl:38
 [33] (::VSCodeServer.var"#64#68"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/eval.jl:126
 [34] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:511
 [35] with_logger
    @ ./logging.jl:623 [inlined]
 [36] (::VSCodeServer.var"#63#67"{VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/eval.jl:225
 [37] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [38] invokelatest(::Any)
    @ Base ./essentials.jl:726
 [39] macro expansion
    @ ~/.vscode-server/extensions/julialang.language-julia-1.47.2/scripts/packages/VSCodeServer/src/eval.jl:34 [inlined]
 [40] (::VSCodeServer.var"#61#62")()
    @ VSCodeServer ./task.jl:484

@maleadt
Member

maleadt commented Jun 5, 2023

That still doesn't error here. What happens if you call GC.gc(true) a couple of times after the loop but before the subsequent operation that throws an OOM?

@tiemvanderdeure
Author

I tried running GC.gc(true) once my memory is all filled up and that doesn't seem to do anything. It doesn't change memory use and neither does it solve the error, even if I call it multiple times and give it a few seconds every time.

Running GC.gc(true) between iterations seems to keep memory use way down, but it's also extremely slow. A middle ground, where the garbage collector is only called every so many iterations, works, though:

for i in 1:5
    for j in 1:1_000
        gpu_array .+= 1
    end

    GC.gc(true)
end

This keeps memory use at reasonable levels and prevents any OOM errors later. Memory use also no longer depends on the number of outer iterations.
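The periodic-collection workaround above could be wrapped in a small helper. This is only a sketch: `run_with_periodic_gc` and its `gc_every` keyword are hypothetical names, not part of oneAPI.jl, and a plain CPU array stands in for `gpu_array` so that it runs without a GPU.

```julia
# Hypothetical helper (not part of oneAPI.jl): call `step!` for `n` iterations
# and force a full garbage collection every `gc_every` iterations.
function run_with_periodic_gc(step!, n::Integer; gc_every::Integer = 1_000)
    collections = 0
    for j in 1:n
        step!(j)
        if j % gc_every == 0
            GC.gc(true)          # full collection, as in the workaround above
            collections += 1
        end
    end
    return collections           # number of forced collections
end

# Assumption: a CPU array stands in for `gpu_array` so this runs anywhere.
arr = zeros(Float32, 1_000)
ncollections = run_with_periodic_gc(5_000; gc_every = 1_000) do j
    arr .+= 1                    # in-place broadcast, same as the GPU loop body
end
```

With a GPU array the loop body would stay the same; a larger `gc_every` trades more peak memory for fewer slow collections.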

After some experimenting I found out calling sleep(10) instead of GC.gc(true) sometimes, but not always, has the same effect. I don't know if that can give any clues as to what is causing this issue.

@maleadt
Member

maleadt commented Jun 13, 2023

That's all very weird. I can't reproduce this no matter how long I try, or how many kernels I compile and launch. Maybe this is related to Windows? I don't currently have WSL set up, so I won't be able to try this straight away.

If you have the time, you could consider running the code for a while and writing a heap snapshot, using the new 1.9 functionality: JuliaLang/julia#46862. You can then open the snapshot in Chrome. If it's a Julia object memory leak, we should be able to spot it there. If we're leaking GPU memory however, we'll have to annotate our alloc/free functions (which should be simple enough to add some accounting to) here in oneAPI.jl. But the fact that I can't reproduce makes me think that it's the driver that somehow keeps memory alive, which is going to be much harder to pinpoint.
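For reference, writing a heap snapshot on Julia 1.9 or newer might look roughly like the sketch below. It uses `Profile.take_heap_snapshot` from the standard library; the file name and the temporary directory are arbitrary choices made for illustration, and the workload itself is left out.

```julia
using Profile  # standard library; heap snapshots need Julia 1.9 or newer

# ... run the workload to inspect (e.g. the loop from this issue) first ...

# Write a snapshot of the Julia heap to a file. The location and file name
# are arbitrary; the `.heapsnapshot` extension is what Chrome expects.
path = joinpath(mktempdir(), "after_loop.heapsnapshot")
Profile.take_heap_snapshot(path)
println("Snapshot written to ", path)
```

The resulting file can be loaded in the Memory tab of Chrome's developer tools. Note that this only covers objects on the Julia heap, so memory held by the driver would not show up in it.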

@tiemvanderdeure
Author

All right, I just ran both again (killing the REPL in between) and took a heap snapshot of each. I looked at them quickly and don't see an obvious difference, but I don't really know what I am supposed to be looking for.

If this helps at all: I am on Windows 10, using the standard installation of WSL2 (Ubuntu). I use VS Code, but running it directly from Ubuntu doesn't change anything.

The files are too big to upload here, so I uploaded them elsewhere instead: https://file.io/xgRlcuitBSPP

@maleadt added the bug, libraries, and performance labels on Feb 27, 2024