Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nanobind leak warnings #758

Open
matthiasdiener opened this issue May 17, 2024 · 9 comments · Fixed by #759
Open

Nanobind leak warnings #758

matthiasdiener opened this issue May 17, 2024 · 9 comments · Fixed by #759
Labels

Comments

@matthiasdiener
Copy link
Contributor

matthiasdiener commented May 17, 2024

Describe the bug
At the end of execution, what appears to be SVM-related pyopencl code causes nanobind leak warnings to appear:

nanobind: leaked 3 instances!
 - leaked instance 0x1010fdf88 of type "SVMAllocator"
 - leaked instance 0x1076bd3c8 of type "SVMPool"
 - leaked instance 0x101459258 of type "Context"
nanobind: leaked 3 types!
 - leaked type "pyopencl._cl.SVMAllocator"
 - leaked type "pyopencl._cl.Context"
 - leaked type "pyopencl._cl.SVMPool"
nanobind: leaked 20 functions!
 - leaked function ""
 - leaked function ""
 - leaked function ""
 - leaked function "__eq__"
 - leaked function ""
 - leaked function "alloc_size"
 - leaked function "free_held"
 - leaked function "__init__"
 - leaked function ""
 - leaked function "__call__"
 - leaked function "__init__"
 - ... skipped remainder
nanobind: this is likely caused by a reference counting issue in the binding code.

To Reproduce

  1. $ cd pyopencl
  2. $ python examples/demo_array_svm.py (causes leak warnings)
  3. $ python examples/demo_array.py (does not cause leak warnings)

Edit:
Easiest way to reproduce:

$ python -c 'import pyopencl as cl; ctx = cl.create_some_context(); f=cl.SVMAllocation(ctx, 10, 0, 0)'
nanobind: leaked 1 instances!
 - leaked instance 0x106203858 of type "Context"
nanobind: leaked 1 types!
 - leaked type "pyopencl._cl.Context"
nanobind: leaked 7 functions!
 - leaked function "__eq__"
 - leaked function "from_int_ptr"
 - leaked function "set_default_device_command_queue"
 - leaked function "__hash__"
 - leaked function ""
 - leaked function "get_info"
 - leaked function "__init__"
nanobind: this is likely caused by a reference counting issue in the binding code.

(when not assigning to f, the warnings do not appear)

Expected behavior
Don't show the warnings.

Environment (please complete the following information):

  • OS: MacOS
  • Python version: 3.11.9
  • PyOpenCL version: 2024.2.2

Additional context
Note that #755 does not address this issue.

@inducer
Copy link
Owner

inducer commented May 17, 2024

Thanks for the small reproducer, that helps! I think part of the problem here is that SVMAllocation holds a reference to the CL context in a shared_ptr:

std::shared_ptr<context> m_context;

It needs to hold a reference, because it needs the context to free the SVM:

clSVMFree(m_context->data(), m_allocation);

Nanobind's support for shared_ptr increases the context's reference count, and the behavior is overall quite complex (see part 1, part 2). While I feel that the context should still get GC'd (especially after the SVMAllocation is gone, which it must be, because it isn't getting leaked), that somehow doesn't seem to be happening. I'm not sure I have a theory on the precise mechanism.

As an alternative to the complexity of shared ownership between C++ and Python, nanobind also has intrusive reference counting. Maybe that's worth a try, for all the cases where currently shared_ptrs are being used.

@inducer
Copy link
Owner

inducer commented May 17, 2024

Actually, we may be dealing with what's described as the "uncollectable inter-language reference cycle" in nanobind's docs for intrusive RC.

@inducer
Copy link
Owner

inducer commented May 17, 2024

I feel like we should move to intrusive refcounting for context, queue, memory pools, and allocators. I'm out of time for today, but if you have time to get this started, I'd be open to looking at it.

@inducer
Copy link
Owner

inducer commented May 19, 2024

#759 is a start that already addresses your small reproducer.

@matthiasdiener
Copy link
Contributor Author

There still appear to be leak warnings (see e.g. https://github.com/inducer/pyopencl/actions/runs/9154012159/job/25163854069), but I can't reproduce them locally at the moment.

@inducer inducer changed the title SVM-related (?) pyopencl code causes nanobind leak warnings Nanobind leak warnings May 20, 2024
@inducer inducer reopened this May 20, 2024
@inducer
Copy link
Owner

inducer commented May 20, 2024

Thanks for spotting that! For posterity:

 nanobind: leaked 7 instances!
 - leaked instance 0x55eb67f303f0 of type "pyopencl._cl.Device"
 - leaked instance 0x55eb67f697e0 of type "pyopencl._cl.Device"
 - leaked instance 0x55eb680ec850 of type "pyopencl._cl.Device"
 - leaked instance 0x55eb67edd4b0 of type "pyopencl._cl.Device"
 - leaked instance 0x55eb67929de0 of type "pyopencl._cl.Platform"
 - leaked instance 0x55eb67c9e200 of type "pyopencl._cl.Device"
 - leaked instance 0x55eb6c9cbe50 of type "pyopencl._cl.Platform"
nanobind: leaked 18 types!
 - leaked type "pyopencl._cl.Device"
 - leaked type "pyopencl._cl.SVMAllocator"
 - leaked type "pyopencl._cl.SVMPointer"
 - leaked type "pyopencl._cl.CommandQueue"
 - leaked type "pyopencl._cl.PooledBuffer"
 - leaked type "pyopencl._cl.ImmediateAllocator"
 - leaked type "pyopencl._cl.AllocatorBase"
 - leaked type "pyopencl._cl.Event"
 - leaked type "pyopencl._cl.Buffer"
 - leaked type "pyopencl._cl.Kernel"
 - leaked type "pyopencl._cl.SVMPool"
 - ... skipped remainder
nanobind: leaked 109 functions!
 - leaked function "__init__"
 - leaked function ""
 - leaked function "__call__"
 - leaked function "__call__"
 - leaked function "unbind_from_queue"
 - leaked function "__hash__"
 - leaked function "_set_arg_svm"
 - leaked function "from_int_ptr"
 - leaked function ""
 - leaked function "enqueue_release"
 - leaked function "_set_arg_buf_pack_multi"
 - ... skipped remainder
nanobind: this is likely caused by a reference counting issue in the binding code.

We should track those down sometime.

@vincefn
Copy link

vincefn commented May 31, 2024

Could we disable these messages, e.g. by giving access to [nb::set_leak_warnings()] ?(https://nanobind.readthedocs.io/en/latest/faq.html#why-am-i-getting-errors-about-leaked-functions-and-types)

@inducer
Copy link
Owner

inducer commented May 31, 2024

@vincefn The incidence of these should be substantially reduced post b659cad (unreleased). It might be good to disable these for release builds by default, because they don't have much utility from the user's end.

@matthiasdiener
Copy link
Contributor Author

One way to reproduce the warnings after b659cad is:

$ python examples/demo-struct-reduce.py
nanobind: leaked 1 instances!
 - leaked instance 0xaaaadcaedf00 of type "pyopencl._cl.Device"
nanobind: leaked 17 types!
 - leaked type "pyopencl._cl.Device"
 - leaked type "pyopencl._cl.SVMAllocator"
 - leaked type "pyopencl._cl.SVMPointer"
 - leaked type "pyopencl._cl.CommandQueue"
 - leaked type "pyopencl._cl.PooledBuffer"
 - leaked type "pyopencl._cl.ImmediateAllocator"
 - leaked type "pyopencl._cl.AllocatorBase"
 - leaked type "pyopencl._cl.Event"
 - leaked type "pyopencl._cl.Buffer"
 - leaked type "pyopencl._cl.Kernel"
 - leaked type "pyopencl._cl.SVMPool"
 - ... skipped remainder
nanobind: leaked 103 functions!
 - leaked function "device_and_host_timer"
 - leaked function "get_cl_header_version"
 - leaked function "from_int_ptr"
 - leaked function "_set_arg_multi"
 - leaked function ""
 - leaked function ""
 - leaked function "set_arg"
 - leaked function "__init__"
 - leaked function "from_int_ptr"
 - leaked function "get_host_array"
 - leaked function "get_work_group_info"
 - ... skipped remainder
nanobind: this is likely caused by a reference counting issue in the binding code.

For some reason, these do not show up on my Mac, just on Linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants