Fix TensorStorage memory deallocation by emilmgeorge · Pull Request #145 · kornia/kornia-rs

emilmgeorge · 2024-09-23T16:41:46Z

Changes in this PR

Adds checks to make sure the TensorAllocator is invoked properly.
Fixes the memory leaks/issues observed in "Memory leak in remap" #134 (comment) and "Refactor hflip, vflip functions to allow preallocation" #117 (comment).

Details
A new struct, TensorCustomAllocationOwner, is added to represent a custom memory allocation made by a TensorAllocator. This struct implements the Drop trait to deallocate the associated memory. An instance of it is passed as the owner to arrow_buffer::Buffer::from_custom_allocation, so that when the Arrow buffer is dropped, this memory is also deallocated using the correct TensorAllocator.
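A minimal std-only sketch of the owner pattern described above. The `TensorAllocator` trait here is simplified, and `CountingAllocator` and `CustomAllocationOwner` are hypothetical stand-ins rather than the PR's actual types; handing the owner to `Buffer::from_custom_allocation` is not shown. The idea it illustrates: the owner remembers the pointer, the layout and the allocator, so its `Drop` frees the memory through the same allocator that produced it.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::cell::Cell;
use std::ptr::NonNull;
use std::rc::Rc;

/// Simplified allocator interface (hypothetical, not kornia's exact trait).
trait TensorAllocator {
    fn alloc(&self, layout: Layout) -> NonNull<u8>;
    fn dealloc(&self, ptr: NonNull<u8>, layout: Layout);
}

/// Hypothetical allocator that tracks how many bytes are outstanding.
#[derive(Clone, Default)]
struct CountingAllocator {
    bytes_allocated: Rc<Cell<usize>>,
}

impl TensorAllocator for CountingAllocator {
    fn alloc(&self, layout: Layout) -> NonNull<u8> {
        // Zero-sized layouts are not valid for `std::alloc::alloc`.
        assert!(layout.size() > 0, "zero-sized allocation");
        // SAFETY: layout has a non-zero size.
        let ptr = NonNull::new(unsafe { alloc(layout) }).expect("allocation failed");
        self.bytes_allocated.set(self.bytes_allocated.get() + layout.size());
        ptr
    }

    fn dealloc(&self, ptr: NonNull<u8>, layout: Layout) {
        // SAFETY: ptr was returned by `alloc` above with this same layout.
        unsafe { dealloc(ptr.as_ptr(), layout) };
        self.bytes_allocated.set(self.bytes_allocated.get() - layout.size());
    }
}

/// Owns one custom allocation and frees it with the allocator that made it.
struct CustomAllocationOwner<A: TensorAllocator> {
    ptr: NonNull<u8>,
    layout: Layout,
    allocator: A,
}

impl<A: TensorAllocator> CustomAllocationOwner<A> {
    fn new(layout: Layout, allocator: A) -> Self {
        let ptr = allocator.alloc(layout);
        Self { ptr, layout, allocator }
    }
}

impl<A: TensorAllocator> Drop for CustomAllocationOwner<A> {
    fn drop(&mut self) {
        // Deallocate through the same allocator instance, never a different one.
        self.allocator.dealloc(self.ptr, self.layout);
    }
}
```

Used like this, the byte count returns to zero as soon as the owner leaves its scope:

```rust
let allocator = CountingAllocator::default();
let layout = Layout::array::<f32>(16).unwrap();
{
    let _owner = CustomAllocationOwner::new(layout, allocator.clone());
    assert_eq!(allocator.bytes_allocated.get(), 64);
}
assert_eq!(allocator.bytes_allocated.get(), 0);
```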

Future changes
When the allocator_api feature comes to stable Rust, it may be possible to use a standard container such as Vec<T, A> instead of the struct above to handle custom deallocation.

edgarriba

LGTM. Added some suggestions to make testing more robust

edgarriba · 2024-09-24T09:01:51Z

+        // bytes_allocated value should not change.
+        {
+            let vec = Vec::<u8>::with_capacity(len);
+            let storage = TensorStorage::<u8, _>::from_vec(vec, allocator.clone());


I think we need to work out properly how the allocator is involved when we create a storage from a Vec.

Worth checking apache/arrow-rs#6362.

Yes that would be good. This is my related understanding:

std Vec and Buffer::from_vec will not support custom allocators until allocator_api comes to stable Rust. So, for our future custom allocators like CudaAllocator, from_vec will have to copy the data and use Buffer::from_custom_allocation.

For CpuAllocator, we can have zero-copy using Buffer::from_vec (as it is currently).
But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc, dealloc} (the global allocator) instead of std::alloc::System.{alloc, dealloc}. This is because Vec uses the global allocator. It is usually the same as std::alloc::System, but the user can change it with the #[global_allocator] attribute. If CpuAllocator uses std::alloc::{alloc, dealloc}, it always matches the allocator used by Vec, even when the user changes it.
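To illustrate why matching the global allocator matters: the std documentation for `Vec::from_raw_parts` allows a buffer obtained from `std::alloc::alloc` to be adopted by a `Vec` as long as size and alignment match, because the `Vec` later frees it through the same global allocator. The sketch assumes `u8` elements for simplicity; `global_alloc_bytes` is a hypothetical helper, not kornia's CpuAllocator.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

/// Hypothetical helper: allocates `len` bytes with the *global* allocator
/// (the one `std::alloc::alloc` and `Vec` both use by default), fills them
/// with `value`, and hands ownership to a `Vec<u8>`.
fn global_alloc_bytes(len: usize, value: u8) -> Vec<u8> {
    assert!(len > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    // SAFETY: layout has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    unsafe {
        // Initialize every byte before exposing the memory as a Vec.
        std::ptr::write_bytes(ptr, value, len);
        // SAFETY: ptr comes from the global allocator with exactly the size
        // and alignment a Vec<u8> of capacity `len` would use, so the Vec's
        // Drop frees it with the matching allocator.
        Vec::from_raw_parts(ptr, len, len)
    }
}
```

If `alloc` were replaced by `System.alloc` while a different `#[global_allocator]` is installed, dropping this `Vec` would pass the pointer to the wrong allocator, which is the mismatch the proposed change avoids.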

I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)

I haven't done any of the above in this PR though. Please let me know your thoughts and I can change accordingly.

But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc,dealloc} (Global allocator) instead of std::alloc::System.{alloc,dealloc}

please, do 👍

I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)

Maybe the behaviour for CUDA should be that when a CUDA storage is created from a Vec, the data is consumed: device memory is allocated with CUDA, the data is copied to the device, and the original CPU vector is deallocated? I haven't faced the full use case yet. We should probably use the kornia::dnn module to try these workflows and prototype from there. I found a similar C++ implementation that may serve as a reference: https://gist.github.com/CommitThis/1666517de32893e5dc4c441269f1029a
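One possible way to choose the from_vec behaviour per allocator is to make it a trait method, so each allocator type decides. A minimal sketch, assuming a hypothetical `StorageFromVec` trait, a hypothetical `Storage` enum, and a `MockCudaAllocator` whose "device memory" is just a second host buffer; no real CUDA API is called.

```rust
/// Hypothetical storage: either the Vec's own host buffer (zero-copy) or a
/// copy living in (mock) device memory.
enum Storage<T> {
    Host(Vec<T>),
    Device(Box<[T]>),
}

/// Each allocator decides how a Vec becomes storage.
trait StorageFromVec {
    fn storage_from_vec<T: Clone>(&self, v: Vec<T>) -> Storage<T>;
}

struct CpuAllocator;
struct MockCudaAllocator;

impl StorageFromVec for CpuAllocator {
    fn storage_from_vec<T: Clone>(&self, v: Vec<T>) -> Storage<T> {
        // Zero-copy: the Vec's buffer came from the global allocator,
        // which a CpuAllocator built on std::alloc also uses.
        Storage::Host(v)
    }
}

impl StorageFromVec for MockCudaAllocator {
    fn storage_from_vec<T: Clone>(&self, v: Vec<T>) -> Storage<T> {
        // Stand-in for "allocate on device and copy"; the host Vec `v` is
        // consumed and dropped when this function returns.
        let device: Box<[T]> = v.as_slice().into();
        Storage::Device(device)
    }
}

/// Generic entry point: the allocator type selects the strategy at compile time.
fn tensor_storage_from_vec<A: StorageFromVec, T: Clone>(allocator: &A, v: Vec<T>) -> Storage<T> {
    allocator.storage_from_vec(v)
}
```

Static dispatch keeps the zero-copy CPU path free of runtime checks, while the device path is free to copy.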

One more request in this direction is the ability to easily create Image views, since Image is a tuple struct over Tensor: https://github.com/kornia/kornia-rs/blob/main/crates/kornia-image/src/image.rs#L59

In some workflows I have different types of images stored as Tensors, which I need to convert with Image::new (calling `as_slice().to_vec()` every time, which involves copies) in order to use any kornia function. Alternatively, we could adapt the whole API to accept Tensor, since Image is, in the end, just a thin wrapper that gives some semantics and defines specific image types with formats, e.g. Rgb8U, Mono8U.

I didn't quite understand the image view part. But for converting a Tensor3 to an Image without a copy, we could implement the TryFrom trait, or a from_tensor function that reuses the passed tensor if the channel dimension matches.

Image in the end is a Tensor3, so a from_tensor method would be a bit overkill. Ideally, for this case, we might want a method that somehow transfers ownership of the storage, shape and strides?

Just to make myself clear, this is what I had in mind:

impl<T, const C: usize> TryFrom<Tensor<T, 3, CpuAllocator>> for Image<T, C>
where
    T: SafeTensorType,
{
    type Error = ImageError;

    fn try_from(value: Tensor<T, 3, CpuAllocator>) -> Result<Self, Self::Error> {
        if value.shape[2] == C {
            Ok(Self(value))
        } else {
            Err(ImageError::InvalidTensorShape)
        }
    }
}

Used like:

let image: Image<_, 3> = tensor.try_into().unwrap();
// OR
let image = Image::<_, 3>::try_from(tensor).unwrap();
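A self-contained version of the TryFrom idea, with stripped-down stand-ins for kornia's `Tensor`, `Image` and `ImageError` (the `Tensor3` name, its fields and the error variant are assumptions, not the crate's real definitions). It shows that the conversion only checks the channel dimension and moves the tensor, so the pixel buffer is not copied.

```rust
use std::convert::TryFrom;

/// Stand-in for a 3-D tensor with shape [height, width, channels].
struct Tensor3<T> {
    shape: [usize; 3],
    data: Vec<T>,
}

/// Stand-in for kornia's Image: a tuple struct wrapping the tensor.
struct Image<T, const C: usize>(Tensor3<T>);

#[derive(Debug, PartialEq)]
enum ImageError {
    InvalidTensorShape,
}

impl<T, const C: usize> TryFrom<Tensor3<T>> for Image<T, C> {
    type Error = ImageError;

    fn try_from(value: Tensor3<T>) -> Result<Self, Self::Error> {
        // Only the channel dimension is checked; the tensor is moved, not copied.
        if value.shape[2] == C {
            Ok(Self(value))
        } else {
            Err(ImageError::InvalidTensorShape)
        }
    }
}
```

Comparing the data pointer before and after the conversion is an easy way to confirm the zero-copy behaviour in a test.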

Oh, I see, sounds good! I'll try it myself in a separate PR. I also have some potential improvements for the Image struct, like adding a third parameter, ImageColorSpace, in order to define more specific types like
type ImageRgb8 = Image<u8, 3, ColorSpace::Rgb>, where each color space will have associated values too.
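Note that using an enum value such as `ColorSpace::Rgb` as a const generic argument requires the unstable `adt_const_params` feature. On stable Rust, one alternative is zero-sized marker types whose associated constants carry the per-color-space values. The names below (`ColorSpace`, `Rgb`, `Mono`, this `Image`) are hypothetical; this is a sketch of the idea, not a proposed API.

```rust
use std::marker::PhantomData;

/// Marker trait: each color space carries its own associated values.
trait ColorSpace {
    const NAME: &'static str;
    const CHANNELS: usize;
}

struct Rgb;
struct Mono;

impl ColorSpace for Rgb {
    const NAME: &'static str = "rgb";
    const CHANNELS: usize = 3;
}

impl ColorSpace for Mono {
    const NAME: &'static str = "mono";
    const CHANNELS: usize = 1;
}

/// Hypothetical image parameterized by pixel type, channel count and color space.
struct Image<T, const C: usize, S: ColorSpace> {
    data: Vec<T>,
    _space: PhantomData<S>,
}

impl<T, const C: usize, S: ColorSpace> Image<T, C, S> {
    fn new(data: Vec<T>) -> Self {
        // Both values are compile-time constants; checked at runtime here for simplicity.
        assert_eq!(C, S::CHANNELS, "channel count does not match color space");
        Self { data, _space: PhantomData }
    }

    fn color_space(&self) -> &'static str {
        S::NAME
    }
}

type ImageRgb8 = Image<u8, 3, Rgb>;
type ImageMono8 = Image<u8, 1, Mono>;
```

The marker types cost nothing at runtime (`PhantomData` is zero-sized), and the aliases read much like the `ImageRgb8` example above.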

Regarding this PR, I just want to note that it only fixes the deallocation issues noted in the first comment. It does not currently include the changes related to into_vec for non-CpuAllocators as discussed above in this review thread. I had started it, but it is not ready yet. I can send a separate PR when it's ready (hope that's ok).

edgarriba

LGTM

edgarriba · 2024-09-28T05:30:39Z

+        // Deallocation should happen when `storage` goes out of scope.
+        {
+            let _storage = TensorStorage::<u8, _>::new(len, allocator.clone())?;
+            assert_eq!(*allocator.bytes_allocated.borrow(), len as i32);


No pointer test here?

In this case the storage is not created from another buffer, so there is no original pointer to compare against. Are there any other checks I should have added?
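To make the two test cases discussed here concrete, a simplified sketch: a storage created with `new` goes through the test allocator's byte counter, while one created with `from_vec` adopts the existing buffer, so the pointer can be compared and the counter must not change. `TestAllocator` and `Storage` are hypothetical; the shared `Rc<RefCell<i32>>` counter mirrors the `*allocator.bytes_allocated.borrow()` calls in the test diff, and the counter is pure bookkeeping (the memory itself comes from `vec!`).

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical test allocator: a shared counter of outstanding bytes.
#[derive(Clone, Default)]
struct TestAllocator {
    bytes_allocated: Rc<RefCell<i32>>,
}

/// Simplified storage: either "allocated" through the test allocator or
/// adopting an existing Vec without copying.
struct Storage {
    data: Vec<u8>,
    allocator: Option<TestAllocator>,
}

impl Storage {
    fn new(len: usize, allocator: TestAllocator) -> Self {
        // Bookkeeping only: record the bytes this storage is responsible for.
        *allocator.bytes_allocated.borrow_mut() += len as i32;
        Self { data: vec![0; len], allocator: Some(allocator) }
    }

    /// Zero-copy: the Vec keeps its own allocation and the counter is untouched.
    fn from_vec(vec: Vec<u8>) -> Self {
        Self { data: vec, allocator: None }
    }

    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }
}

impl Drop for Storage {
    fn drop(&mut self) {
        // Only storages created through the allocator report a deallocation.
        if let Some(a) = &self.allocator {
            *a.bytes_allocated.borrow_mut() -= self.data.len() as i32;
        }
    }
}
```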

* Add failing tests for TensorAllocator
* Fix tensor memory leaks
* Use NonNull instead of raw pointer
* Return result from test and use ? operator
* Add ptr sanity checks in tests
* Use Global allocator in CpuAllocator
* Wrap alloc in an Arc<>

emilmgeorge added 2 commits September 23, 2024 21:42

Add failing tests for TensorAllocator

8026545

Fix tensor memory leaks

90e8c6e

edgarriba reviewed Sep 24, 2024


emilmgeorge added 6 commits September 26, 2024 01:02

Use NonNull instead of raw pointer

ef8246a

Return result from test and use ? operator

7ed404f

Add ptr sanity checks in tests

9458c60

Use Global allocator in CpuAllocator

27ddb67

Wrap alloc in an Arc<>

46db49f

Merge branch 'main' into bugfix/mem-leak

db298a7

edgarriba approved these changes Sep 28, 2024


edgarriba merged commit eeabb12 into kornia:main Sep 28, 2024

emilmgeorge deleted the bugfix/mem-leak branch September 28, 2024 13:01

edgarriba mentioned this pull request Mar 9, 2025

Memory leak in remap #134

Closed


emilmgeorge commented Sep 23, 2024


edgarriba left a comment


edgarriba Sep 26, 2024 • edited


edgarriba left a comment


2 participants
