-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TensorStorage memory deallocation #145
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Added some suggestions to make testing more robust
crates/kornia-core/src/storage.rs
Outdated
// bytes_allocated value should not change. | ||
{ | ||
let vec = Vec::<u8>::with_capacity(len); | ||
let storage = TensorStorage::<u8, _>::from_vec(vec, allocator.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we need to resolve well here when we create a storage from a Vec how's the allocator involved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth to check this apache/arrow-rs#6362
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that would be good. This is my related understanding:
- Std Vector and
Buffer::from_vec
do not support custom allocators untilallocator_api
comes to stable rust. So, for our future custom allocators likeCudaAllocator
,from_vec
will have to involve copying and useBuffer::from_custom_allocator
. - For
CpuAllocator
, we can have zero-copy usingBuffer::from_vec
(as it is currently).
But to be fully safe, I think we should also changeCpuAllocator
to usestd::alloc::{alloc,dealloc}
(Global allocator) instead ofstd::alloc::System.{alloc,dealloc}
. This is because vector uses theGlobal
allocator. This is usually the same asstd::alloc::System
but the user can change it using the global_allocator attribute. By changingCpuAllocator
to usestd::alloc::{alloc,dealloc}
, it always matches the allocator used by the vector (even when user changes it).
I'm not sure how to switch the implementation of TensorStorage::from_vec
to one of the above based on whether A
is CpuAllocator
or CudaAllocator
though. (maybe different functions? Ideas welcome!)
I haven't done any of the above in this PR though. Please let me know your thoughts and I can change accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc,dealloc} (Global allocator) instead of std::alloc::System.{alloc,dealloc}
please, do 👍
I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)
maybe the behaviour for cuda should be that when a cuda storage is created via vec, the data is consumed, cuda allocated and copied to device, and deallocate the original cpu vector ? Haven't faced yet the full use case. Probably we should use the kornia::dnn
module to try this workflows and prototype from there. Found a similar c++ implementation maybe to have a reference: https://gist.github.com/CommitThis/1666517de32893e5dc4c441269f1029a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more request in this direction, is the ability to easily create Image views. As Image is tuple struct out of Tensor: https://github.com/kornia/kornia-rs/blob/main/crates/kornia-image/src/image.rs#L59
In some workflows i have different types of images out of Tensor which i need to convert to Image::new (`as_slice().to_vec() everytime which involves copies) in order to use any kornia function. Unless we decide e.g to adapt the whole api to accept Tensor, and Image in the end it's just a trait in order to give some semantics and define specific types of images with formats e.g Rgb8U, Mono8U.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't quite understand the image view part. But for converting Tensor3 to Image without copy, we could implement the TryFrom
trait or a function from_tensor
that uses the passed tensor if channel dimension matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image in the end is a Tensor3 — would be a bit overkill to have a from_tensor method. I think ideally for this case we might want to have a method that somehow transfers the ownership of the storage, shape and strides ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make myself clear, this is what I had in mind:
impl<T, const C: usize> TryFrom<Tensor<T, 3, CpuAllocator>> for Image<T, C>
where
T: SafeTensorType,
{
type Error = ImageError;
fn try_from(value: Tensor<T, 3, CpuAllocator>) -> Result<Self, Self::Error> {
if value.shape[2] == C {
Ok(Self(value))
} else {
Err(ImageError::InvalidTensorShape)
}
}
}
Used like:
let image: Image<_, 3> = tensor.try_into().unwrap();
// OR
let image = Image::<_, 3>::try_from(tensor).unwrap();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, sounds good! I’ll try myself in a separated PR. I have also some potential improvements for the image struct, like adding a third ImageColorSpace in order to define more specific types like
type ImageRgb8 = Image<u8, 3, ColorSpace::Rgb>
which color will have associated values too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this PR, I just want to note that it only fixes the deallocation issues noted in the first comment. It does not currently include the changes related to into_vec
for non-CpuAllocators as discussed above in this review thread. I had started it, but it is not ready yet. I can send a separate PR when it's ready (hope that's ok).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// Deallocation should happen when `storage` goes out of scope. | ||
{ | ||
let _storage = TensorStorage::<u8, _>::new(len, allocator.clone())?; | ||
assert_eq!(*allocator.bytes_allocated.borrow(), len as i32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No pointer test here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the storage is not created from another buffer so there is no original pointer to compare to. Are there any other checks I should have added?
Changes in this PR
remap
#134 (comment) and Refactor hflip, vflip functions to allow preallocation #117 (comment).Details
A new struct
TensorCustomAllocationOwner
is added to represent a custom memory allocations made by TensorAllocators. This struct implements theDrop
trait to deallocate the associated memory. An instance of this struct is then passed as owner toarrow_buffer::Buffer::from_custom_allocation
, so that when the Arrow buffer is dropped, this memory will also be deallocated using the correct TensorAllocator.Future changes
When the
allocator_api
feature comes to stable rust, it may be possible to use a standard container such asVec<T, A=..>
instead of the above mentioned struct to handle custom deallocation.