Fix TensorStorage memory deallocation#145
Conversation
edgarriba
left a comment
There was a problem hiding this comment.
LGTM. Added some suggestions to make testing more robust
| // bytes_allocated value should not change. | ||
| { | ||
| let vec = Vec::<u8>::with_capacity(len); | ||
| let storage = TensorStorage::<u8, _>::from_vec(vec, allocator.clone()); |
There was a problem hiding this comment.
i think we need to resolve well here when we create a storage from a Vec how's the allocator involved
There was a problem hiding this comment.
Yes that would be good. This is my related understanding:
- Std Vector and
Buffer::from_vecdo not support custom allocators untilallocator_apicomes to stable rust. So, for our future custom allocators likeCudaAllocator,from_vecwill have to involve copying and useBuffer::from_custom_allocator. - For
CpuAllocator, we can have zero-copy usingBuffer::from_vec(as it is currently).
But to be fully safe, I think we should also changeCpuAllocatorto usestd::alloc::{alloc,dealloc}(Global allocator) instead ofstd::alloc::System.{alloc,dealloc}. This is because vector uses theGlobalallocator. This is usually the same asstd::alloc::Systembut the user can change it using the global_allocator attribute. By changingCpuAllocatorto usestd::alloc::{alloc,dealloc}, it always matches the allocator used by the vector (even when user changes it).
I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)
I haven't done any of the above in this PR though. Please let me know your thoughts and I can change accordingly.
There was a problem hiding this comment.
But to be fully safe, I think we should also change CpuAllocator to use std::alloc::{alloc,dealloc} (Global allocator) instead of std::alloc::System.{alloc,dealloc}
please, do 👍
I'm not sure how to switch the implementation of TensorStorage::from_vec to one of the above based on whether A is CpuAllocator or CudaAllocator though. (maybe different functions? Ideas welcome!)
maybe the behaviour for cuda should be that when a cuda storage is created via vec, the data is consumed, cuda allocated and copied to device, and deallocate the original cpu vector ? Haven't faced yet the full use case. Probably we should use the kornia::dnn module to try this workflows and prototype from there. Found a similar c++ implementation maybe to have a reference: https://gist.github.com/CommitThis/1666517de32893e5dc4c441269f1029a
There was a problem hiding this comment.
one more request in this direction, is the ability to easily create Image views. As Image is tuple struct out of Tensor: https://github.com/kornia/kornia-rs/blob/main/crates/kornia-image/src/image.rs#L59
In some workflows i have different types of images out of Tensor which i need to convert to Image::new (`as_slice().to_vec() everytime which involves copies) in order to use any kornia function. Unless we decide e.g to adapt the whole api to accept Tensor, and Image in the end it's just a trait in order to give some semantics and define specific types of images with formats e.g Rgb8U, Mono8U.
There was a problem hiding this comment.
I didn't quite understand the image view part. But for converting Tensor3 to Image without copy, we could implement the TryFrom trait or a function from_tensor that uses the passed tensor if channel dimension matches.
There was a problem hiding this comment.
Image in the end is a Tensor3 — would be a bit overkill to have a from_tensor method. I think ideally for this case we might want to have a method that somehow transfers the ownership of the storage, shape and strides ?
There was a problem hiding this comment.
Just to make myself clear, this is what I had in mind:
impl<T, const C: usize> TryFrom<Tensor<T, 3, CpuAllocator>> for Image<T, C>
where
T: SafeTensorType,
{
type Error = ImageError;
fn try_from(value: Tensor<T, 3, CpuAllocator>) -> Result<Self, Self::Error> {
if value.shape[2] == C {
Ok(Self(value))
} else {
Err(ImageError::InvalidTensorShape)
}
}
}Used like:
let image: Image<_, 3> = tensor.try_into().unwrap();
// OR
let image = Image::<_, 3>::try_from(tensor).unwrap();There was a problem hiding this comment.
Oh, I see, sounds good! I’ll try myself in a separated PR. I have also some potential improvements for the image struct, like adding a third ImageColorSpace in order to define more specific types like
type ImageRgb8 = Image<u8, 3, ColorSpace::Rgb> which color will have associated values too
There was a problem hiding this comment.
Regarding this PR, I just want to note that it only fixes the deallocation issues noted in the first comment. It does not currently include the changes related to into_vec for non-CpuAllocators as discussed above in this review thread. I had started it, but it is not ready yet. I can send a separate PR when it's ready (hope that's ok).
| // Deallocation should happen when `storage` goes out of scope. | ||
| { | ||
| let _storage = TensorStorage::<u8, _>::new(len, allocator.clone())?; | ||
| assert_eq!(*allocator.bytes_allocated.borrow(), len as i32); |
There was a problem hiding this comment.
In this case the storage is not created from another buffer so there is no original pointer to compare to. Are there any other checks I should have added?
* Add failing tests for TensorAllocator * Fix tensor memory leaks * Use NonNull instead of raw pointer * Return result from test and use ? operator * Add ptr sanity checks in tests * Use Global allocator in CpuAllocator * Wrap alloc in an Arc<>
Changes in this PR
remap#134 (comment) and Refactor hflip, vflip functions to allow preallocation #117 (comment).Details
A new struct
TensorCustomAllocationOwneris added to represent a custom memory allocations made by TensorAllocators. This struct implements theDroptrait to deallocate the associated memory. An instance of this struct is then passed as owner toarrow_buffer::Buffer::from_custom_allocation, so that when the Arrow buffer is dropped, this memory will also be deallocated using the correct TensorAllocator.Future changes
When the
allocator_apifeature comes to stable rust, it may be possible to use a standard container such asVec<T, A=..>instead of the above mentioned struct to handle custom deallocation.