Improve performance of experimental.resize #5662
Signed-off-by: Rafal Banas <[email protected]>
!build
CI MESSAGE: [19099938]: BUILD STARTED
TensorList<GPUBackend> in_frames_;
TensorList<GPUBackend> out_frames_;
Does this actually help? How much? It's against the trend of more aggressive dynamic allocation.
It helped A LOT. For some reason, the destructors of these were taking a lot of time.
Anyway, I'm working on removing those auxiliary TensorLists completely, because the .ShareData and .Resize calls also take a significant amount of time.
CI MESSAGE: [19099938]: BUILD FAILED
Force-pushed from 168b119 to 971ba9a
Signed-off-by: Rafal Banas <[email protected]>
Force-pushed from 971ba9a to 44b9ca2
!build
CI MESSAGE: [19870509]: BUILD FAILED
!build
CI MESSAGE: [19871581]: BUILD STARTED
CI MESSAGE: [19871581]: BUILD PASSED
@@ -23,6 +23,7 @@
#include "dali/kernels/imgproc/resample/params.h"
#include "dali/operators/image/resize/resize_op_impl.h"
#include "dali/operators/nvcvop/nvcvop.h"
#include "dali/core/nvtx.h"
Unused?
for (int64_t i = 0; i < num_frames; ++i) {
  if (frame_offset == sample_nframes) {
    frame_offset = 0;
    do {
      ++sample_id;
      auto sample_shape = input_shape[sample_id];
      DALI_ENFORCE(sample_id < t_list.num_samples());
      std::copy(&sample_shape[first_spatial_dim], &sample_shape[input_shape.sample_dim()],
                frame_shape.begin());
      frame_stride = volume(frame_shape) * type_size;
      sample_nframes = calc_num_frames(sample_shape, first_spatial_dim);
    } while (sample_nframes * frame_stride == 0);  // we skip empty samples
    data =
        static_cast<const uint8_t *>(t_list.raw_tensor(sample_id)) + frame_stride * frame_offset;
  }
  tensors.push_back(AsTensor(data, make_span(frame_shape), dtype, nvcv_layout));
  data += frame_stride;
  frame_offset++;
I think that combining the two loops and changing the outer loop condition to "while there are frames left to insert" makes it more readable:
int frames_left = num_frames;
while (frames_left) {
  if (frame_offset >= sample_nframes) {
    frame_offset = 0;  // start from the first frame of the next sample
    ++sample_id;
    assert(sample_id < t_list.num_samples());
    auto sample_shape = input_shape[sample_id];
    std::copy(&sample_shape[first_spatial_dim], &sample_shape[input_shape.sample_dim()],
              frame_shape.begin());
    frame_stride = volume(frame_shape) * type_size;
    if (frame_stride == 0) {  // this sample is (effectively) empty - skip
      sample_nframes = 0;
      continue;
    }
    sample_nframes = calc_num_frames(sample_shape, first_spatial_dim);
    data = static_cast<const uint8_t *>(t_list.raw_tensor(sample_id)) + frame_stride * frame_offset;
  }
  tensors.push_back(AsTensor(data, make_span(frame_shape), dtype, nvcv_layout));
  data += frame_stride;
  frame_offset++;
  frames_left--;
}
do {
  ++sample_id;
  auto sample_shape = input_shape[sample_id];
  DALI_ENFORCE(sample_id < t_list.num_samples());
I don't think that either the user or faulty data could trigger this. It would be an internal error, so I'd recommend using an assert, or throwing logic_error at worst.
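A minimal sketch of the logic_error variant, assuming a hypothetical helper `NextSampleChecked` (not part of DALI) that advances the sample index. The check runs before the index is used, and the exception type marks the failure as a broken internal invariant rather than bad user input:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: advances to the next sample and validates the index
// before anything is indexed with it. An out-of-range index can only come
// from a bug in the calling code, so it is reported as std::logic_error.
// (An assert(sample_id < num_samples) would serve the same purpose in debug
// builds, but it is compiled out when NDEBUG is defined.)
inline int NextSampleChecked(int sample_id, int num_samples) {
  ++sample_id;
  if (sample_id >= num_samples) {
    throw std::logic_error("Internal error: sample index " + std::to_string(sample_id) +
                           " is out of range for " + std::to_string(num_samples) +
                           " samples.");
  }
  return sample_id;
}
```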
nvcvop::PushFramesToBatch(mb_input, input, first_spatial_dim_, mb.sample_offset,
                          mb.frame_offset, mb.count, sample_layout_);
nvcvop::PushFramesToBatch(mb_output, output, first_spatial_dim_, mb.sample_offset,
                          mb.frame_offset, mb.count, sample_layout_);
This is a bug. Both inputs and outputs should be inserted in one go and skipping the empty samples should be based solely on the output size. The user may request resizing a non-empty tensor to (0, 0), which is not an error AFAIR. Resizing an empty input to non-empty shape is an error and should be thrown at some point.
if (volume(in_sample_shape) > 0)
  total_frames += volume(&in_sample_shape[0], &in_sample_shape[first_spatial_dim]);
This is a bug - the emptiness of a frame depends on the output shape, not input. At least in old DALI resize, you can resize a non-empty frame to size 0. I understand that such samples should be skipped (both at input and output).
Resizing an empty frame to a non-zero shape is impossible and should throw.
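The rule described above could be sketched roughly as follows. This is an illustration under assumptions, not the operator's code: `Shape`, `Volume` and `CountFramesToProcess` are hypothetical names, shapes are plain vectors with the frame dimensions first, and `std::invalid_argument` is just one possible choice of exception.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int64_t>;  // hypothetical stand-in for a sample shape

// Product of the extents in dimensions [begin, end).
int64_t Volume(const Shape &s, size_t begin, size_t end) {
  int64_t v = 1;
  for (size_t i = begin; i < end; ++i) v *= s[i];
  return v;
}

// Counts the frames that actually need resizing. in_shapes[i] and
// out_shapes[i] describe the same sample; dimensions before
// `first_spatial_dim` enumerate frames.
int64_t CountFramesToProcess(const std::vector<Shape> &in_shapes,
                             const std::vector<Shape> &out_shapes,
                             size_t first_spatial_dim) {
  int64_t total = 0;
  for (size_t i = 0; i < out_shapes.size(); ++i) {
    const Shape &in = in_shapes[i];
    const Shape &out = out_shapes[i];
    // Emptiness is decided by the output: skip the sample on both sides.
    if (Volume(out, 0, out.size()) == 0)
      continue;
    // Non-empty output requested from an empty input: impossible, so throw.
    if (Volume(in, 0, in.size()) == 0)
      throw std::invalid_argument("Cannot resize an empty input to a non-empty shape.");
    total += Volume(out, 0, first_spatial_dim);
  }
  return total;
}
```

Because the skip decision is made once per sample from the output shape, the input and output batches built from it stay aligned frame for frame.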
Category:
Refactoring
Description:
This PR reduces the CPU overhead of the experimental.resize operator. It contains the following improvements (from most to least significant):
Additional information:
Affected modules and functionalities:
experimental.resize, nvcvop.h/cc
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A