Best practice for video ingestion #88

Open
SiftingSands opened this issue Jul 21, 2021 · 20 comments
Labels
enhancement New feature or request

Comments

@SiftingSands

With DALI, outside of Triton, I would call fn.readers.video with a list of video file paths. However, is that still feasible within Triton? All the examples I see here pass numpy arrays to ExternalSource. Was there an example with video decoding available that I missed? My current use case isn't streaming video, so I have videos 'at rest'.
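
For context, this is roughly the standalone (non-Triton) usage I mean - a minimal sketch with hypothetical file paths:

import nvidia.dali as dali
from nvidia.dali import fn

@dali.pipeline_def(batch_size=2, num_threads=2, device_id=0)
def video_pipe():
    # File paths are fixed when the pipeline is defined/built.
    return fn.readers.video(
        device="gpu",                              # fn.readers.video is GPU-only
        filenames=["/data/a.mp4", "/data/b.mp4"],  # hypothetical paths
        sequence_length=8)                         # frames per returned sequence

pipe = video_pipe()
pipe.build()
(sequences,) = pipe.run()  # batch of [sequence_length, H, W, C] frame sequences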

@szalpal
Member

szalpal commented Jul 23, 2021

Hello @SiftingSands !

We do not have any example with video yet, but I do agree, working with videos via Triton right now is cumbersome. Reading and decoding video in a DALI pipeline is different from other types of data, since both of these operations happen in a single step. fn.readers.video assumes that the video exists in a file and that the user provides a path to this file. Therefore, it's rather infeasible to use it canonically with Triton, where we send a request with the data.

You are the first to ask about video support; we haven't focused on this yet, but we'll put it on our priority list. We need to think about how we can enable this feature for DALI Backend. I'll be posting updates on the status of this feature request in this issue.

cheers!

@szalpal szalpal added the enhancement New feature or request label Jul 23, 2021
@SiftingSands
Author

Thanks for confirming the status of video decoding with Triton. Glad I asked before I tried something like decoding a video into a bit stream on the client and passing that over to the server. Looking forward to future developments!

@szalpal
Member

szalpal commented Jul 26, 2021

@SiftingSands ,

I don't know precisely what your use case is. However, have you considered using DeepStream? It is an AI-powered video analytics library, and it is possible to use it via Triton.

@SiftingSands
Author

I did see DeepStream with Triton as an option, but I had already used DALI for my inference use case, so I didn't want to switch to another solution unless necessary. My videos are 'at rest', so I also wasn't sure if DeepStream would be a suitable route. Regardless, I'm not currently in a huge rush for Triton integration.

@Tandon-A

Tandon-A commented Feb 5, 2022

@szalpal Is there any update on this?

@szalpal
Member

szalpal commented Feb 7, 2022

Hi @Tandon-A !

Unfortunately not yet. Currently, the only way of decoding videos in DALI is to read them from disk, which is not the desired scenario in the Triton case. As soon as we add the proper enhancements in DALI, we'll add an example to DALI Backend.

@nitsanh

nitsanh commented Jul 7, 2022

Hi, I have a similar use case: I have videos saved as files, and I want the user to send me the path to one of them. I was trying to receive the path using video_path = dali.fn.external_source(device="cpu", name="DALI_INPUT_VIDEO_PATH", dtype=types.STRING), and then give that path to fn.readers.video, with no success.

Is this use case supposed to work? It looks like the DALI backend refuses to receive a string input - is that really the case?

If this is impossible, would it be possible to replace the video file on each call and have the DALI pipeline work with one fixed path (I guess in that case it wouldn't need to receive any inputs at all)?
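
For reference, a rough sketch of what I was attempting (names are just illustrative; this did not work for me as written):

import nvidia.dali as dali
from nvidia.dali import fn
import nvidia.dali.types as types

@dali.pipeline_def(batch_size=1, num_threads=2, device_id=0)
def pipe():
    # Intended: receive the path per request as a string input...
    video_path = fn.external_source(device="cpu", name="DALI_INPUT_VIDEO_PATH",
                                    dtype=types.STRING)
    # ...and then read/decode the file at that path - this is the part that fails,
    # since the reader expects its filenames at pipeline definition time.
    return fn.readers.video(device="gpu", filenames=video_path, sequence_length=16)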

@szalpal
Member

szalpal commented Jul 8, 2022

Hi @nitsanh !

Unfortunately, your use case won't work, for a few reasons. First of all, external_source and DALI Backend currently do not work with TYPE_STRING. We do have this on our radar, and you may expect support for TYPE_STRING there in the next few releases.

But most importantly, DALI's fn.readers are currently not suitable for Triton usage, and this cannot be worked around easily. The reason is that the file paths passed to every fn.reader operator are not a runtime argument but a build-time argument. That means these arguments need to be specified at pipeline build time.

The solution you proposed - that fn.readers.video would use a hardcoded path while the contents under this path change - might work. However, it may lead to a lot of different problems, so I would discourage such a solution in a production environment.

Anyway, we do plan to extend the support for videos, as we are aware that video support in Triton+DALI is currently virtually non-existent.

@nitsanh

nitsanh commented Jul 8, 2022

Thanks for your reply, I'll try using DeepStream or VPF then.

@szalpal
Member

szalpal commented Jul 11, 2022

@SiftingSands ,
@nitsanh ,
@Tandon-A ,

would you mind sharing with us some more insight into your use cases? We are starting to work on video support in DALI Backend, and we'd like to know the possible requirements.

The initial plan is to support video data, provided that whatever is passed to the input of the DALI pipeline is an entire video file in a binary buffer. Would this satisfy your needs?

@nitsanh

nitsanh commented Jul 11, 2022

My use case is this:
I have a model that performs some kind of manipulation on a video that a user sends. It uses the entire video file in one go, so I need to supply the model with the full-length video (up to ~30 seconds per video). I wanted to have an ensemble in Triton where the first block is the DALI backend, which receives a video file path, reads it and resizes it, then passes the resized video tensor to the next ensemble block, which is the model itself (using the ONNX / PyTorch backend). I would have liked DALI to place the preprocessed video tensor on the GPU so the model can read it from there.

I also needed a way to get that entire video file from the user, and then back to them. I'm not sure what the proper way to do that is; I thought maybe having another service on the same machine as Triton that interacts with the user and downloads / uploads the video files from a bucket in the cloud.

My main issue was that DALI doesn't support reading a video from an arbitrary file. I think the best approach would have been to pass DALI a string representing the video path in the local file system, then use the video reader to read it.

I think passing the entire video file in a binary buffer wouldn't work well for large videos, would it?

@SiftingSands
Author

My initial use case was performing inference with a slew of models on a data warehouse of videos (not live streaming frames from a camera). Currently, inference is performed on a single video frame (Bx1xCxWxH) with each GPU taking a separate shard of the video dataset, but I can see having a batch of multiple frames in the future as well (BxNxCxWxH). One nice thing that I assume Triton would be capable of is using the dedicated NVDEC GPU hardware to decode the video asynchronously from the model inference operations to increase throughput.

You mentioned passing in the entire video in a binary buffer. I have 4K videos encoded at high quality, so they can be hundreds of MB and even up to 1 GB in size. Not sure if that's an issue, so just bringing it up in case. I also have a lot of VFR videos, but I know DALI's video reader support for those is still in progress.

@iafydsttta

iafydsttta commented Sep 14, 2022

Our use case would be the following:
Video files (with standardized encoding) become available to a Triton client as file paths. Ideally, the TRTIS DALI backend would accept a file path, decode the video into frames, and trigger other backends to apply models loaded into TRTIS to these frames.

Our reasons for using vanilla TRTIS and not combining it with NVIDIA DeepStream:

  • We would only need a very small part of the functionality that DS provides.
  • We cannot assume that customers have NVIDIA hardware installed on premises, so decoding/inference must also be possible on the CPU.

@Alwahsh

Alwahsh commented Sep 1, 2023

Do you have any visibility into when that feature might become available?

@szalpal
Member

szalpal commented Sep 4, 2023

@Alwahsh ,

We recently updated DALI's capabilities to support some of the video scenarios. We added two video decoding operators: fn.decoders.video and fn.inputs.video. These handle two different scenarios:

  1. When your decoded video fits entirely into memory, you should use fn.decoders.video. Via Triton it works in the typical DALI+Triton way - the encoded video is injected into the DALI pipeline using fn.external_source and then decoded with VideoDecoder. This approach is reflected in the video decoding example.
  2. When your video is too large to be decoded entirely into memory, please use fn.inputs.video. Here we modify the usage a little bit: instead of using fn.external_source, the input operator to the DALI pipeline is fn.inputs.video itself. The example of this approach is video_decode_remap. (A rough sketch of both pipeline shapes follows this list.)
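
A rough sketch of the two pipeline shapes (treat the exact module paths as assumptions - depending on the DALI release these operators may live under fn.experimental, e.g. fn.experimental.decoders.video and fn.experimental.inputs.video):

import nvidia.dali as dali
from nvidia.dali import fn

# 1. Decoded video fits in memory: feed the encoded bytes via external_source
#    and decode them inside the pipeline.
@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def decoder_pipe():
    encoded = fn.external_source(device="cpu", name="INPUT",
                                 dtype=dali.types.UINT8, ndim=1)
    return fn.experimental.decoders.video(encoded, device="mixed")

# 2. Video too large to decode at once: fn.inputs.video is itself the input
#    operator and emits the video as consecutive batches of fixed-length sequences.
@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def input_pipe():
    return fn.experimental.inputs.video(name="INPUT", sequence_length=16,
                                        device="mixed")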

Hopefully this addresses your use case. If not, please let me know and maybe we can figure something out. In case you have any questions about usage, don't hesitate to ask :)

@Alwahsh

Alwahsh commented Sep 11, 2023

@szalpal That is what I was looking for. Thanks a lot. Please let me know if I should open a separate issue for my questions.

From the documentation here:

This operator takes only one video as an input (i.e. input_batch_size=1) and will return batches of sequences. Every output batch will have the max_batch_size samples, set during the Pipeline creation.

Does that mean that there's currently a limitation of having only a batch size of one in Triton's configuration file?

max_batch_size: 1

Is there a way to accept higher batch sizes for videos in Triton?

I tried to increase the max_batch_size in the configuration, and it works if I have sequence_length set to a high value, but if it is small, I get an error:

Cannot split a shape list with X samples to list shapes of total 1 samples.

where X is the max_batch_size I chose.

I can't make sense of that behavior, but I assume that when it works, it's actually not generating correct information from the batch of videos, but instead getting information from the first one only.

@szalpal
Member

szalpal commented Sep 12, 2023

@Alwahsh ,

it would be best if you could provide info about your input (video duration, etc.), but I'll try to answer your question regardless.

The VideoInput operator always takes one video as an input. However, in the output it provides a batch of sequences. The next sentence from the documentation, right after the one you've pasted above:

Every output batch will have the max_batch_size samples

Therefore, if you create a pipeline like this:

@pipeline_def(batch_size=3, ...)
def pipe():
    return fn.inputs.video(sequence_length=5, ...)

In the output you'll get a batch of 3 sequences and every sequence will have 5 frames.

However, the error message you've encountered might be a symptom of a bug. Could you tell me what the duration of your video file is (in frames) and what parameters (batch_size and sequence_length) you used?

To answer your remaining questions:

Does that mean that there's currently a limitation of having only a batch size of one in Triton's configuration file?

No. The value you set in the max_batch_size field of Triton's configuration will override whatever you've put in DALI's pipeline definition (i.e. in dali.py). The sentence you've quoted mainly targets standalone usage of the DALI pipeline, where you can feed the pipeline with custom data. In Triton it means that your client has to send a request with only one encoded video.

If you have any more questions, don't hesitate to ask :)

@Alwahsh

Alwahsh commented Sep 13, 2023

@szalpal Thanks for your response and keen on help 🙏.
The duration of my video is 10 seconds. It contains 249 frames.

I think I might be confused about the meaning of max_batch_size in Triton's configuration file. According to the documentation here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#maximum-batch-size and my previous trials with image decoding in DALI + Triton, I thought max_batch_size refers to the number of inputs from different requests (assuming each request has a batch size of 1 from the client) that are merged together to form a batch of inputs that goes to the model together for more efficient parallel processing.
If I understand what you're saying correctly, for this specific pipeline it would control the number of sequences that are given to the output rather than the input batch size?

This is my pipeline code for reference:

import nvidia.dali as dali
from nvidia.dali import fn
import nvidia.dali.types as types
from nvidia.dali.plugin.triton import autoserialize

TOTAL_FRAMES = 16 # I am trying to simulate decoding 16 frames only.
REQUESTED_FRAMES = 16.0
OUT_WIDTH = 224
OUT_HEIGHT = 224

@autoserialize
@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0,
                   output_dtype=dali.types.FLOAT, output_ndim=4)
def pipeline():
  seq = fn.experimental.inputs.video(name="INPUT", sequence_length=TOTAL_FRAMES, device='mixed')
  seq = fn.resize(seq, resize_x=OUT_WIDTH, resize_y=OUT_HEIGHT)

  seq = fn.slice(seq, rel_start=[0.0], rel_shape=[REQUESTED_FRAMES/TOTAL_FRAMES], axes=[0])

  seq = fn.pad(seq, axis_names='F', align=REQUESTED_FRAMES)

  transposed_seq = fn.transpose(seq, perm=[0,3,1,2])

  return fn.cast(transposed_seq, dtype=dali.types.FLOAT, name='DALI_OUTPUT_0')

and my Triton Configuration file:

name: "video_decoder_gpu"
backend: "dali"
max_batch_size: 1
input [
  {
    name: "INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]

output [
  {
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 16, 3, 224, 224 ]
  }
]

parameters: [
 {
   key: "num_threads"
   value: { string_value: "1" }
 }
]

dynamic_batching {}
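
(For reference, the perf_analyzer runs below send the encoded video file as a 1-D UINT8 array; a minimal hand-written client doing the same might look roughly like this, with a hypothetical file path and server address:)

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed address

# Raw, still-encoded video bytes (hypothetical path).
encoded = np.fromfile("sample.mp4", dtype=np.uint8)

# Leading dimension is the request batch size (1 video per request).
inp = httpclient.InferInput("INPUT", [1, encoded.shape[0]], "UINT8")
inp.set_data_from_numpy(encoded.reshape(1, -1))

out = httpclient.InferRequestedOutput("DALI_OUTPUT_0")
result = client.infer(model_name="video_decoder_gpu", inputs=[inp], outputs=[out])
print(result.as_numpy("DALI_OUTPUT_0").shape)  # e.g. (1, 16, 3, 224, 224)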

The following table shows the configurations that work and that give an error:

max_batch_size (Triton config file) | Perf Analyzer client batch size | Error/Success
--- | --- | ---
1 | 1 | Success
1 | 2 | Expected error: inference request batch-size must be <= 1
2 | 1 | Error: DALI Backend error: Cannot split a shape list with 2 samples to list shapes of total 1 samples.
2 | 2 | Success

Concurrency of perf_analyzer for all trials is set to 1. So, in the 3rd row, although max_batch_size is set to 2, the case of receiving 2 requests in parallel never happens and the input batch size is always 1, yet the error still occurs.
What I'm guessing from this is that max_batch_size in video pipelines is actually the number of output sequences (segments of the same video), unlike the normal Triton Inference Server case where max_batch_size is the maximum acceptable input batch size.

If my understanding is correct, does that mean there is currently no way to do dynamic batching and process multiple videos at the same time as a single batch in a Triton + DALI setup?

@szalpal
Member

szalpal commented Sep 13, 2023

@Alwahsh ,

Thank you for providing these details. I'll look into the table you provided and verify whether this is unwanted behaviour or not.

If I understand correctly, you want to decode only the first 16 frames from every video file you have? My remark in this case is that it won't be easy. Unfortunately, at the moment DALI assumes that the whole video file needs to be processed, regardless of the type of video processing operator used. How important would this capability be for you? We may be able to add a feature that allows for such behaviour; I'd need to consult with the team about it.

If my understanding is correct, does that mean there is currently no way to do dynamic batching and process multiple videos at the same time as a single batch in Triton + DALI setup?

Partially correct :) The fn.inputs.video operator in DALI is not able to process multiple videos at the same time. fn.decoders.video is, however this operator needs to decode the entire video into memory, which for longer videos (and especially when processing multiple of them at the same time) easily throws OOM. The same restrictions apply to dynamic batching.

If I understand what you're saying correctly, for this specific pipeline, it would control the number of sequences that are given to the output rather than the input batch size?

Correct.

@Alwahsh

Alwahsh commented Sep 13, 2023

@szalpal Thanks for the clarification and continued help.

If I understand correctly, you want to decode only 16 first frames from every video file you have?

Yes, that's what I want to do. Please note, though, that I get high performance if I set TOTAL_FRAMES to 16 and much lower performance from the pipeline (higher latency and lower throughput) if I set TOTAL_FRAMES to 249, so I'm assuming the functionality of decoding only a specific number of frames is there. It's just that it only happens for 1 video but not multiple ones?

The functionality is indeed important to me because it affects performance. The most useful thing would be the ability to parse specific frames, not necessarily from the beginning (common sampling in DNN applications), but for now, since I'm only measuring performance, I'm fine with simulating the number of decoded frames as if they were the first X frames rather than the actual ones.
