Extend CleanVision to run on video data #215
I'd be super interested in this -- it might come in handy when sampling for annotations etc. Doing some on-the-fly deduplication with a perceptual hash to avoid sampling visually identical scenes, before passing frames on for further analysis, could improve the efficiency of the process.
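For illustration, a minimal sketch of what that on-the-fly check could look like (assuming the `imagehash` package; the distance threshold here is arbitrary and would need tuning):

```python
import imagehash  # perceptual hashing, assumed available
from typing import Optional, Tuple
from PIL import Image

def is_new_scene(frame: Image.Image,
                 last_hash: Optional[imagehash.ImageHash],
                 threshold: int = 6) -> Tuple[bool, imagehash.ImageHash]:
    """Return (keep, hash): keep is False if the frame is visually
    identical (within a Hamming-distance threshold) to the last kept one."""
    h = imagehash.phash(frame)
    if last_hash is not None and (h - last_hash) <= threshold:
        return False, last_hash  # near-duplicate of the previous scene
    return True, h
```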
@LemurPwned This would be an awesome contribution! Nobody is working on this actively AFAIK. @LaFemme was interested in trying CleanVision for video and could test out your PR code too!
@jwmueller Great, I started playing around with the idea, and for simplicity's sake I created a separate repo for the sampling part: https://github.com/LemurPwned/video-sampler. I'll try to integrate the sampling with the CleanVision API in a PR soon.
Awesome, super excited about this!!
@lafemmephile thanks, just to be clear, you want to pick up from here?
@lafemmephile thanks for clarifying :) -- I'm fine with this, let me know if you need any help.
@lafemmephile I'm not a maintainer for this project, so I'm not sure to what extent my insight is useful here. But I was essentially thinking of something similar to spaCy's pipeline: a simple, minimal object that takes in the config for each step in the pipeline and then executes the steps in sequence.
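For illustration, a minimal sketch of such an object (all names here are hypothetical, not existing CleanVision classes):

```python
from typing import Any, Callable

class Pipeline:
    """spaCy-style pipeline sketch: steps built from configs, run in sequence."""

    def __init__(self, steps: list[tuple[Callable[..., Any], dict]]):
        # each entry pairs a step callable with the config it should receive
        self.steps = steps

    def run(self, data: Any) -> Any:
        for step, config in self.steps:
            data = step(data, **config)  # each step's output feeds the next
        return data

# hypothetical usage, with sample_frames/run_imagelab as placeholder steps:
# Pipeline([(sample_frames, {"fps": 1}), (run_imagelab, {})]).run("video.mp4")
```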
Hi @LemurPwned ! The video sampler works great. Could you explain in more detail the approach you're suggesting for integrating the video sampler into cleanvision? More specifically, pseudocode for what the classes would look like? @lafemmephile We could start with a VideoLab class that'd take care of sampling frames from videos, saving them, and running cleanvision on those frames. I'd also suggest thinking about a few finer points while working on this extension.
@lafemmephile Feel free to start more discussions on these points. I think we could begin with a layout of the code first and then fill in details for specific methods.
@lafemmephile your understanding is correct. @sanjanag I was just thinking about a minimal implementation. Here's a more detailed outline.
As to the API outline, I was thinking of something relatively simple, like this (the pipeline is very crude and not abstracted here):
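A rough sketch of what such a crude pipeline could look like (OpenCV for the naive frame sampling is an assumption on my part; the `Imagelab` calls follow CleanVision's documented API, everything else is illustrative):

```python
import os
import tempfile

import cv2  # OpenCV, assumed available for naive frame grabbing
from cleanvision import Imagelab

def sample_frames(video_path: str, output_dir: str, every_n: int = 30) -> None:
    """Naive sampler: write every n-th decoded frame to output_dir as JPEG."""
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(os.path.join(output_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()

def video_pipeline(video_path: str) -> Imagelab:
    frames_dir = tempfile.mkdtemp()
    sample_frames(video_path, frames_dir)   # step 1: crude sampling
    lab = Imagelab(data_path=frames_dir)    # step 2: run CleanVision on frames
    lab.find_issues()
    lab.report()
    return lab
```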
Hey everyone, I have some experience with videos, so I may have some suggestions that might be helpful.
Let me know if I can help in any other way.
@smttsp in the video-sampler example repo I'm using keyframe decoding with PyAV, which binds to the same C libraries that ffmpeg uses under the hood (not a Python reimplementation). This gives you programmatic access to frames as they are being decoded (and to all of their metadata, such as motion vectors), which I think is more flexible than calling out to ffmpeg directly. Regarding point 3: I took a look at your repo and it's super interesting! I'm contemplating extending the example repo with an arbitrary filter that could operate on K accumulated frames (so it's easily extensible), which would make it possible to run a detection+tracking combination, like you did, on the fly.
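For reference, keyframe-only decoding with PyAV looks roughly like this (the file name is a placeholder; `skip_frame = "NONKEY"` is PyAV's documented way to decode only keyframes):

```python
import av  # PyAV: bindings to the same C libraries ffmpeg uses

container = av.open("video.mp4")  # placeholder path
stream = container.streams.video[0]
stream.codec_context.skip_frame = "NONKEY"  # decode keyframes only

for frame in container.decode(stream):
    image = frame.to_image()  # PIL.Image, ready for hashing or CleanVision
    print(frame.pts, frame.pict_type)  # timestamp and frame-type metadata
```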
@LemurPwned, cool, both sound good.
Hi @lafemmephile ! The idea of this GitHub issue is to extend what cleanvision detects to videos. Hence, we want to focus strictly on extending cleanvision issues to videos.
The class for sampling frames from videos is a good starting point, and it seems like you also have a decent idea of what would go in VideoLab. I'd suggest that at this point you create a PR which would first detect image property issues in videos, like dark, light, etc.
Hi @smttsp ! Great to see your input on this issue. https://github.com/smttsp/temporal_consistency_odt looks great. Our team recently added support for object detection in the cleanlab package. Seems like we could use your input there.
Hey @lafemmephile,
Thank you! LMK if I can help in any way or if you can think of any feature that can be directly or indirectly used here. We can also think of incorporating the visualizations/exports.
One of the major considerations is whether we will evaluate videos or individual frames. I think frames would make more sense. Suppose you have two 1-minute videos with a 10-second overlap, so a video-level check would flag them as near duplicates, even though each video has 50 seconds of unique content. If we categorize the videos as near duplicates, you need to discard one of them, which doesn't make sense because you'd be throwing away 50 seconds of unique content. If frames are in question, you can discard one copy of the overlapping 10 seconds and you are fine. But the problem then is: what if we have a static video that consists of many near-duplicate frames?
If we export the frames in a smarter way, i.e., only export the unique frames, we get intra-video (frame-wise) uniqueness. Then near-duplicate frames can only occur between two different videos. Simple pseudocode for such a video frame extraction:
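Something along these lines, assuming `imagehash` for the perceptual hashes (the threshold value is arbitrary):

```python
import imagehash  # assumed available

def export_unique_frames(frames, output_dir: str, threshold: int = 6) -> None:
    """Export only frames that are not near-duplicates of any exported frame.

    `frames` is an iterable of PIL images sampled from one video."""
    seen_hashes = []
    for i, frame in enumerate(frames):
        h = imagehash.phash(frame)
        if any(h - seen <= threshold for seen in seen_hashes):
            continue  # near-duplicate of an already exported frame
        seen_hashes.append(h)
        frame.save(f"{output_dir}/frame_{i:05d}.jpg")
```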
The above suggestion basically converts a video into a set of unique frames, which would enable using all the features of Imagelab. The only issue I see here is that the visualization of videos should be a bit different from that of images. I'm not sure how to do that in an elegant way, but we can discuss that later on.
@sanjanag, of course, I would be happy to help!
Actually, I just read the VideoSampler code more thoroughly and it seems that it is already using p-hash. Great job @LemurPwned 👏 👏 👏
I think all issues can be used if the goal is to build the tool on a frame-by-frame basis. But then, what is the point of calling it VideoLab?
If the goal is to compare videos, then all of the issues require some extra work. For example, exact duplication of videos is when all the frames of two videos are exactly the same, and light, blurry, dark, and low information are all going to be based on some threshold. A few frames may show one of those issues while the rest of the video is perfect.
I think VideoLab should find both frame-level and video-level issues.
@lafemmephile There are a lot of nice ideas discussed here, but I still think your first PR should keep things simple. Future PRs can further extend the set of video issues that can be detected, but IMO it's best to focus on getting that first PR in now. I think the first PR can stick with the original strategy I outlined: (1) sample frames from each video, (2) run Imagelab on the sampled frames, and (3) aggregate the per-frame issue scores into per-video scores.
Note the complexity is in Step 3. We need to define aggregation functions to get issue scores per video out of the individual frames' issue scores. For example, the dark score for a video might be the 0.95-quantile of the frames' dark scores (so that we only say a video is dark if most of its frames are dark). As to which issue types make sense here, I'd stick with just a couple in the first PR (e.g., light/dark, blurry). We don't need to support every CleanVision issue type in the first PR for VideoLab.
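As a sketch of what such an aggregation could look like (the column names and percentile choice are illustrative, not a fixed API):

```python
import numpy as np
import pandas as pd

def aggregate_video_scores(frame_scores: pd.DataFrame, q: float = 95) -> pd.DataFrame:
    """Collapse per-frame issue scores into one score per video.

    frame_scores has a 'video_id' column plus numeric score columns,
    e.g. ['video_id', 'dark_score', 'blurry_score']; the video-level
    score is the q-th percentile of that video's frame scores."""
    return frame_scores.groupby("video_id").agg(lambda col: np.percentile(col, q))
```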
@smttsp Regarding your point:
The point is the tool produces one score per video to quantify its overall quality for each issue type. From the user's perspective, the tool is analyzing the video; the fact that it does so by analyzing frames within the video is an internal detail abstracted away from the user. The tool still provides nontrivial value here: figuring out how to best aggregate the CleanVision issue scores for each frame into an overall score for the video requires real effort on our part. In future versions, we can also analyze entire video sequences for new issue types, but I would prefer to avoid this for now, for simplicity of shipping v0 of VideoLab.
Some discussion of this here: #214
Making CleanVision runnable on video would be super useful to many folks!
Should be fairly easy:
1. Sample frames from each video.
2. Run Imagelab on the sampled frames.
3. Aggregate the per-frame issue scores into per-video scores.