New Optical Flow OP & Allow to save the computed optical flows #824
Merged
Changes from 16 commits
24 commits
| Commit | Author | Message |
|---|---|---|
| 7bd7ae6 | HYLcool | * allow to save the computed optical flows in two video motion score … |
| 0d4f88a | HYLcool | * update according to gemini's comments |
| 4cc2347 | HYLcool | * fix two test cases |
| 4b4ac2b | HYLcool | + add video_motion_score_filter |
| bf43b8e | HYLcool | * update uv.lock |
| 0a971ef | HYLcool | * update test cases |
| 2ef7897 | HYLcool | Merge branch 'refs/heads/main' into feat/opt_flow_saving |
| a06c332 | HYLcool | Merge branch 'refs/heads/main' into feat/opt_flow_saving |
| 384047f | HYLcool | * update uv.lock |
| 3fc8685 | HYLcool | * update build_op_doc hook: check the op num table as well |
| 7775a51 | HYLcool | Merge branch 'main' into feat/opt_flow_saving |
| fc4d0f5 | HYLcool | Merge branch 'refs/heads/main' into feat/opt_flow_saving |
| 25472c6 | HYLcool | Merge branch 'refs/heads/main' into feat/opt_flow_saving |
| 11c9294 | HYLcool | * update uv.lock |
| 8d89423 | HYLcool | * limit timm to v1.0.22 and update uv.lock |
| 606aa87 | HYLcool | * use customized repos from org instead of personal |
| 0740aba | HYLcool | * fix cython building |
| 97549d1 | HYLcool | * merge from main |
| 0fb1137 | HYLcool | * update cuda version of the base layer |
| decf446 | HYLcool | * use "with" instead of decorator on function to avoid import torch w… |
| d5364ab | HYLcool | Merge branch 'refs/heads/main' into feat/opt_flow_saving |
| e443f98 | HYLcool | * update optical flows saving to align the latest impl. |
| 10589f3 | HYLcool | * replace uv path |
| b626d64 | HYLcool | + add new arg in the subclass of video_motion_score_filter |
111 changes: 111 additions & 0 deletions
data_juicer/ops/filter/video_motion_score_ptlflow_filter.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| import sys | ||
| from typing import Optional, Tuple, Union | ||
| | ||
| from jsonargparse import dict_to_namespace | ||
| from pydantic import PositiveFloat, PositiveInt | ||
| | ||
| from data_juicer.ops.filter.video_motion_score_filter import VideoMotionScoreFilter | ||
| from data_juicer.utils.constant import MetaKeys | ||
| from data_juicer.utils.lazy_loader import LazyLoader | ||
| from data_juicer.utils.resource_utils import cuda_device_count | ||
| | ||
| from ..base_op import OPERATORS, UNFORKABLE | ||
| | ||
| torch = LazyLoader("torch") | ||
| tvm = LazyLoader("torchvision.models") | ||
| tvt = LazyLoader("torchvision.transforms") | ||
| ptlflow = LazyLoader("ptlflow") | ||
| ptlflow_io_adapter = LazyLoader("ptlflow.utils.io_adapter") | ||
| | ||
| OP_NAME = "video_motion_score_ptlflow_filter" | ||
| | ||
| | ||
| @UNFORKABLE.register_module(OP_NAME) | ||
| @OPERATORS.register_module(OP_NAME) | ||
| class VideoMotionScorePtlflowFilter(VideoMotionScoreFilter): | ||
|     """Filter to keep samples with video motion scores within a specified range. | ||
|     This operator utilizes the ptlflow library (https://github.com/hmorimitsu/ptlflow) to | ||
|     predict optical flow between video frames. It keeps samples where the | ||
|     video motion score is within the given min and max score range. The motion score is | ||
|     computed based on the optical flow between frames, which is estimated using the models | ||
|     supported in ptlflow. The operator can sample frames at a specified FPS and apply | ||
|     transformations to the frames before computing the flow. | ||
|     - The models in ptlflow are used to estimate the optical flow. | ||
|     - Frames are preprocessed using a series of transformations including normalization and | ||
|       color channel flipping. | ||
|     - The motion score is calculated from the optical flow data. | ||
|     - The operator can be configured to filter based on any or all frames in the video. | ||
|     - The device for model inference (CPU or CUDA) is automatically detected and set. | ||
|     For further details, refer to the official documentation: | ||
|     https://ptlflow.readthedocs.io/ | ||
|     """ | ||
| | ||
|     _accelerator = "cuda" | ||
|     _default_kwargs = {} | ||
| | ||
|     def __init__( | ||
|         self, | ||
|         min_score: float = 1.0, | ||
|         max_score: float = sys.float_info.max, | ||
|         model_name: str = "dpflow", | ||
|         ckpt_path: Optional[str] = "things", | ||
|         get_model_args: Optional[dict] = None, | ||
|         sampling_fps: PositiveFloat = 2, | ||
|         size: Union[PositiveInt, Tuple[PositiveInt], Tuple[PositiveInt, PositiveInt], None] = None, | ||
|         max_size: Optional[PositiveInt] = None, | ||
|         divisible: PositiveInt = 8, | ||
|         relative: bool = False, | ||
|         any_or_all: str = "any", | ||
|         if_output_optical_flow: bool = False, | ||
|         optical_flow_key: str = MetaKeys.video_optical_flow, | ||
|         *args, | ||
|         **kwargs, | ||
|     ): | ||
|         super().__init__( | ||
|             min_score, | ||
|             max_score, | ||
|             sampling_fps, | ||
|             size, | ||
|             max_size, | ||
|             divisible, | ||
|             relative, | ||
|             any_or_all, | ||
|             if_output_optical_flow, | ||
|             optical_flow_key, | ||
|             *args, | ||
|             **kwargs, | ||
|         ) | ||
| | ||
|         self.model_name = model_name | ||
|         self.ckpt_path = ckpt_path | ||
|         if get_model_args is not None: | ||
|             get_model_args = dict_to_namespace(get_model_args) | ||
|         self.get_model_args = get_model_args | ||
| | ||
|     def setup_model(self, rank=None): | ||
|         self.model = ptlflow.get_model(self.model_name, ckpt_path=self.ckpt_path, args=self.get_model_args) | ||
|         if self.use_cuda(): | ||
|             rank = rank if rank is not None else 0 | ||
|             rank = rank % cuda_device_count() | ||
|             self.device = f"cuda:{rank}" | ||
|         else: | ||
|             self.device = "cpu" | ||
|         self.model.to(self.device) | ||
|         self.model.eval() | ||
| | ||
|     def compute_flow(self, prev_frame, curr_frame): | ||
|         if prev_frame is None: | ||
|             flow = None | ||
|         else: | ||
|             io_adapter = ptlflow_io_adapter.IOAdapter(self.model, prev_frame.shape[:2]) | ||
|             frames = [prev_frame, curr_frame] | ||
|             inputs = io_adapter.prepare_inputs(frames) | ||
|             inputs = {key: value.to(self.device) for key, value in inputs.items()} | ||
|             with torch.no_grad(): | ||
|                 predictions = self.model(inputs) | ||
|             flows = predictions.get("flows")  # shape: (1, 1, 2, H, W) | ||
|             flow = flows[-1][0].detach().cpu().numpy().transpose((1, 2, 0))  # 2, H, W -> H, W, 2 | ||
|         return flow, curr_frame | ||
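
The two less obvious steps in this diff are the rank-to-device mapping in `setup_model` and the layout change at the end of `compute_flow`. The sketch below reproduces both with NumPy only, so it runs without ptlflow or a GPU. The helper names `pick_device`, `flow_to_hwc` and `mean_flow_magnitude` are hypothetical and not part of data_juicer. The guard against zero GPUs and the mean-magnitude score are assumptions for illustration, since the diff does not show how the parent class turns a flow into a score.

```python
import numpy as np


def pick_device(rank, num_gpus, use_cuda):
    """Map a worker rank to a device string, as setup_model does.

    Hypothetical standalone helper. The num_gpus == 0 guard is an
    assumption added here; the diff itself calls rank % cuda_device_count()
    only when use_cuda() is true.
    """
    if not use_cuda or num_gpus == 0:
        return "cpu"
    rank = rank if rank is not None else 0
    # Wrap around so that more workers than GPUs still get a valid index.
    return f"cuda:{rank % num_gpus}"


def flow_to_hwc(flows):
    """Convert a (1, 1, 2, H, W) flow array to (H, W, 2).

    flows[-1] drops the first axis, [0] drops the second, and the transpose
    moves the two displacement channels to the last axis.
    """
    return flows[-1][0].transpose((1, 2, 0))


def mean_flow_magnitude(flow):
    """Assumed score: mean per-pixel displacement length of an (H, W, 2) flow."""
    return float(np.linalg.norm(flow, axis=-1).mean())


# Synthetic flow: every pixel moves 3 px horizontally and 4 px vertically.
flows = np.zeros((1, 1, 2, 4, 6), dtype=np.float32)
flows[0, 0, 0] = 3.0
flows[0, 0, 1] = 4.0

flow = flow_to_hwc(flows)
score = mean_flow_magnitude(flow)
device = pick_device(rank=5, num_gpus=4, use_cuda=True)
```

With four GPUs, rank 5 wraps around to the second device, which is why the modulo matters when the number of worker processes exceeds the number of GPUs.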