Open
Conversation
Selective decoding makes a lot of sense for longer sequences. But edge-case issues, such as temporal coherence or alignment with masks, might only show up during training or inference if something is broken.
Author
OK, no problem, I understand. Thanks for your time reviewing.
What changed
This PR replaces eager full-video decoding in the video_mask_tuple training path with selective frame decoding.
- videox_fun/utils/video_tuple_loader.py
- batch_index first, then decode only the requested frames for rgb_full.mp4, rgb_removed.mp4, mask.mp4, and optional depth_removed.mp4
- dataset_image_video.py and dataset_image_video_warped.py
Why
The training datasets were decoding whole tuple videos and only then subselecting the clip used for the batch. On long sequences, that wastes CPU, RAM, and disk I/O on frames the model never sees.
Selective decode keeps the released VOID data path the same from the model's perspective, but removes avoidable host-side work from the loader.
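To illustrate the idea rather than the actual helper, here is a minimal sketch of "pick indices first, then decode only those frames". Every name in it (sample_batch_index, load_tuple, the read_frames callable, the clip_len and stride parameters) is hypothetical and is not taken from video_tuple_loader.py. The frame reader is a stand-in; a real one would be backed by whatever video library the project uses.

```python
import random
from typing import Callable, Dict, List, Sequence

# File names come from the PR description; everything else is an assumption.
TUPLE_FILES = ("rgb_full.mp4", "rgb_removed.mp4", "mask.mp4")
OPTIONAL_FILES = ("depth_removed.mp4",)


def sample_batch_index(num_frames: int, clip_len: int, stride: int,
                       rng: random.Random) -> List[int]:
    """Choose the clip's frame indices before any decoding happens."""
    span = (clip_len - 1) * stride + 1  # frames covered by the clip
    if num_frames < span:
        raise ValueError("video is shorter than the requested clip")
    start = rng.randint(0, num_frames - span)  # inclusive on both ends
    return [start + i * stride for i in range(clip_len)]


def load_tuple(read_frames: Callable[[str, Sequence[int]], list],
               available: Sequence[str],
               batch_index: Sequence[int]) -> Dict[str, list]:
    """Decode the same indices from every stream so frames stay aligned."""
    names = list(TUPLE_FILES) + [n for n in OPTIONAL_FILES if n in available]
    return {name: read_frames(name, batch_index) for name in names}


# Stand-in reader that records how many frames each stream decoded.
decoded_counts: Dict[str, int] = {}


def fake_read_frames(name: str, indices: Sequence[int]) -> list:
    decoded_counts[name] = len(indices)
    return [f"{name}:{i}" for i in indices]


idx = sample_batch_index(num_frames=300, clip_len=8, stride=2,
                         rng=random.Random(0))
clip = load_tuple(fake_read_frames, list(TUPLE_FILES), idx)
```

Because every stream is read with the same batch_index, the RGB and mask frames stay index-aligned by construction, and each stream decodes clip_len frames instead of the whole video.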
Impact
Validation
python3 -m py_compile videox_fun/utils/video_tuple_loader.py videox_fun/data/dataset_image_video.py videox_fun/data/dataset_image_video_warped.py
I did not run a training job, or even inference, in this environment.