
Decode sampled tuple frames only #12

Open

AbdelStark wants to merge 1 commit into Netflix:main from AbdelStark:selective-tuple-video-decode

Conversation

@AbdelStark commented Apr 8, 2026

What changed

This PR replaces eager full-video decoding in the video_mask_tuple training path with selective frame decoding.

  • add a shared loader for tuple-backed samples in videox_fun/utils/video_tuple_loader.py
  • compute batch_index first, then decode only the requested frames for rgb_full.mp4, rgb_removed.mp4, mask.mp4, and optional depth_removed.mp4
  • apply the same path to both dataset_image_video.py and dataset_image_video_warped.py
  • preserve the existing trimask / quadmask quantization, depth handling, and PNG-directory fallback behavior
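The core idea of the change can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the helper names (`sample_frame_indices`, `decode_selected`) are made up, and the reader object stands in for whatever random-access decoder the loader uses (e.g. a `decord.VideoReader`-style `get_batch`, or `cv2.VideoCapture` seeks). The point is the ordering: compute `batch_index` first, then decode only those frames.

```python
import random


def sample_frame_indices(total_frames, clip_length, stride=1, rng=None):
    """Pick the frame indices for one training clip *before* decoding.

    Mirrors the PR's approach: the loader computes batch_index up front
    so it can decode only those frames instead of the whole video.
    (Illustrative helper; not the loader's actual API.)
    """
    rng = rng or random
    span = (clip_length - 1) * stride + 1
    if span > total_frames:
        raise ValueError("clip does not fit in the video")
    start = rng.randrange(total_frames - span + 1)
    return [start + i * stride for i in range(clip_length)]


def decode_selected(reader, indices):
    """Decode only the requested frames via random access.

    `reader` is any object with index-based frame access; a real
    implementation would wrap a video decoder that supports seeking.
    """
    return [reader[i] for i in indices]
```

The eager version would decode all `total_frames` frames and then slice; here the decoder touches only `clip_length` frames, which is where the CPU, RAM, and I/O savings on long tuple videos come from.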

Why

The training datasets were decoding whole tuple videos and only then subselecting the clip used for the batch. On long sequences that wastes CPU, RAM, and disk I/O on frames the model never sees.

Selective decode keeps the released VOID data path the same from the model perspective, but removes avoidable host-side work from the loader.

Impact

  • lower host memory pressure during training
  • less CPU and disk work per sampled batch
  • better headroom for dataloader parallelism on long tuple videos

Validation

  • python3 -m py_compile videox_fun/utils/video_tuple_loader.py videox_fun/data/dataset_image_video.py videox_fun/data/dataset_image_video_warped.py
  • exercised the PNG-directory fallback with a stubbed smoke test to confirm sampled-frame loading and output shapes
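A stubbed smoke test for the PNG-directory fallback might look like the following. Everything here is hypothetical for illustration: the filename pattern, the `load_sampled_pngs` helper, and the default `load` callable (a stand-in for a real image decoder such as `PIL.Image.open`) are not the PR's actual names.

```python
import os
import tempfile


def load_sampled_pngs(frame_dir, indices, load=lambda p: p):
    """Fallback path: treat a directory of numbered PNGs as a video
    and load only the sampled frames. `load` defaults to returning
    the path so this sketch stays dependency-free."""
    files = sorted(f for f in os.listdir(frame_dir) if f.endswith(".png"))
    return [load(os.path.join(frame_dir, files[i])) for i in indices]


# Stubbed smoke test: ten empty "frames", sample three of them.
with tempfile.TemporaryDirectory() as d:
    for i in range(10):
        open(os.path.join(d, f"frame_{i:05d}.png"), "wb").close()
    picked = load_sampled_pngs(d, [0, 4, 9])
    assert len(picked) == 3
    assert picked[1].endswith("frame_00004.png")
```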

I did not run a training job or inference in this environment.

@AbdelStark AbdelStark marked this pull request as ready for review April 8, 2026 13:24
@JVSCHANDRADITHYA

Selective decoding makes a lot of sense for longer sequences.

But edge-case issues, like temporal coherence or alignment with masks, might only show up during training or inference if something is broken.

@AbdelStark
Author

> selective decoding makes a lot of sense for longer sequences.
>
> But edge case issues like temporal coherence, or alignment with masks, might only appear during inference/training, if it's broken.

Ok, no problem, I understand. Thanks for your time reviewing.

