Videos less than 30 sec not working #3

AlphaHasher · 2022-05-14T06:17:46Z

I am aware you mentioned that it does not work for videos under 30 seconds, but I wondered if there were any updates on this. Also, if there is anything I can help with, just let me know.

Farmadupe · 2022-05-15T10:28:00Z

Actually I have draft code that now works with videos under 30 seconds (hopefully will work with any video with at least 64 frames, regardless of duration).

I have also updated the algorithm itself. It now gives a much smaller Hamming distance between duplicate videos, so the new code is simply 'better' than the old code. (The new algorithm uses a single three-dimensional DCT; the old algorithm used ten two-dimensional DCTs.)
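
For readers unfamiliar with the terms, the following is a minimal sketch of the general idea, not the code on the branch. It assumes a small n*n*n cube of grayscale samples (frames stacked along the time axis), applies a naive separable DCT-II along each axis, turns coefficients into bits by comparing against the median, and compares two hashes by Hamming distance. The function names, the median thresholding step, and the flat memory layout are all assumptions made for illustration.

```rust
use std::f64::consts::PI;

/// Naive, unnormalised 1D DCT-II. O(n^2), fine for a sketch.
fn dct_1d(input: &[f64]) -> Vec<f64> {
    let n = input.len();
    (0..n)
        .map(|k| {
            input
                .iter()
                .enumerate()
                .map(|(i, x)| x * (PI / n as f64 * (i as f64 + 0.5) * k as f64).cos())
                .sum::<f64>()
        })
        .collect()
}

/// Separable 3D DCT-II of an n*n*n cube stored flat,
/// with index = (t * n + y) * n + x (assumed layout).
fn dct_3d(cube: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(cube.len(), n * n * n);
    let mut data = cube.to_vec();
    // Strides for the x, y and t axes respectively.
    for stride in [1, n, n * n] {
        for base in 0..n * n * n {
            // Start a line only where the coordinate on this axis is 0.
            if (base / stride) % n != 0 {
                continue;
            }
            let line: Vec<f64> = (0..n).map(|i| data[base + i * stride]).collect();
            for (i, v) in dct_1d(&line).into_iter().enumerate() {
                data[base + i * stride] = v;
            }
        }
    }
    data
}

/// One bit per coefficient: set when the coefficient exceeds the median.
/// (Assumed thresholding rule, common in perceptual hashing.)
fn hash_bits(coeffs: &[f64]) -> Vec<bool> {
    let mut sorted = coeffs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let median = sorted[sorted.len() / 2];
    coeffs.iter().map(|c| *c > median).collect()
}

/// Number of positions at which two equal-length hashes differ.
fn hamming_distance(a: &[bool], b: &[bool]) -> usize {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}
```

Two videos would then be treated as duplicates when the Hamming distance between their hashes falls below some tolerance. A real implementation would likely keep only the low-frequency coefficients and use a fast DCT.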

In fact, the codebase is in a 'mostly-working' state, so I'll see if I can push it to a new branch in github. I haven't worked on it much recently (it's only a hobby project) so I'm not sure when I'll put more time into finishing it off.

Here's my current tasklist. Feel free to contribute on any of these!

Necessary Tasks (mostly boring)

Remove more calls to unwrap() in the library. Currently, many filesystem operations will cause a panic.
The function to create a video hash now takes options. Ideally these should be removed if it is possible to create universal defaults. Otherwise a builder interface should probably be created.
I replaced the external calls to FFmpeg in v0.1 with bindings to libgstreamer. The advantages are: 1) it's faster, 2) it has better error reporting, 3) it's less hacky. But after completing the integration, I found that the raw libgstreamer library nondeterministically crashes vid_dup_finder, seemingly due to known quality issues in the raw library and/or plugins/codecs. So I need to reconsider whether to revert to the old FFmpeg interface.
If the libgstreamer interface is not abandoned, also check that it is easy to install the libgstreamer shared libraries on Windows. (I'm worried it probably isn't easy.)
Documentation needs to be updated
Tests need to be updated.
Check that the update is still compatible with Czkawka, because that is how most people use vid_dup_finder_lib.
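
As an illustration of the first two tasks, here is a hedged sketch of what replacing unwrap() with a Result and adding a builder for the hashing options could look like. Every name here (HashError, HashOptions, HashOptionsBuilder, open_video, skip_forward_secs) is hypothetical and is not the crate's actual API. The default of 64 frames is taken from the figure mentioned in this thread.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type: filesystem failures become values, not panics.
#[derive(Debug)]
pub enum HashError {
    Io { path: PathBuf, source: io::Error },
}

/// Hypothetical options for creating a video hash.
#[derive(Debug, Clone, PartialEq)]
pub struct HashOptions {
    pub frame_count: usize,
    pub skip_forward_secs: f64,
}

impl Default for HashOptions {
    fn default() -> Self {
        // 64 frames is the figure mentioned in this thread; the rest is assumed.
        Self { frame_count: 64, skip_forward_secs: 0.0 }
    }
}

/// Builder so callers only name the options they want to change.
#[derive(Debug, Default)]
pub struct HashOptionsBuilder {
    opts: HashOptions,
}

impl HashOptionsBuilder {
    pub fn frame_count(mut self, n: usize) -> Self {
        self.opts.frame_count = n;
        self
    }

    pub fn skip_forward_secs(mut self, secs: f64) -> Self {
        self.opts.skip_forward_secs = secs;
        self
    }

    pub fn build(self) -> HashOptions {
        self.opts
    }
}

/// Opens a file, mapping the I/O error into HashError instead of unwrapping.
pub fn open_video(path: &Path) -> Result<File, HashError> {
    File::open(path).map_err(|source| HashError::Io {
        path: path.to_path_buf(),
        source,
    })
}
```

A caller would write something like HashOptionsBuilder::default().frame_count(32).build() and handle the Err case from open_video explicitly, so a missing or unreadable file no longer aborts the whole process.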

Useful tasks

Find a way to check that the library works on macOS. (This should probably be 'necessary'.)
Investigate why the library currently has very high memory usage. I think this is probably memory fragmentation or a leak in the libgstreamer codecs. A less likely alternative is a memory leak or fragmentation in Rayon.
Set up some basic CI tests.
Check if using 32 frames instead of 64 is acceptable, as this will speed up the hashing process.
Check that all expected combinations of cmdline options in the app are actually supported
General code tidyup
Restore some of the utilities in the GUI application for analyzing which duplicates have the highest quality. They were previously implemented in the library but I deleted them instead of transferring them.
If I'm feeling brave, the libgstreamer interface crate could be published on crates.io. There currently isn't a well-documented crate for extracting video frames.

Pie in the sky tasks

Update the app GUI so that it's not rubbish. Alternatively cease publishing the GUI portion of the app (IMO the cmdline app is quite good quality but the GUI is very hacky.)
If a common reference dataset for finding near-duplicate videos exists, then test against it. Use it to obtain quantitative results relative to other libraries.
Enumerate the transformations that the library should be able to cope with (e.g. changing brightness, resizing, resolution, compression artefacts). Enumerate the transformations that it can't cope with (rotation, embedding, horizontal flipping, differing duration, etc.).
Learn alternative algorithms for finding duplicate videos. Reimplement the library using them if they are quantitatively better.

Ideas for extension

Extend the library to find duplicate 'clips' within videos instead of duplicate whole video files. It should be possible to use the 3D-DCT to perform autocorrelation

IronCraftMan · 2023-06-11T17:42:19Z

Any chance you could publish what you have? Would be great to have for others to work on, even if you can't yourself.

Farmadupe · 2023-06-14T19:38:27Z

I have created branch dct3d in the vid_dup_finder_lib library. It is quite functional and is much better than v0.1.0. It may be good enough quality to publish on crates.io.

Feel free to:

Fork the codebase
Raise pull requests (I like to think I will be responsive)
Publish any or all of the code on crates.io.

There are two choices of backend library: 1) the ffmpeg command-line interface, 2) linking to the gstreamer shared library. gstreamer is faster, but it is probably difficult to bind on Windows, sometimes crashes the entire process due to bugs, and seems to cause a lot of memory fragmentation when decoding many videos of various formats.

So I suggest keeping ffmpeg.
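
For context, a command-line backend of this kind usually amounts to spawning ffmpeg and reading raw frames from its stdout. The sketch below only builds such a command with std::process::Command; it is an assumption about how this could be done, not the code in ffmpeg_cmdline_utils. The function name and the choice of square grayscale frames are hypothetical, while the ffmpeg flags used (-i, -frames:v, -vf scale, -pix_fmt, -f rawvideo) are standard ones.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) an ffmpeg invocation that would write the first
/// `frames` frames of `video` to stdout as raw 8-bit grayscale images,
/// each scaled to `side` x `side` pixels.
pub fn ffmpeg_gray_frames_cmd(video: &Path, frames: u32, side: u32) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-i")
        .arg(video)
        // Stop after this many video frames.
        .arg("-frames:v")
        .arg(frames.to_string())
        // Resize every frame to a fixed square.
        .arg("-vf")
        .arg(format!("scale={side}:{side}"))
        // One byte per pixel, no container: side*side bytes per frame.
        .arg("-pix_fmt")
        .arg("gray")
        .arg("-f")
        .arg("rawvideo")
        // "-" sends the output to stdout.
        .arg("-");
    cmd
}
```

Calling .output() on the returned command and splitting stdout into chunks of side * side bytes would yield one frame per chunk. Because ffmpeg runs as a separate process, a decoder crash surfaces as a non-zero exit status rather than taking down the caller, which is the robustness advantage over linking a shared library.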

Remaining tasks on my list are:

Check that documentation is accurate
Remove any remaining explicit panics
Run/fix/delete any existing unit tests.
Maybe delete the ffmpeg_gst_wrapper subcrate and just use ffmpeg_cmdline_utils.

P.S. the vid_frame_iter crate may also be suitable to release, as many people on reddit ask for a simple interface to decode videos from gstreamer. It is memory-safe and zero-copy.

https://github.com/Farmadupe/vid_dup_finder_lib/tree/dct3d

Th3EvilGod · 2023-11-12T16:05:03Z

@Farmadupe any ETA, please?

evanheckert mentioned this issue Sep 26, 2023

Similar videos: Failed to hash file, reason Too short qarmin/czkawka#605

Open
