Add a remote command for batch duplicate finding. #1524

Open
wants to merge 3 commits into
base: master

porridge

Here is my first stab at it. I know I at least need to adjust for the accepted coding style, but please let me know if there are any other major things that need changing.

Based on https://github.com/porridge/image-duplicate-finder

Closes: #1520

src/remote.cc Outdated
pid_t pid = fork();
if (pid == -1) {
perror("fork");
exit(1);
Contributor

exit() is not called directly from anywhere else in this file, and geeqie does not generally use perror. The best course of action here is likely to log an error and return. See an example here:

static void gr_lw_id(const gchar *text, GIOChannel *, gpointer)

With that said, it's probably a lot safer to use an API like https://docs.gtk.org/gio/ctor.Subprocess.new.html instead of forking directly. However, @qarkai is our resident expert on glib APIs, so I would defer to him.

Contributor

Geeqie uses g_spawn_async_with_pipes() for the external editor. See editor_command_one() in src/editors.cc:1053.

@caclark
Collaborator

caclark commented Oct 14, 2024

In the latest commit, the command line handling has changed. The attached .diff contains the changes I think are necessary to conform with the new code:

1524-1.diff.gz

@caclark
Collaborator

caclark commented Oct 15, 2024

Revised .diff file.
Please note that these changes are merely suggestions.

You can run the project's static tests locally before making a pull request by running:
./scripts/test-all.sh

1524-2.diff.gz

@porridge
Author

porridge commented Dec 2, 2024

Thank you for the reviews @xsdg and @qarkai and the rebase @caclark!

Here's a new iteration, where I also removed the need to run in "remote" mode.

I think I addressed all the comments, apart from one, about using fork. As far as I could tell, the g_spawn_async_with_pipes function has two problems with my use case:

  1. according to its own documentation, it does not handle whitespace in arguments well, and this is critical for reliably passing file paths,
  2. it is not possible to use it in a way which just inherits stdin and stdout, which is necessary when the program run on each duplicate set communicates with the user through the terminal.
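
For illustration only, the two requirements above can be met with plain POSIX calls: each path is passed as its own `argv` entry, so no shell quoting is involved, and the child inherits the parent's stdin and stdout simply because nothing redirects them. The sketch below is not the code from this PR; `run_on_set` is a hypothetical name, and the error path reports and returns instead of calling `exit()`, along the lines suggested in the earlier review (geeqie itself would presumably use its own logging function rather than `fprintf`).

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not part of geeqie: run `program` once for a set of
// duplicate files. Returns the child's exit status, or -1 on failure.
int run_on_set(const std::string &program, const std::vector<std::string> &files)
{
    // Build argv: one entry per file, so whitespace in paths is preserved.
    std::vector<char *> argv;
    argv.push_back(const_cast<char *>(program.c_str()));
    for (const std::string &file : files)
        argv.push_back(const_cast<char *>(file.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid == -1)
    {
        // Report and return; the caller decides what to do next.
        std::fprintf(stderr, "fork failed: %s\n", std::strerror(errno));
        return -1;
    }

    if (pid == 0)
    {
        // Child: stdin, stdout and stderr are inherited unchanged.
        execvp(argv[0], argv.data());
        _exit(127); // only reached if execvp() failed
    }

    // Parent: wait for the child, retrying if interrupted by a signal.
    int status = 0;
    while (waitpid(pid, &status, 0) == -1)
    {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as `run_on_set("echo", {"/tmp/a b.png", "/tmp/c.png"})`, the child receives exactly two file arguments, regardless of the space in the first path.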

Example usage:

$ ./build/src/geeqie -p ?.png
/home/porridge/Pulpit/coding/geeqie/2.png /home/porridge/Pulpit/coding/geeqie/3.png
$

and with debug output:

$ ./build/src/geeqie --debug 1 ?.png
duplicates program set to "echo"
processing 3 files in set
/home/porridge/Pulpit/coding/geeqie/1.png vs /home/porridge/Pulpit/coding/geeqie/2.png: 91,749387
/home/porridge/Pulpit/coding/geeqie/1.png vs /home/porridge/Pulpit/coding/geeqie/3.png: 91,701134
/home/porridge/Pulpit/coding/geeqie/2.png vs /home/porridge/Pulpit/coding/geeqie/3.png: 99,773284
/home/porridge/Pulpit/coding/geeqie/2.png /home/porridge/Pulpit/coding/geeqie/3.png
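
The debug output above suggests a pairwise comparison followed by grouping of the pairs whose score is high enough. The thread does not state the threshold or the grouping rule, so the following is only a sketch under assumptions: a caller-supplied score function, a hypothetical threshold, and transitive grouping (if A matches B and B matches C, all three land in one set) implemented with a small union-find. `group_duplicates` is an illustrative name, not a function from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Score for a pair of file indexes, in percent (assumed symmetric).
using SimilarityFn = std::function<double(std::size_t, std::size_t)>;

// Returns sets of file indexes; files without any match are left out.
std::vector<std::vector<std::size_t>> group_duplicates(std::size_t count,
                                                       const SimilarityFn &similarity,
                                                       double threshold)
{
    // Union-find: every file starts as the root of its own set.
    std::vector<std::size_t> parent(count);
    std::iota(parent.begin(), parent.end(), std::size_t{0});

    auto find_root = [&parent](std::size_t i) {
        while (parent[i] != i)
        {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    };

    // Compare every pair once and merge the sets of matching files.
    for (std::size_t i = 0; i < count; i++)
        for (std::size_t j = i + 1; j < count; j++)
            if (similarity(i, j) >= threshold)
            {
                const std::size_t a = find_root(i);
                const std::size_t b = find_root(j);
                parent[std::max(a, b)] = std::min(a, b); // lowest index stays root
            }

    // Collect members per root, keeping only sets with at least two files.
    std::vector<std::vector<std::size_t>> buckets(count);
    for (std::size_t i = 0; i < count; i++)
        buckets[find_root(i)].push_back(i);

    std::vector<std::vector<std::size_t>> sets;
    for (auto &bucket : buckets)
        if (bucket.size() > 1) sets.push_back(std::move(bucket));
    return sets;
}
```

With the three scores from the debug output and an assumed threshold of 99, only indexes 1 and 2 (2.png and 3.png) end up in a set, which matches the pair passed to `echo` above.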

@porridge porridge requested review from qarkai and xsdg December 2, 2024 16:52
@caclark
Collaborator

caclark commented Dec 4, 2024

@porridge

I suggest you consider changing option --process-duplicates to --duplicates-process. This will group the three new commands together in the help output, and will be more logical when using command line completion. The short form option may be a bit illogical, but I do not think that matters.

There should be entries for the three command line long options in ./auto-complete/geeqie. This is for bash command line completion.

If you wish the --duplicates-program to be remembered between sessions, there should be entries at about lines 409 and 915 of ./src/rcfile.cc.

I have some other comments but it will be a few days before I write them down.

@caclark
Collaborator

caclark commented Dec 6, 2024

@porridge
I am sorry that it has taken me so long to think over this feature

@ anyone else
For comment

I think there may be another way to solve this problem. It would involve some new features that may or may not be possible.

Adding files to the Dupes Window is a bit messy. It is done either by drag-and-drop or by right-clicking on files or a directory, but the latter does not work as I expect at the moment (the right-click feature opens a new Dupes Window each time, which I do not think is correct).

  1. It may be possible to include a right-click add files option to the Dupes Window

  2. It may be possible to add files to the Dupes Window via the command line e.g.
    geeqie --dupes-window-add <list of files>
    followed by the Dupes Window being opened automatically if not already open.

When the Dupes Window is open the user has the possibility to change the comparison type, the image rotation mode and other options.
When the dupes check is completed, the user has the choice of which data to select.

The selected data may be Exported to a comma-separated or tab-separated file via a right-click in the Dupes Window.

  1. It may be possible to send the same selected data to the command line e.g.
    geeqie --dupes-export | cut --fields=5

  2. For the above command to be part of an automate-able sequence, it would be necessary to know when the dupes operation has finished. I have no idea at the moment.

The plugins are available via a right-click in the Dupes window. With that pretty much anything can be achieved, and might make --dupes-export redundant. Unfortunately plugin keyboard shortcuts are not recognized.

  1. It may be possible for the Dupes Window to recognize plugin keyboard shortcuts.

  2. It may be possible to add a "when a dupes run is finished, call this plugin" feature, but that could be dangerous for an unwary user trying to delete unwanted files. I am not enthusiastic about this idea, but it is one way of knowing when the dupes run is completed. Note that it would run whenever the user changes the comparison mode, for instance, and not just when the user has made the final decision.

@porridge
Author

porridge commented Dec 11, 2024

> @porridge
>
> I suggest you consider changing option --process-duplicates to --duplicates-process. This will group the three new commands together in the help output, and will be more logical when using command line completion. The short form option may be a bit illogical, but I do not think that matters.

Done, @caclark

> There should be entries for the three command line long options in ./auto-complete/geeqie. This is for bash command line completion.

I believe they are there already in the options variable definition? Or do you have something else in mind?

> If you wish the --duplicates-program to be remembered between sessions, there should be entries at about lines 409 and 915 of ./src/rcfile.cc.

I think it's better to require the user to explicitly provide the program, in case it is destructive and the user forgot what it was set to last.

@porridge
Author

porridge commented Dec 11, 2024

> I think there may be another way to solve this problem. It would involve some new features that may or may not be possible.

@caclark, sounds like what you are suggesting would be more discoverable for a typical GUI user. OTOH it might be less convenient for use in scripted, automated workflows.

But most importantly from my perspective - I don't feel anywhere as competent as would be required to implement what you described 😅

@caclark
Collaborator

caclark commented Jan 4, 2025

@porridge
Attached is a .diff from the current sources. I would appreciate it if you would take a look at it.
This code is not a solution - it is just a hack to demonstrate a different way of achieving this feature.

After compiling, from a terminal window run ./build/src/geeqie
Open the dupes window and select Compare By to Similarity Custom. Set Custom Threshold to a low number e.g. 50
Close the dupes window.

From another terminal window run ./build/src/geeqie --duplicates-process <list of files>
For <list of files> just use 4 or 5 simple jpegs.

Run ./build/src/geeqie --duplicates-export
Then try ./build/src/geeqie --duplicates-export | tail -n +2 | cut -f 2,5

The advantages I see are:

  • No additional comparison logic is required
  • No specific processing program is required
  • It is easier for users to process the data in the way they wish

If this is an acceptable solution, the duplicates-export function will be eliminated - the option duplicates-process ... will output the required text (there is a timing problem I did not yet solve).
1524-1.diff.gz

@porridge
Author

porridge commented Jan 7, 2025

@caclark

> After compiling, from a terminal window run ./build/src/geeqie
> [...]
> From another terminal window run ./build/src/geeqie --duplicates-process <list of files>
> For <list of files> just use 4 or 5 simple jpegs.

It would be hard to incorporate this kind of usage (run a GUI-dependent geeqie session and, while it's running, launch another process) into my workflow. Not impossible, but changing window focus and juggling a process running in the background are tricky to handle.

FTR my use case is an automated pipeline that:

  • moves images from input directories (to which they are first synced using syncthing from various devices) into a staging location,
  • de-duplicates them
  • auto rotates and performs various other metadata fixups
  • splits them into year/month/day based directory structure.

> Run ./build/src/geeqie --duplicates-export
> Then try ./build/src/geeqie --duplicates-export | tail -n +2 | cut -f 2,5

Also, a simple text format makes it hard to reliably handle filenames, which in principle can contain arbitrary whitespace characters. This could be done in a somewhat standard way using JSON, for example, but even then the need to post-process that output in order to pass it further down the pipeline is yet another hurdle. This is why I found the feature of running a separate user-specified command on each set of duplicates in turn particularly attractive.
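
To make the quoting problem concrete: a tab-separated export cannot distinguish a tab inside a file name from a field separator, whereas a JSON string literal escapes it. The sketch below shows a minimal way to quote a name as a JSON string. `json_quote` is a hypothetical helper, not something geeqie provides, and it assumes the name is valid UTF-8, which file names are not guaranteed to be.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: quote a file name as a JSON string literal.
// Assumes valid UTF-8 input; bytes >= 0x80 are passed through unchanged.
std::string json_quote(const std::string &name)
{
    std::string out = "\"";
    for (const unsigned char c : name)
    {
        switch (c)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\t': out += "\\t"; break;
        case '\r': out += "\\r"; break;
        default:
            if (c < 0x20)
            {
                // Remaining control characters use the \u00XX form.
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            }
            else
            {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}
```

Even with such quoting, the consumer still has to parse the output again, which is the extra hurdle described above.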

> The advantages I see are:
> No additional comparison logic is required

Yes, that would indeed be nice. While it was super easy to reuse the algorithm for comparing two images, when I tried to reuse the logic that geeqie uses internally to process a whole list of files I got completely lost in the sources 😅

@@ -407,6 +411,114 @@ void gq_delay(GtkApplication *, GApplicationCommandLine *app_command_line, GVari
options->slideshow.delay = static_cast<gint>(n * 10.0 + 0.01);
}

void gq_duplicates_process(GtkApplication *, GApplicationCommandLine *, GVariantDict *, GList *file_list)
{
std::map<std::string, std::unique_ptr<pic_equiv>> pics;
Contributor

unique_ptr seems redundant.

Author

@porridge porridge Jan 8, 2025

Thanks for the review @qarkai
I addressed the other comments, but with my rudimentary C++ skills I cannot see how to address this one?

  • replacing with pic_equiv * would leak memory, right?
  • replacing with pic_equiv does not work because the assignment operator is deleted:
../src/command-line-handling.cc: In function ‘void {anonymous}::gq_duplicates_process(GtkApplication*, 
GApplicationCommandLine*, GVariantDict*, GList*)’:
../src/command-line-handling.cc:422:42: error: use of deleted function ‘pic_equiv& pic_equiv::operator=(const pic_equiv&)’
  422 |                 pics[name] = pic_equiv(fd);

and it's deleted because of the sim member.

It also seems to me like using a pointer is better than copying the sim structure around...

Contributor

I see. ImageSimilarityData lacks constructors and a move operator. OK, let's stay with unique_ptr for now.
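
For readers following the exchange: `pics[name] = pic_equiv(fd)` needs both a default constructor and an assignment operator, which is why it fails for a type whose copy operations are deleted. Two ways around that are storing `std::unique_ptr` values, as the PR does, or constructing the value in place with `std::map::try_emplace` (C++17). The sketch below uses `Pic`, a hypothetical stand-in for `pic_equiv`, and hypothetical helper names; whether in-place construction would fit the real class is an assumption.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for pic_equiv: built from a name, not copyable.
class Pic
{
public:
    explicit Pic(const char *cname) : name(cname) {}
    Pic(const Pic &) = delete;
    Pic &operator=(const Pic &) = delete;

    std::string name;
};

// Option 1: the map owns each object through a unique_ptr, so only the
// pointer is assigned, never the Pic itself.
void add_owned(std::map<std::string, std::unique_ptr<Pic>> &pics, const std::string &name)
{
    pics[name] = std::make_unique<Pic>(name.c_str());
}

// Option 2: construct the Pic directly inside the map node. No copy, move
// or assignment of Pic takes place; an existing key is left untouched.
void add_in_place(std::map<std::string, Pic> &pics, const std::string &name)
{
    pics.try_emplace(name, name.c_str());
}
```

Either way the `Pic` object itself is never copied, which addresses the concern about copying the sim structure around.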

@caclark
Collaborator

caclark commented Jan 7, 2025

@porridge
OK
Cache maintenance runs Geeqie in non-gui mode. I will use that as an example and try to create something better.

Contributor

@qarkai qarkai left a comment

LGTM.

explicit pic_equiv(char const *cname);
~pic_equiv();
pic_equiv(const pic_equiv& other) = delete;
pic_equiv& operator=(const pic_equiv& other) = delete;
Contributor

I guess pic_equiv assignment operator could be restored using shared_ptr with custom deleter for sim member.

And one more thing. Would you mind renaming the class to PicEquiv or something similar for naming consistency?
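
The `shared_ptr` idea can be sketched as follows. `Sim`, `sim_new` and `sim_free` are hypothetical stand-ins for the C-style similarity data and its allocation functions; the real names and signatures in geeqie may differ. Because `std::shared_ptr` is copyable and runs the deleter once when the last owner goes away, the class no longer needs deleted copy operations, at the cost of copies sharing the same data.

```cpp
#include <memory>

// Hypothetical C-style resource with explicit new/free functions.
struct Sim
{
    int filled = 0;
};

static int live_sims = 0; // counts live Sim objects, for demonstration only

Sim *sim_new()
{
    live_sims++;
    return new Sim();
}

void sim_free(Sim *sim)
{
    live_sims--;
    delete sim;
}

// Copyable wrapper: copies share one Sim, which the custom deleter frees
// exactly once when the last copy is destroyed or reassigned.
class PicEquiv
{
public:
    PicEquiv() : sim(sim_new(), sim_free) {}

    std::shared_ptr<Sim> sim;
};
```

With this layout the compiler-generated copy constructor and assignment operator are sufficient, so `pics[name] = PicEquiv(...)`-style code would compile again.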

Successfully merging this pull request may close these issues.

Command-line utility for duplicate image finding and processing
4 participants