Conversation


@StableLlama commented Mar 22, 2025

This PR adds functionality to crop an image and to set areas to include or exclude when the image is exported. The new features are:

  • Export images ready to be used for training
  • Mark images manually or with the help of YOLO models to create masks for
    inclusion and exclusion of image parts for masked training. Filter based on
    markings and also their (partial) visibility
  • Crop images with advanced hints that respect relevant aspect ratios and
    bucket sizes of the training scripts (see the sketch after this list)
  • Rate images with 0 to 5 stars and be able to use that for filtering
  • Show the tag count for the current filter setting
  • Filter for image size
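
The crop hints work roughly like the following sketch (names, bucket list and target area are made up for illustration, not the PR's actual code): snap the crop's aspect ratio to the nearest training bucket and warn when the crop drops below the target pixel count (the "green line").

```python
# Hypothetical sketch of the crop hints -- names and bucket list are
# illustrative, not the PR's actual implementation.
BUCKET_RATIOS = [1.0, 4 / 3, 3 / 2, 16 / 9, 3 / 4, 2 / 3, 9 / 16]

def snap_to_bucket(width: int, height: int) -> float:
    """Return the bucket aspect ratio closest to the crop's own ratio."""
    ratio = width / height
    return min(BUCKET_RATIOS, key=lambda bucket: abs(bucket - ratio))

def has_enough_pixels(width: int, height: int,
                      target_area: int = 1024 * 1024) -> bool:
    """True when the crop still covers the assumed training resolution
    (i.e. it stays 'over the green line')."""
    return width * height >= target_area

crop_w, crop_h = 1200, 820
print(snap_to_bucket(crop_w, crop_h))     # 1.5, i.e. the 3:2 bucket
print(has_enough_pixels(crop_w, crop_h))  # False, the crop is slightly too small
```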

I know it is big, so I tried to split off the first two steps; it builds on #332 and #335 (for #332 I guess #337 would be beneficial as well, although I haven't tested it).

The changes and new features are best described in the updated README: https://github.com/StableLlama/taggui/tree/crop_editor?tab=readme-ov-file#cropping-and-masking-with-markings

[Screenshot cropping.jpg: simulating a clothing LoRA dataset preparation, with the new features active.]

At this time this PR is ~~100%~~ ~~90%~~ 80% ready and I'd be happy about a first review and probably some external testing. To make it 100% ready, and to change it from a draft to a real pull request, the following is still missing:

  • Implementation of feature detection models to automatically create hints, includes and excludes
  • Testing with real world data

Additional internal changes in this PR:

  • simplify settings infrastructure

Still missing:
- color space handling
- JPEG XL (different PR)
- crop editor (different PR)
Code refactor to mostly respect a line width of 80 characters.
The advantages are smaller file sizes when lossy compression is acceptable
(quality < 100), or lossless compression with quality = 100. The alpha
channel is also supported and kept in the images.

Note 1: this only adds support for the export function, and is not general
JPEG XL support for taggui. There is the fork https://github.com/yggdrasil75/taggui
that does exactly this.

Note 2: You might need `pip install pillow-jxl-plugin` beforehand to
be able to export into JPEG XL.
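
For reference, exporting through Pillow with the plugin installed looks roughly like this (a sketch only; the exact `save()` keywords depend on the pillow-jxl-plugin version, and quality = 100 is treated as lossless as described above):

```python
# Rough sketch of a JPEG XL export via Pillow; keyword support may vary
# between pillow-jxl-plugin versions.
import pillow_jxl  # noqa: F401 -- importing registers the JXL codec with Pillow
from PIL import Image

image = Image.open('input.png')          # an alpha channel, if present, is kept
image.save('output.jxl', quality=90)     # lossy export (quality < 100)
image.save('lossless.jxl', quality=100)  # treated as lossless, as described above
```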
Also refactor the code a bit so that the target size calculation can be
used more easily in other (future) modules as well.
Also change the Settings infrastructure to send a Signal
when a setting is changed.
Also rename CustomRectItem to MarkingItem
Also refactor a bit to use QSize instead of a simple tuple
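
As a rough illustration of the Settings change mentioned above (class and organization/application names are placeholders, not the PR's actual code), a QSettings wrapper can emit a Qt Signal whenever a value changes so that other widgets react without polling:

```python
# Hypothetical sketch of settings that notify listeners on change.
from PySide6.QtCore import QObject, QSettings, Signal

class Settings(QObject):
    # Emitted with the key and the new value whenever a setting changes.
    changed = Signal(str, object)

    def __init__(self):
        super().__init__()
        # Organization/application names are placeholders.
        self._settings = QSettings('taggui', 'taggui')

    def value(self, key: str, default=None):
        return self._settings.value(key, default)

    def set_value(self, key: str, value):
        if self._settings.value(key) != value:
            self._settings.setValue(key, value)
            self.changed.emit(key, value)

# Example: react to any settings change.
# settings = Settings()
# settings.changed.connect(lambda key, value: print(key, '->', value))
```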
@StableLlama marked this pull request as ready for review on April 2, 2025, 20:05
@StableLlama
Author

This PR has significantly improved my workflow of preparing images for training.

As an example, this is how I used it for my test:

  • Collected images (I stopped with a bit more than 10k images) and saved them with a coarse structure in different directories
  • Then I used classical applications (file browser, image viewer) to condense the images to the relevant ones (the collection was for a generic concept; right now I wanted to train a specific one), which left me with 330 images
  • Using taggui I went through that list and gave manual tags to organize them better: some normal tags that will be used later for captioning and some starting with a hash for easy filtering, like #lowres. The new width:> and height:> filters helped in quickly identifying the high-resolution images (see the filter sketch after this list)
  • Using the new marking tool I let it find faces and hands, the faces directly as an exclude, the hands as a hint
  • I gave the interesting images star ratings (5* = must, 4* = should, 3* = can, 2* = if necessary, 1* = don't take it, 0* = unrated)
  • Filtering for stars:>=3 left me with 39 images. Looking at the statistics of the tag distribution, I could fine-tune where I needed more images and where I could leave some out to make the set more balanced (i.e. adjust the ratings of the 2* and 3* images)
  • Then I placed a well-aligned crop on each image that had a good enough rating. Holding Ctrl to snap to a common aspect ratio, and having the hint about how to make a crop that contains enough pixels (that is, staying over the green line), I could very easily and quickly find a good position for the crop. I think this is a game changer!
    At the same time I checked whether the excludes were properly aligned, taking the affected latents into account (i.e. the red overlay)
  • Then, going into the export dialog, I had a look at the statistics and could see that nearly all images used one of only three aspect ratios, and those even had a rather equal distribution, except for two images. Double-clicking on that line gave me a filter with exactly those two images, so I could easily change their crops to one of the common aspect ratios to get well-filled buckets for training
  • A last check with crops:hand verified that no hand was cut, as this can give bad anatomy after training
  • Then I could export the images in a format that cuts out the excludes (in this case the face) so that those parts are ignored during masked training
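
To make the filter terms used above concrete, here is an illustrative sketch of how expressions like width:>2000, stars:>=3 or crops:hand could be evaluated against per-image metadata; the field names and syntax handling are assumptions, not TagGUI's actual parser:

```python
# Illustrative only -- not TagGUI's actual filter implementation.
import operator
import re

OPS = {'>=': operator.ge, '<=': operator.le, '>': operator.gt,
       '<': operator.lt, '=': operator.eq}

def matches(image: dict, term: str) -> bool:
    """Evaluate one filter term, e.g. 'width:>2000' or 'crops:hand', against
    an image record such as {'width': 3000, 'stars': 4, 'crops': ['hand']}."""
    key, _, condition = term.partition(':')
    op_match = re.match(r'(>=|<=|>|<|=)?(.*)', condition)
    op, value = op_match.group(1), op_match.group(2)
    if op:  # numeric comparison such as width:>2000 or stars:>=3
        return OPS[op](float(image.get(key, 0)), float(value))
    return value in image.get(key, [])  # membership test, e.g. crops:hand

image = {'width': 3000, 'height': 2000, 'stars': 4, 'crops': ['hand']}
print(matches(image, 'width:>2000'))  # True
print(matches(image, 'stars:>=3'))    # True
print(matches(image, 'crops:hand'))   # True
```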

Conclusion:
Creating this PR was much more work than I thought it would be, and so it took much longer than expected - but the result justifies it. Creating training datasets will now be so much quicker that I'm sure I'll easily recoup the invested time.

@StableLlama
Author

@jhc13 this PR has now been open for a month. Have you had a chance to take a look at it?

I'm currently using it in a rather big captioning project and it's helping me a lot. So I'd be happy to see it merged soon, as I think others will benefit from it as well.

@jhc13
Owner

jhc13 commented May 6, 2025

Hi @StableLlama, I apologize for not responding to you sooner. I have been busy with other things and did not have time to work on anything related to TagGUI.

I appreciate all the time and effort you put into this and your other PRs, but I have decided not to merge this for now, for the following reasons:

  1. As I discussed in my comment on your initial PR, I consider most of these features out of scope. I have always intended for TagGUI to be a program that focuses on the text side of image datasets, rather than being a comprehensive dataset management tool.
  2. I am still very busy, so I don't have the time to properly review the code. I do plan to look at other PRs, like #336 (Add support to drag selected images to external applications), when I have time.

You may disagree with my first point, and I do understand your perspective, but it just does not align with my personal vision for the project. You are welcome to add your features to your own fork of the program. If others find them useful too, then that's great.

Thanks again for your contributions!

@StableLlama
Author

As I've written, I see these additions not as a dataset management tool (for that, far too much is missing) but as a workflow tool to prepare images for training. One aspect is tagging/captioning (that's what taggui is very good at), but another is creating masks for masked training and cropping the images. Especially for cropping, I know of no other tool that can do it as efficiently as this PR does.

But as you have suggested, I'm considering creating a fork that contains all the PRs here that I think are very beneficial for efficiently preparing images.

@Spikhalskiy

@StableLlama this is fantastic work. Please consider making a fork with a broader vision if you have the time and passion to maintain it. It will outgrow the original project if you continue to bring new features, making it a Swiss Army knife for dataset preparation.

@StableLlama
Author

@Spikhalskiy as I couldn't wait any longer, I've now created a fork. I'll call it "taggui workflow", as it supports the image preparation workflow. Next I'll pull in a few other PRs from here that are necessary for me.

As written above: I'd be more than happy to not need a fork and to have it all united here.
