Fix #111: add helpful generation metadata #123

sowbug · 2022-09-06T17:54:20Z

Fixes #111. EXIF wasn't exactly the right scheme; rather, XMP is recognized across both PNG and JPEG. An open issue is that https://github.com/commonsmachinery/tinyxmp is currently buggy (which is why I'm using my own fork), and it doesn't have a pip installer (there's an open issue for that - commonsmachinery/tinyxmp#2).

But this seems to work, so I think it's a step forward to merge it. Thanks.

Problems: 1. The pip package python-xmp-toolkit isn't supported on Windows, and doesn't automatically install the required C++-based dependency libexempi3. See https://python-xmp-toolkit.readthedocs.io/en/latest/installation.html for manual installation steps. 2. The metadata is just a dump of argparse's parsed options, which could leak private information in arguments such as outdir. A todo is to build a dictionary from the argparse namespace, but exclude fields that could contain sensitive information. this might work

bakkot · 2022-09-06T22:21:13Z

I have a proposal at invoke-ai/InvokeAI#266 for structuring metadata consistently across forks; would you consider adopting it? If there's other information you think should be included in the metadata I'm happy to update my spec.

fat-tire · 2022-09-07T00:20:00Z

This ☝️ as we're trying to standardize it across various apps/UIs -- the goal being to not only read metadata from image created by any app/UI but to be able to cross-share json between them as well.

sowbug · 2022-09-07T15:50:42Z

@bakkot see Sygil-Dev/sygil-webui#283 (reply in thread). I like what you've done with structured metadata. I also believe there's value in having an imperfect solution today. I hope the community pursues both approaches. I don't have the cycles to invest in the "what," but my PR does help with the "how" by providing simple and portable code that anyone is welcome to adapt to whatever standardization emerges (assuming it uses XMP).

sowbug · 2022-09-07T15:55:40Z

(repeating some points from that other thread) I'm excited that Google Photos and Flickr already index the XMP Description field for searching (try https://flickr.com/search/?text=psychedelic%20mushrooms%20plms), and render it in their UI. This is why I hope that no matter which approach the community takes, we also stuff as much useful metadata as possible into an existing field like XMP Description.

AltoRetrato · 2022-09-08T22:59:37Z

Is there a specific reason for using JSON to serialize the data?

I'm currently storing metadata in JPEG images using IPTC (with https://github.com/james-see/iptcinfo3), but saving all parameters (even argv[0]) in the "caption/abstract", so you could copy and paste it into the command prompt to run it. Snippet:

parser.add_argument("--iptc", type=bool, nargs="?", default=True, help="save IPTC data to output image (only for JPG)")
# ...
fn = os.path.join(sample_path, "seed_" + str(opt.seed) + "_" + f"{base_count:05}.{opt.format}")
Image.fromarray(x_sample.astype(np.uint8)).save(
    fp = fn,
    quality = 95,
)
if opt.iptc and opt.format == "jpg":
    info = IPTCInfo(fn, force=True)
    cmd  = " ".join([f'"{x}"' if " " in x else x for x in sys.argv])
    info["caption/abstract"] = f"{cmd}{f' --seed {opt.seed}' if '--seed' not in cmd else ''}"[:2000]
    info["supplemental category"] = ["stable diffusion"]
    info.save(options="overwrite")

Result (as seen in Irfanview):

I wonder if using this format (reproducing exactly how the parameters are used in the input of the program) would be better, because it would make it easier to reproduce the original image (instead of doing a lot of copy-and-paste and editing)...

sowbug · 2022-09-09T00:38:11Z

Is there a specific reason for using JSON to serialize the data?

The reason I picked JSON is that I didn't expect people to use the data as a single atomic unit, but rather to pick and choose the interesting arguments in their own invocations of txt2img.py. For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;. For another, there are certain arguments that really should be customized, such as outdir, and that could leak private data (outdir=/home/my_real_name_and_wifi_password/outputs). The data's there (with sensitive parts removed), but it's not urging you to blindly paste it into your shell.

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

Finally, I wanted a format that was both human- and machine-readable. argv isn't really hard for a machine to parse, but JSON is a better format if you're thinking of it as a serialization format rather than a faithful representation of the exact command-line used to generate the image.

All that said, I don't have really strong feelings about it. As long as the data is embedded in the output images, I'm happy.

fat-tire · 2022-09-09T00:38:36Z

Funny, I've never heard of IPTC before. I think the idea of using json was that it was a near universal structured format that's human readable and can easily be imported/exported to anything in almost any language-- I guess XML is too old and wordy?

Also I think there is a need to include the hash of the weights model, as well as information about the actual app/repository/script being used, including its location, that wouldn't be included necessarily and could break if you tried to use the same prompt on a forked but popular version of stable diffusion. Or vice-versa, if you wanted to use a specialized prompt upstream you would need to know that you cannot and moreover where to get the correct application. So you really want to be able to include as much about what went into making the image as you can, and that may include GPU type, app info, the specific version of the model, etc. Which isn't likely contained in the prompt arguments.

I was originally thinking this could be done as a straight-up URI, but was convinced that would be too long and unwieldy. Anyway, I proposed an in-progress json schema here, just to get started.

Cheers!

AltoRetrato · 2022-09-09T13:01:20Z

For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;.

True! Removing argv[0] might help a bit, but such exploits will surely be done, anyway (no matter where or how program parameters are shared): "Hey, try this funny prompt: "Beautiful face, cyborg, cursing, %&#$";rm -rf /" - that could be in a JSON file and end up in a shell prompt...

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

Agreed! But not only the model, there are so many forks/versions of stable diffusion available that fully describing the env/config is difficult (and that's one of the reasons for me to include argv[0] in the metadata).

Finally, I wanted a format that was both human- and machine-readable. argv isn't really hard for a machine to parse, but JSON is a better format if you're thinking of it as a serialization format rather than a faithful representation of the exact command-line used to generate the image.

In the end, JSON might be a better format for many reasons, even more if it could be directly used as the input parameters to the program: that would eliminate many of the risks of parameters exploitation. You could simply "load the image", either via GUI or command line, and add/change the parameters you want. So, adding this functionality (directly reading parameters from an image file) would also be important.

All that said, I don't have really strong feelings about it. As long as the data is embedded in the output images, I'm happy.

I think the worst part is HOW to embed the metadata: EXIF, IPTC and XMP are only a few of the standards, with varying degrees of support. I'm using IPTC just because my default image viewer supports it, but I'm not sure if it works as well with other programs / platforms.

fat-tire · 2022-09-09T15:16:09Z

For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;.

+1 and this is why sanitizing input will be super important...

True! Removing argv[0] might help a bit, but such exploits will surely be done, anyway (no matter where or how program parameters are shared): "Hey, try this funny prompt: "Beautiful face, cyborg, cursing, %&#$";rm -rf /" - that could be in a JSON file and end up in a shell prompt...

This is true of anyone copy/pasting prompts, so the more you can clean up all input data, the better

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

The json schema being discussed includes a hash of and link to the model card/file, fwiw.

In the end, JSON might be a better format for many reasons, even more if it could be directly used as the input parameters to the program: that would eliminate many of the risks of parameters exploitation. You could simply "load the image", either via GUI or command line, and add/change the parameters you want. So, adding this functionality (directly reading parameters from an image file) would also be important.

That's exactly one fo the goals-- to try to reduce as much attack surface as possible... but I do acknowlege the security challenges, no matter which method is used-- and it will be up to anything handling this json to be extraordinarily careful.

sowbug added 9 commits September 5, 2022 16:45

Switch to machine-readable JSON; suppress outdir

7f28699

Switch from non-portable python-xmp-toolkit to pyexiv2

7182afe

trying py3exiv2

932e71c

Fourth library: tinyxmp

263911f

fix conda syntax

7eb0f59

documentation

92a6a10

undo accidental whitespace change

f7249cd

implement for img2img

70f71af

sowbug mentioned this pull request Sep 6, 2022

Write more metadata to output files invoke-ai/InvokeAI#406

Closed

fat-tire mentioned this pull request Sep 9, 2022

RFC: Structured metadata invoke-ai/InvokeAI#266

Closed

sowbug mentioned this pull request Sep 22, 2022

unicode -> ascii commonsmachinery/tinyxmp#5

Merged

vladoatanasov mentioned this pull request Oct 6, 2022

Write metadata to a txt file #203

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #111: add helpful generation metadata #123

Fix #111: add helpful generation metadata #123

sowbug commented Sep 6, 2022

bakkot commented Sep 6, 2022

fat-tire commented Sep 7, 2022 •

edited

Loading

sowbug commented Sep 7, 2022

sowbug commented Sep 7, 2022

AltoRetrato commented Sep 8, 2022

sowbug commented Sep 9, 2022 •

edited

Loading

fat-tire commented Sep 9, 2022

AltoRetrato commented Sep 9, 2022

fat-tire commented Sep 9, 2022

Fix #111: add helpful generation metadata #123

Are you sure you want to change the base?

Fix #111: add helpful generation metadata #123

Conversation

sowbug commented Sep 6, 2022

bakkot commented Sep 6, 2022

fat-tire commented Sep 7, 2022 • edited Loading

sowbug commented Sep 7, 2022

sowbug commented Sep 7, 2022

AltoRetrato commented Sep 8, 2022

sowbug commented Sep 9, 2022 • edited Loading

fat-tire commented Sep 9, 2022

AltoRetrato commented Sep 9, 2022

fat-tire commented Sep 9, 2022

fat-tire commented Sep 7, 2022 •

edited

Loading

sowbug commented Sep 9, 2022 •

edited

Loading