Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #111: add helpful generation metadata #123

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

sowbug
Copy link

@sowbug sowbug commented Sep 6, 2022

Fixes #111. EXIF wasn't exactly the right scheme; rather, XMP is recognized across both PNG and JPEG. An open issue is that https://github.com/commonsmachinery/tinyxmp is currently buggy (which is why I'm using my own fork), and it doesn't have a pip installer (there's an open issue for that - commonsmachinery/tinyxmp#2).

But this seems to work, so I think it's a step forward to merge it. Thanks.

Problems:

1. The pip package python-xmp-toolkit isn't supported on Windows,
   and doesn't automatically install the required C++-based
   dependency libexempi3. See
   https://python-xmp-toolkit.readthedocs.io/en/latest/installation.html
   for manual installation steps.

2. The metadata is just a dump of argparse's parsed options, which
   could leak private information in arguments such as outdir. A
   todo is to build a dictionary from the argparse namespace, but
   exclude fields that could contain sensitive information.

this might work
@bakkot
Copy link

bakkot commented Sep 6, 2022

I have a proposal at invoke-ai/InvokeAI#266 for structuring metadata consistently across forks; would you consider adopting it? If there's other information you think should be included in the metadata I'm happy to update my spec.

@fat-tire
Copy link

fat-tire commented Sep 7, 2022

This ☝️ as we're trying to standardize it across various apps/UIs -- the goal being to not only read metadata from image created by any app/UI but to be able to cross-share json between them as well.

@sowbug
Copy link
Author

sowbug commented Sep 7, 2022

@bakkot see Sygil-Dev/sygil-webui#283 (reply in thread). I like what you've done with structured metadata. I also believe there's value in having an imperfect solution today. I hope the community pursues both approaches. I don't have the cycles to invest in the "what," but my PR does help with the "how" by providing simple and portable code that anyone is welcome to adapt to whatever standardization emerges (assuming it uses XMP).

@sowbug
Copy link
Author

sowbug commented Sep 7, 2022

(repeating some points from that other thread) I'm excited that Google Photos and Flickr already index the XMP Description field for searching (try https://flickr.com/search/?text=psychedelic%20mushrooms%20plms), and render it in their UI. This is why I hope that no matter which approach the community takes, we also stuff as much useful metadata as possible into an existing field like XMP Description.

Screenshot from 2022-09-07 08-28-16

Screenshot from 2022-09-07 08-26-26

@AltoRetrato
Copy link

Is there a specific reason for using JSON to serialize the data?

I'm currently storing metadata in JPEG images using IPTC (with https://github.com/james-see/iptcinfo3), but saving all parameters (even argv[0]) in the "caption/abstract", so you could copy and paste it into the command prompt to run it. Snippet:

parser.add_argument("--iptc", type=bool, nargs="?", default=True, help="save IPTC data to output image (only for JPG)")
# ...
fn = os.path.join(sample_path, "seed_" + str(opt.seed) + "_" + f"{base_count:05}.{opt.format}")
Image.fromarray(x_sample.astype(np.uint8)).save(
    fp = fn,
    quality = 95,
)
if opt.iptc and opt.format == "jpg":
    info = IPTCInfo(fn, force=True)
    cmd  = " ".join([f'"{x}"' if " " in x else x for x in sys.argv])
    info["caption/abstract"] = f"{cmd}{f' --seed {opt.seed}' if '--seed' not in cmd else ''}"[:2000]
    info["supplemental category"] = ["stable diffusion"]
    info.save(options="overwrite")

Result (as seen in Irfanview):
image

I wonder if using this format (reproducing exactly how the parameters are used in the input of the program) would be better, because it would make it easier to reproduce the original image (instead of doing a lot of copy-and-paste and editing)...

@sowbug
Copy link
Author

sowbug commented Sep 9, 2022

Is there a specific reason for using JSON to serialize the data?

The reason I picked JSON is that I didn't expect people to use the data as a single atomic unit, but rather to pick and choose the interesting arguments in their own invocations of txt2img.py. For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;. For another, there are certain arguments that really should be customized, such as outdir, and that could leak private data (outdir=/home/my_real_name_and_wifi_password/outputs). The data's there (with sensitive parts removed), but it's not urging you to blindly paste it into your shell.

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

Finally, I wanted a format that was both human- and machine-readable. argv isn't really hard for a machine to parse, but JSON is a better format if you're thinking of it as a serialization format rather than a faithful representation of the exact command-line used to generate the image.

All that said, I don't have really strong feelings about it. As long as the data is embedded in the output images, I'm happy.

@fat-tire
Copy link

fat-tire commented Sep 9, 2022

Funny, I've never heard of IPTC before. I think the idea of using json was that it was a near universal structured format that's human readable and can easily be imported/exported to anything in almost any language-- I guess XML is too old and wordy?

Also I think there is a need to include the hash of the weights model, as well as information about the actual app/repository/script being used, including its location, that wouldn't be included necessarily and could break if you tried to use the same prompt on a forked but popular version of stable diffusion. Or vice-versa, if you wanted to use a specialized prompt upstream you would need to know that you cannot and moreover where to get the correct application. So you really want to be able to include as much about what went into making the image as you can, and that may include GPU type, app info, the specific version of the model, etc. Which isn't likely contained in the prompt arguments.

I was originally thinking this could be done as a straight-up URI, but was convinced that would be too long and unwieldy. Anyway, I proposed an in-progress json schema here, just to get started.

Cheers!

@AltoRetrato
Copy link

For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;.

True! Removing argv[0] might help a bit, but such exploits will surely be done, anyway (no matter where or how program parameters are shared): "Hey, try this funny prompt: "Beautiful face, cyborg, cursing, %&#$";rm -rf /" - that could be in a JSON file and end up in a shell prompt...

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

Agreed! But not only the model, there are so many forks/versions of stable diffusion available that fully describing the env/config is difficult (and that's one of the reasons for me to include argv[0] in the metadata).

Finally, I wanted a format that was both human- and machine-readable. argv isn't really hard for a machine to parse, but JSON is a better format if you're thinking of it as a serialization format rather than a faithful representation of the exact command-line used to generate the image.

In the end, JSON might be a better format for many reasons, even more if it could be directly used as the input parameters to the program: that would eliminate many of the risks of parameters exploitation. You could simply "load the image", either via GUI or command line, and add/change the parameters you want. So, adding this functionality (directly reading parameters from an image file) would also be important.

All that said, I don't have really strong feelings about it. As long as the data is embedded in the output images, I'm happy.

I think the worst part is HOW to embed the metadata: EXIF, IPTC and XMP are only a few of the standards, with varying degrees of support. I'm using IPTC just because my default image viewer supports it, but I'm not sure if it works as well with other programs / platforms.

@fat-tire
Copy link

fat-tire commented Sep 9, 2022

For one thing, it's dangerous to encourage people to triple-click/copy/paste into shell, because someone might throw something malicious in there like rm -rf /;.

+1 and this is why sanitizing input will be super important...

True! Removing argv[0] might help a bit, but such exploits will surely be done, anyway (no matter where or how program parameters are shared): "Hey, try this funny prompt: "Beautiful face, cyborg, cursing, %&#$";rm -rf /" - that could be in a JSON file and end up in a shell prompt...

This is true of anyone copy/pasting prompts, so the more you can clean up all input data, the better

I also want to include the checkpoint # of the model (e.g., ckpt-1.4 etc.). That's pretty important to a perfect reproduction of a given image. But it's not part of the command-line parameters; it's an aspect of the environment/configuration.

The json schema being discussed includes a hash of and link to the model card/file, fwiw.

In the end, JSON might be a better format for many reasons, even more if it could be directly used as the input parameters to the program: that would eliminate many of the risks of parameters exploitation. You could simply "load the image", either via GUI or command line, and add/change the parameters you want. So, adding this functionality (directly reading parameters from an image file) would also be important.

That's exactly one fo the goals-- to try to reduce as much attack surface as possible... but I do acknowlege the security challenges, no matter which method is used-- and it will be up to anything handling this json to be extraordinarily careful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store prompts/parameters in EXIF
4 participants