-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #111: add helpful generation metadata #123
base: main
Are you sure you want to change the base?
Conversation
Problems: 1. The pip package python-xmp-toolkit isn't supported on Windows, and doesn't automatically install the required C++-based dependency libexempi3. See https://python-xmp-toolkit.readthedocs.io/en/latest/installation.html for manual installation steps. 2. The metadata is just a dump of argparse's parsed options, which could leak private information in arguments such as outdir. A todo is to build a dictionary from the argparse namespace, but exclude fields that could contain sensitive information. this might work
I have a proposal at invoke-ai/InvokeAI#266 for structuring metadata consistently across forks; would you consider adopting it? If there's other information you think should be included in the metadata I'm happy to update my spec. |
This ☝️ as we're trying to standardize it across various apps/UIs -- the goal being to not only read metadata from image created by any app/UI but to be able to cross-share json between them as well. |
@bakkot see Sygil-Dev/sygil-webui#283 (reply in thread). I like what you've done with structured metadata. I also believe there's value in having an imperfect solution today. I hope the community pursues both approaches. I don't have the cycles to invest in the "what," but my PR does help with the "how" by providing simple and portable code that anyone is welcome to adapt to whatever standardization emerges (assuming it uses XMP). |
(repeating some points from that other thread) I'm excited that Google Photos and Flickr already index the XMP Description field for searching (try https://flickr.com/search/?text=psychedelic%20mushrooms%20plms), and render it in their UI. This is why I hope that no matter which approach the community takes, we also stuff as much useful metadata as possible into an existing field like XMP Description. |
Is there a specific reason for using JSON to serialize the data? I'm currently storing metadata in JPEG images using IPTC (with https://github.com/james-see/iptcinfo3), but saving all parameters (even argv[0]) in the "caption/abstract", so you could copy and paste it into the command prompt to run it. Snippet: parser.add_argument("--iptc", type=bool, nargs="?", default=True, help="save IPTC data to output image (only for JPG)")
# ...
fn = os.path.join(sample_path, "seed_" + str(opt.seed) + "_" + f"{base_count:05}.{opt.format}")
Image.fromarray(x_sample.astype(np.uint8)).save(
fp = fn,
quality = 95,
)
if opt.iptc and opt.format == "jpg":
info = IPTCInfo(fn, force=True)
cmd = " ".join([f'"{x}"' if " " in x else x for x in sys.argv])
info["caption/abstract"] = f"{cmd}{f' --seed {opt.seed}' if '--seed' not in cmd else ''}"[:2000]
info["supplemental category"] = ["stable diffusion"]
info.save(options="overwrite") Result (as seen in Irfanview): I wonder if using this format (reproducing exactly how the parameters are used in the input of the program) would be better, because it would make it easier to reproduce the original image (instead of doing a lot of copy-and-paste and editing)... |
The reason I picked JSON is that I didn't expect people to use the data as a single atomic unit, but rather to pick and choose the interesting arguments in their own invocations of I also want to include the checkpoint # of the model (e.g., Finally, I wanted a format that was both human- and machine-readable. argv isn't really hard for a machine to parse, but JSON is a better format if you're thinking of it as a serialization format rather than a faithful representation of the exact command-line used to generate the image. All that said, I don't have really strong feelings about it. As long as the data is embedded in the output images, I'm happy. |
Funny, I've never heard of IPTC before. I think the idea of using json was that it was a near universal structured format that's human readable and can easily be imported/exported to anything in almost any language-- I guess XML is too old and wordy? Also I think there is a need to include the hash of the weights model, as well as information about the actual app/repository/script being used, including its location, that wouldn't be included necessarily and could break if you tried to use the same prompt on a forked but popular version of stable diffusion. Or vice-versa, if you wanted to use a specialized prompt upstream you would need to know that you cannot and moreover where to get the correct application. So you really want to be able to include as much about what went into making the image as you can, and that may include GPU type, app info, the specific version of the model, etc. Which isn't likely contained in the prompt arguments. I was originally thinking this could be done as a straight-up URI, but was convinced that would be too long and unwieldy. Anyway, I proposed an in-progress json schema here, just to get started. Cheers! |
True! Removing
Agreed! But not only the model, there are so many forks/versions of stable diffusion available that fully describing the env/config is difficult (and that's one of the reasons for me to include
In the end, JSON might be a better format for many reasons, even more if it could be directly used as the input parameters to the program: that would eliminate many of the risks of parameters exploitation. You could simply "load the image", either via GUI or command line, and add/change the parameters you want. So, adding this functionality (directly reading parameters from an image file) would also be important.
I think the worst part is HOW to embed the metadata: EXIF, IPTC and XMP are only a few of the standards, with varying degrees of support. I'm using IPTC just because my default image viewer supports it, but I'm not sure if it works as well with other programs / platforms. |
+1 and this is why sanitizing input will be super important...
This is true of anyone copy/pasting prompts, so the more you can clean up all input data, the better
The json schema being discussed includes a hash of and link to the model card/file, fwiw.
That's exactly one fo the goals-- to try to reduce as much attack surface as possible... but I do acknowlege the security challenges, no matter which method is used-- and it will be up to anything handling this json to be extraordinarily careful. |
Fixes #111. EXIF wasn't exactly the right scheme; rather, XMP is recognized across both PNG and JPEG. An open issue is that https://github.com/commonsmachinery/tinyxmp is currently buggy (which is why I'm using my own fork), and it doesn't have a pip installer (there's an open issue for that - commonsmachinery/tinyxmp#2).
But this seems to work, so I think it's a step forward to merge it. Thanks.