RFC: Structured metadata #266
Perhaps
I'd love to include the current commit, but unfortunately a lot of users get the source by downloading the zip file from GitHub, which does not include the hash anywhere, as far as I can tell (in particular it doesn't include the I guess it could be an optional field to be included if the code generating the metadata can reasonably figure it out, though.
Thought: could this be coordinated with the URI definition standard request I proposed here, so that attribute names match, etc.? And how might they work together? Would a URI point to the image and a second argument produce this structured metadata?
Also, FWIW, the single-file, cross-platform PyQt5-based SD GUI I was contributing to already saves its settings in JSON, but I was realizing that the JSON settings actually define the image itself, in a way... Dunno if this is something to build on for a proof-of-concept, as it's super easy to add to.
@fat-tire URIs are inherently somewhat unsuited for structured data, and a full specification for an SD image (when you take into account stuff like variations and grids) is inherently structured. So if you want to do a URI, I think it would be best to have only a single key-value pair, where the value is JSON in the format of the spec proposed here. Then you don't need to try to coordinate two different formats for this specification.
Ah, right, I didn't think about that. An optional field sounds good - and now that I think about it, if we are including a commit reference, we ought to include a branch reference as well (or does a commit imply a specific branch? I don't know).
This is good, except it makes the URI super long (is the JSON value in your key-value pair further encoded/compressed in some way?), and it's effectively just a wrapper around the JSON. I'm wondering if there's an abbreviated but human-readable and easily edited format that could be used to reference an image resource. Imagine an image browser for SD: if the complete JSON would show up in View Source, what would go in the URL/address bar? In a Reddit post featuring some cool approach to generating images, you wouldn't attach a JSON file, but what if you could post a short, self-contained link that any non-technical person could copy/paste or tweak by hand? What would it ideally look like? Or say I wanted a single text file or Google Doc full of accumulated, copy/pasted image references for some art project: what's the smallest way (one line per image, for example) to collect them? What might fit in a small QR code? These are the types of use cases I'm imagining. I feel like a structured JSON file, even though it was designed to be interpreted by humans, may be too large and unreadable for non-programmers to understand and change quickly, and that a compressed URI with &param=value pairs and sane defaults is more familiar. Maybe I'm talkin' crazy tho, dunno.
Eh, they don't get that long unless you have a lot of data, and then, well, it is actually long. The data isn't compressed, but there's simply not that much of it. If you're really worried about length, we could make some of the fields optional and specify default values - e.g.
which is pretty much as human-readable as a URL would be. More, arguably. There's not really a lot of overhead from the JSON format itself relative to URI-style I think copy-pasting things like the above is at least as easy as copy-pasting a URI, and has the benefit of not confusing people; URIs look like you should be able to point your browser at them, and that's not the case here. Plus, URI encoding for spaces - which will come up a lot, because every prompt has spaces in it - is a huge pain for humans to deal with.
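For comparison, the single-key URI idea from this exchange can be sketched with Python's standard library. The parameter name sd is a hypothetical choice, not part of any spec; the point is only to show what percent-encoding does to a JSON payload, and to every space in a prompt.

```python
import json
from urllib.parse import quote, unquote


def metadata_to_query(metadata: dict, key: str = "sd") -> str:
    """Pack a whole metadata object into a single URI query parameter.

    `key` ("sd" by default) is a hypothetical parameter name.
    """
    # Compact separators drop the spaces json.dumps would otherwise add.
    compact = json.dumps(metadata, separators=(",", ":"))
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return f"?{key}={quote(compact, safe='')}"


def query_to_metadata(query: str) -> dict:
    """Invert metadata_to_query for a query string holding exactly one parameter."""
    _, _, value = query.lstrip("?").partition("=")
    return json.loads(unquote(value))


query = metadata_to_query({"prompt": "a cat"})
print(query)  # ?sd=%7B%22prompt%22%3A%22a%20cat%22%7D
```

Every space in the prompt becomes %20 and every quote becomes %22, which is the human-editing cost described above; the round trip itself is lossless.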
What you've said in principle makes sense to me, especially with regard to %20 throwing people, but yes, sensible fallback defaults for omitted fields should be part of this standard to make it compact and easy for normal use by non-techies. However, as I think more about this, any defaults would need to be standard to a particular scenario, i.e., they'd have to be "known", so that all apps supporting this metadata structure agree to fall back to the same defaults in the same way. But how would everyone know that, say, with the v4 model, "512x512 is the agreed-upon default size, so always assume that"? Is there some rule, like "the default size for any model is always the shape of the training images"? That seems too rigid. Who decides the defaults, and where would "sensible defaults" be published so that anyone writing an application would know to find them? Along the lines of defaults, I wonder if some fields should be deemed "required" vs "optional": the required ones would contain the bare minimum needed to generate an image, and people can then build out from there with greater specificity. Also, you mention that height/width could be determined from the image (I assume, as this JSON is metadata embedded within an image) or when offered alongside it, but if the JSON metadata alone is defined as sufficient to produce an image from scratch, I'd think you need to include width/height in the metadata.
I.e., don't assume anything should be inferred from the image, as an image may not be included or may have been resized or screenshotted or whatever. (I acknowledge I've expanded the use case beyond metadata in the file as originally conceived, but if this is a format also meant for copy/pasting in forums or via other human means, it needs to be complete.) Also, aside from the hash of the weights file, don't you want something to indicate which version of a model is to be (or was) used? At the very least, that would help in giving feedback to the end user when versions don't match or when a model is missing. Example: SD is about to release v1-5, right? The model card already contains this metadata in the form of the Other random unformed ideas/thoughts:
Sorry, there's a lot here, and apologies if some of these don't quite relate to the lstein fork; I'm only just getting familiar with the main repo. I don't wanna get too big and unwieldy either, but I think some basics like H&W and model name/version have to be there if it's going to be used as a simple text-based, generate-from-scratch "trading card", useful for learning, sharing, research, etc. And I guess once everything is hammered out, if this turns out to be too heavy a way to do this, someone can always "URI-ize" it, especially once the field names are deemed stable and a good standard. Anyway, this is pretty exciting stuff! Thanks. ft
Yeah, I was imagining only certain fields could be omitted. As to how people would know how to interpret missing fields, we'd write it down. So for example you could omit
Yeah that's a good point. I'll update the OP to add those as fields which are optional only when the metadata is embedded in an image file from which it is possible to derive those values.
The hash of the weights file is sufficient to uniquely determine the model. For the sake of giving users helpful feedback, rather than just "the weights file you have doesn't match", it would be kind of nice to also have a version to report, but we don't necessarily know that - for example, the installation instructions for this repo suggest putting the weights at That said, I do think that having a list of known hashes would be helpful. That wouldn't be part of the metadata spec per se, though it might live alongside it, and tools could hardcode that list to give more useful feedback when they see a hash they know.
I'm fine with either name. I don't just want to use the model card ID, because the point of this field is to distinguish this repo from others. Including the commit hash or branch name would be nice, but as discussed in the comments above, it's often not possible.
Yes.
At least in this repo, there are not any individual output images. There's just the grid.
Per above answer, there are no such images. But if there were, the answer to both of these would be no.
For all of this stuff, I don't think it belongs in this spec - this is a specification specifically for the metadata for images to tell you the settings used to generate the image. If you want to include other information alongside the image, put it somewhere else. E.g., wrap the data from this spec: so That way we don't have to keep a registry of additional optional fields, which in my experience never works, and all of the fields in this spec can be automatically derived, which is important.
That makes sense, so long as everyone who is implementing this standard can agree on what the fallback defaults are.
Yeah, but it's only practical for confirming that the model you have is valid, which can be determined via other methods (usually whoever provides the model in the first place will offer a hash). Wouldn't it make more sense to name the model, version, and creator (the
To me, that seems backwards-- the metadata should point to the model used, not the hash of the model- then you don't need any table that has to be maintained and updated (see below re generic At the very least I'd expect a
That's why I'm suggesting using To me, only providing the model hash is a bit like saying "to bake this cake, go to the store and buy the one ingredient they have in a red box that weighs 12 oz and costs exactly $13.54". Okay, I guess if I had a list of the store's inventory with associated prices I could find it, but it would have to be an always-maintained, up-to-date list. Why not just tell me the exact ingredient I need, so I can ask for it directly? (Yes, as a safety, I can verify the product with the price and weight, but that's not how I want to look for it.) And going back to the error message for the user, it's a lot clearer to say "To bake this cake you need a 12oz bag of Whitman's Quality Flour." vs "Sorry, you are missing a product in a 12oz red box that costs $13.54." and cross-check that with a hopefully current list of all possible ingredients.
But wouldn't a variant called
I've not used lstein, but upstream when you create a grid with say, 4 images, you also specify the number of rows (which has a default) and you get 5 images back-- a "grid" image containing the four images arranged in, well, a grid, and the 4 individual images in the
Okay- sounds like someone took out the child images in the lstein fork... fwiw, if it's to be compatible with the upstream repository- might want the child images to look like any other generated individual image. Having an
Two more thoughts then, for possible expansion at a later time:
That information simply isn't available for most users, is the problem. Users just download a weights file and use it. So the hash of the weights file is literally the only information we can use here.
Yes, like I said I'm fine with calling this field "variant" instead of "fork". If you are asking for some other difference from what I've proposed, I don't know what it is you're asking for. (We can't use the commit because that information often isn't available; I'd include it if I could, and I think I probably will add an optional field to store that.)
Interesting thought. I'm fine with it because I'm not worried about extra space, but if we're trying to minimize fields it's not strictly necessary - we could just say that any new versions will add a new field. (I'd call it
I actually just modified the spec to include an
For that use case, a hash makes sense if you somehow need to reference an otherwise unknown model file. But looking forward-- there is a near-universal, industry-standard way of uniquely identifying a specific model, adopted across all disciplines of ui-- the Any retrained models stemming from a single architecture would have their own names in most cases. If not, for some reason, then fall back to the hash. Also, when you say "users just download a weights file and use it"... how so? Presently, people are sophisticated enough to download stable-diffusion or a GAN or whatever but somehow have no idea what model they are using or where it originated? I get that right now SD has a single place to put a model named model.ckpt or whatever, but that model got put there by someone who had to understand where it came from and what the license was, etc. Using unknown binaries (even as a model) isn't a very good idea, generally speaking.
I'd suggest calling it For a specific codebase, maybe use This schema needs to account for both the codebase and the model, both of which have (a) a name, (b) a source/author, and (c) a version/tag. Additionally, both can have hashes/signatures, and I guess you want to track it for the model, as you're accounting for scenarios in which a model's (a), (b), and (c) are all unknown but you still want to build the image from "scratch". How about:
Well a commit is only a single change to the repository, and a single commit might be on multiple branches anyway. But a tagged branch is usually not meant to change-- adding
Yeah I didn't mean literally
Sounds good-- you're underlining my primary goal-- to have something short and sweet to paste in something like Reddit or wherever. Imagine the stringified JSON object had a name, and for lack of a better term I'll just call it So imagine a post that says "See this cool picture? if you paste this It would parse the data, and with a tap of a button start popping out the image(s) in as reproducible a way as possible. Then maybe you just made the image but think it could be better. So you make a tweak to the prompt and get a great result. You'd hit the "Copy Similarly, if someone sends you a cool image and you want to know what exactly went into making it, you'd load in that image, hit the But as time passes, it may be the case that you grab someone's Again, hope this is all making sense.
Sure, I'm happy to have this as an additional, optional field, though no existing repository will be able to use it because no existing repository has the model card. (I guess I could, and probably should, hardcode specific hashes and their corresponding model cards, though.)
This is an empirical description of the way people are currently using stable diffusion. They download the weights and use that. They are not currently in the habit of additionally downloading a model card, nor would I want to complicate the setup instructions by requiring that they do so. It's not that people couldn't download this information alongside the weights, it's that right now they don't, and I really don't want to add additional setup burden. Were it me I'd've embedded this information in the checkpoint file, but as far as I'm aware this is not currently done.
That mostly sounds reasonable - with the caveat that
A commit hash unambiguously refers to a single state of the code, which is the important part. Neither branches nor tags have that property, so I don't much care about them. But sure, I am fine adding
Yup, I think we're mostly on the same page. I'll update the OP later today.
OK, updated the spec in the OP. @fat-tire want to take another look?
Sorry I'm a little confused-- I meant that the model card is just for humans to read to associate the model with an id-- so either it's bundled with an sd application, or the user has explicitly installed it, or the app has noted a specific model as a named requirement for recreating an image from its metadata.
Oh of course not. They wouldn't need the actual card- but like you said, if there's a mystery .ckpt there, they could verify they have the "right one" for a-- (sigh)
Oh I never meant to suggest they would have to...
But an app would have to know that it's incapable of supporting a particular model. I would suggest that model_hash would be optional if model_id is provided- but I guess it can't hurt to have a hash of the model for verification that it's the right one.
You mean for model_id, right? The URL, though, would be needed to distinguish between "bob/mymodel" on GitHub and "bob/mymodel" on GitLab, or "bob/mymodel, tag release-2" on Bitbucket or Hugging Face, or wherever else... I don't see how a URI wouldn't be indispensable. It even points you directly to the specific tag, and it would be a virtual requirement for someone building from scratch who doesn't know the model or the AI community to find the specific model required. (A pointer to a commit hash would be good as well.)
Cool, thanks-- yeah, it seems we're going in the same direction. Hopefully others will chime in, as there may be use cases or scenarios neither of us has contemplated. Update-- took a look at the spec now-- looks great! My only holdouts are about the URI and possible confusion between GitHub/GitLab/Gitea/Hugging Face/etc. A URI pointing to a hash would clear up any ambiguity and offer clear direction for a user or automated process searching for the correct model or app w/o having to download and then check hashes. Also, to throw a wrench in this-- we're assuming cross-platform support for the apps. I think we SHOULD care that we have the correct app, say some experimental new feature is supported here, but what if this won't work on my platform? What should happen in this case? Or maybe there's a Mac version of this Windows program that WOULD work... what then? Maybe you're SOL- same as if you don't have enough memory or the right graphics card or whatever... or no? Again, nice work. Don't hate me-- I'm just playing devil's advocate here :)
I'm thinking about how an application like this one would populate the metadata, not how a user would consume it. Right now, there is no reasonable way for an application like this to populate I'm fine with having
Making
No, I mean the I guess I am OK with having an extra Of course if your project is hosted somewhere other than GitHub you can put a full URL in the
I don't think we're assuming cross platform support, really? We're just saying "here's how this was generated". Nothing is stopping you from inputting the same settings into a different application; you're just not guaranteed to get the same output.
Not to worry: I work on a standards committee; I am extremely used to working with this kind of feedback. And it's helpful for getting the best version of the spec. Before finalizing the spec is the best time for that!
Well, for me this started when I was contributing to this qt-based GUI repo, and when looking at the settings file I was like- wait everything currently in the settings would be all you needed to define the image. It was already being saved as json and I was like-- this- this in some way IS the image and the user could effectively swap out settings files as different images and trade them... I thought a URL-encoded version might be simpler, but regardless, that's where I started. But if the-- and I'm going to stop calling it an
The first thing I plan to do once this is settled as a working standard is to implement it in that GUI, so I'll be using the model_id probably always, in concert with the hash. It can be populated by hand or as you suggest from a lookup table, or derived from checking the URI, etc.
Okay.
As long as there is a URI to clarify the actual source, to make it easy to find or download, that's fine.
Okay, don't require it, but I do think it will prove to be extremely useful, especially as models splinter and are retrained, etc.
That's fair enough.
I agree!
I turn my back for a few days and this thread has grown to 18 comments! Is the current RFC still the first posting, or is it an external file somewhere? It might be good to put it into the repository so that we can track version changes. Or even a Google Doc.
Agreed, given the (seeming; I don't have time to read it all 😁) length and breadth of the discussion in here.
I've been editing the original message; it's current with what I am proposing. You can look at the revision history if you want, but it's not very exciting. I might need to tweak it a couple more times - once to accommodate a richer When I submit the PR implementing this format for metadata, I'll include the spec as a Markdown file and link to it from the README. I don't want to formalize it before then, because very few specs survive contact with first implementation, in my experience; I want to make sure it's at least possible to implement before I declare it good.
@bakkot, I'm sure that you know this, but if we also have a JSON schema, it makes everyone's life much easier (if nothing else, the schema IS the spec).
I'll be sure to provide a JSON schema also.
@fat-tire, I hope I don't come across as rude, but "See this cool picture? If you paste this ailink in your GUI too, you'll get the image I made" is pixie dust. This fork is several hundred commits ahead of upstream (which, btw, is for all intents and purposes dead: in the last 2-ish weeks it's had README updates, fixes to deps, safety tweaks (ew), and a license change), and there are two other forks which are "leading" forks. This one is No. 2 in terms of stars and forks, but has twice the commits of No. 1. I say all that to say: no WAY is someone going to be able to take unedited generation parameters straight from here, run them through one of the other forks, and get anything like the same image (this repo itself has issues with reproducibility on macOS!). I don't mean to say that reproducibility isn't a goal, or should be ignored; more that it doesn't really exist now (across forks), and I don't believe that anyone is coordinating anything like it (across forks).
@bakkot you're still going to add a
@tildebyte Of course no one is expecting you to be able to do that-- that is the very reason for providing the optional Incidentally, was trying to come up with a good name to replace 'ailink':
Meh. I'll keep thinking.
I actually think it's feasible, believe it or not! This fork does have tons of features, but most of them are optional, and we've been pretty good about not making "breaking changes" in the sense of "changing the output for a previously-working prompt"; if you aren't doing anything "fancy" like upscaling or variations, you probably actually can get images to reproduce on other forks. Indeed, part of the point of this spec was the hope that other forks would implement the same format and support loading metadata from other applications, when the metadata indicated the image was generated from the subset of features which they support.
Ah, right. I've just added
@bakkot did not include the filename for privacy reasons and instead opted for a hash: #266 (comment). But that does not allow metadata to be used to reproduce img2img images: you'd have to figure out which image was hashed. Maybe I'm missing something, but that makes metadata worthless for img2img unless you remember which image you used. IMO a filename and hash should both be used. Privacy concerns are handled by the implementation, e.g. the back end copies init images to wherever they need to be and, if privacy mode is enabled, changes the filename to a UUID and strips metadata from them.
Maybe a URL instead of a filename or path? Even if it's
Perhaps we have
Since I was also just thinking that a nice thing about the node-based pipeline description of how to generate an image is that if img2img is used with an image which in turn contains metadata about how it was generated, that image could theoretically be imported and hooked into the graph to be recreated from its metadata too.
* Feature complete for #266, with the exception of several small deviations:
  1. Initial image and model weight hashes use the full sha256 hash rather than the first 8 digits.
  2. Initialization parameters for post-processing steps are not provided.
  3. Uses a top-level "images" tag for both a single image and a grid of images. This change was suggested in a comment.
* Added scripts/sd_metadata.py to retrieve and print metadata from PNG files.
* New ldm.dream.args.Args class is a namespace-like object which holds all defaults and can be modified during execution to hold current settings.
* Modified dream.py and server.py to accommodate the Args class.
I'd like to question
I agree. Also not clear on the purpose of The Lastly, I'd like to voice my concerns about putting file paths of any nature into metadata. Privacy aside, this is too unreliable a feature, in my opinion. If we allow (and I think we should) these images to be shared across the community, the paths (both relative and absolute) can change arbitrarily, but that shouldn't impede reproducibility of an image in any case. The original spec already has
We may want to consider including a handful of fields in the file metadata (separately from this) to indicate metadata spec version and format. I can imagine we'll eventually want to gzip or otherwise compress the metadata (if not come up with a binary format).
The PNG format where we're sticking this metadata already supports gzip'd text in (We almost certainly do not want to come up with a binary format.)
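For reference, here is a standard-library sketch of what writing such a chunk involves. It builds either an uncompressed tEXt chunk (what the draft spec asks for) or a zTXt chunk, whose text is a zlib/deflate stream (closely related to, but not the same container as, gzip). In practice PIL's PngImagePlugin.PngInfo handles this; the helper names below are hypothetical. Since json.dumps escapes non-ASCII characters by default, its output always fits the Latin-1 text that tEXt and zTXt require.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunk(keyword: str, text: str, compress: bool = False) -> bytes:
    """Build a raw tEXt chunk, or a zTXt chunk when compress=True."""
    key = keyword.encode("latin-1")
    if not 1 <= len(key) <= 79:
        raise ValueError("PNG text keywords must be 1-79 bytes long")
    body = text.encode("latin-1")  # both chunk types store Latin-1 text
    if compress:
        chunk_type = b"zTXt"
        # keyword, NUL separator, compression method 0 (deflate), zlib stream
        data = key + b"\x00\x00" + zlib.compress(body)
    else:
        chunk_type = b"tEXt"
        data = key + b"\x00" + body
    # Chunk layout: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def insert_before_iend(png: bytes, chunk: bytes) -> bytes:
    """Insert a chunk just before the final 12-byte IEND chunk."""
    if not png.startswith(PNG_SIGNATURE) or png[-8:-4] != b"IEND":
        raise ValueError("expected a PNG ending with an IEND chunk")
    return png[:-12] + chunk + png[-12:]


metadata = json.dumps({"model": "stable diffusion", "metadata_version": "1.0"})
chunk = png_text_chunk("sd-metadata", metadata)
```

Text chunks may appear anywhere between IHDR and IEND, so placing the new chunk just before IEND is a simple, valid choice.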
Ok, so suppose the client has the responsibility of keeping track of init images. Aren't we, in practice, deferring a part of the spec (association of init image to result image) to client implementation? Scenario 1: I have generated a lot of images via SD img2img. My computer crashes but thankfully I had a data backup. I reinstall whatever software I used to create the images. How does my software figure out which init images go with which results?
Scenario 2: I have generated a lot of images via SD img2img. A new, vastly improved UI is created and I want to migrate to it. How does that happen?
Scenario 3: I have generated a lot of images via SD img2img. My friend wants to iterate on my work. They don't use the same software I use. How do I send them my best results and include the init images?
Scenario 4: A new img2img method is invented in which an arbitrary number of init images are provided and you get a cool mix of all of them. How do we indicate which images are used?
I understand that not including a reference to the init image besides a hash may be "correct", but I don't think it's functional. We're not building a metadata spec to be correct, we're building it to be used in the real world, right?
After making a tea and doing some testing, I think embedding the init images as base64 is probably a good enough solution. I embedded 50 base64 images in a PNG's metadata without issues writing or reading the data back. The PNG is now 34 MB, but, well, that comes with the territory. Edit: according to this official-looking website, the max chunk length is a Very Large Number™️. So we can fit an almost arbitrary number of init images. http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html
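A minimal sketch of the base64 approach, assuming the init image travels inside the img2img image object of the metadata. orig_data is a hypothetical field name (it is not in the draft spec); orig_hash follows the spec's 8-character hash, so a consumer can check the embedded bytes against it. Base64 adds roughly a third in size (4 output characters per 3 input bytes).

```python
import base64
import hashlib


def short_hash(data: bytes) -> str:
    # The draft spec's "hash": first 8 hex characters of SHA-256.
    return hashlib.sha256(data).hexdigest()[:8]


def embed_init_image(image: dict, init_png: bytes) -> dict:
    """Return a copy of an img2img image object with the init image inlined."""
    out = dict(image)
    out["orig_hash"] = short_hash(init_png)
    out["orig_data"] = base64.b64encode(init_png).decode("ascii")  # hypothetical field
    return out


def extract_init_image(image: dict) -> bytes:
    """Decode the inlined init image and check it against orig_hash."""
    data = base64.b64decode(image["orig_data"])
    if short_hash(data) != image["orig_hash"]:
        raise ValueError("embedded init image does not match orig_hash")
    return data
```

Keeping orig_hash alongside the inlined bytes means the same check works whether the init image is embedded or supplied separately.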
To save space, perhaps only include the images in leaf/terminal nodes, as any intermediate images (flipped, rotated, combined, tiled, etc.) should be able to be derived from those, right?
@psychedelicious I'm all for embedding an arbitrary number of images inside metadata as base64 if they all are needed to regenerate the image. You are going to need them anyway, whether they come packed into one file or several. Sharing/uploading just one file is easier in terms of general UX. As a side note: why was this marked as completed? Doesn't feel like a conclusive solution was achieved?
It may be a good idea to provide a way to get the images with the metadata stripped out, especially if they're significantly larger because of it.
The client could handle exporting an image without metadata, e.g. "Share Image" vs "Share Image with SD Metadata". So when init images (or masks or anything else that is invented) get embedded, we will need to strip them of their metadata, else when you chain img2imgs, you end up with massive metadata. This goes back to considering init images and any other input to the current working image as atoms which come with no context of their own. Hope that makes sense.
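Stripping an init image's own metadata before embedding it can be done at the PNG chunk level. This is a standard-library sketch, not any project's actual code: it walks the chunk list and drops the three textual chunk types (tEXt, zTXt, iTXt), leaving every other chunk, including the pixel data, untouched.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNK_TYPES = {b"tEXt", b"zTXt", b"iTXt"}


def strip_text_chunks(png: bytes) -> bytes:
    """Return the PNG with all textual metadata chunks removed."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    kept = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 8 > len(png):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        end = pos + 12 + length  # length + type + data + CRC
        if end > len(png):
            raise ValueError("truncated chunk")
        if chunk_type not in TEXT_CHUNK_TYPES:
            kept.append(png[pos:end])
        pos = end
    return b"".join(kept)
```

Because whole chunks are copied verbatim, the remaining CRCs stay valid and no re-encoding of the image is needed.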
I appear to have a "reopen" button that works. I have used it. This must have been closed by mistake; @lstein was doing some out-of-season spring cleaning.
I was thinking-- so we're going to embed required images but not, say, embed the actual weights file, right? (Of course not.) Since multiple images can share a single weights file, it's reasonable that maybe one init image will be shared between images too. There's value in including that init file as a "standalone" image, but if you're grabbing 50 images that all share the same init image, you don't want the size of that image repeated 50x. So maybe the notion of "static" vs "shared" (as in libraries) might be applicable... just to make things simpler (or maybe more confusing). Maybe to manage such scenarios, have something like:
A resource would have: I'm probably missing something, and as always it's important to consider security implications of throwing a big blob in there. I figured call this In fact, the weights file itself can be seen as a resource-- not that you'd shove that 10G file in an image-- but instead of doing this:
You could just make it a resource :
One advantage to this is that for a pipeline with nodes, you might be using several models-- stable-diffusion to generate the image, then ESRGAN or something else to do more processing. Anyway, this is just typing out loud, so maybe none of this is good... dunno. Maybe we're trying to do too much all at once. But it can't hurt to think a few steps ahead about what may be possible so that it's not THAT hard to redo later.
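The shared-resource idea can be sketched as a table keyed by content hash, so an init image used by 50 results is stored once and referenced 50 times. The resources table and the per-image orig_ref field are hypothetical names from this thread's brainstorming, not part of the spec; the full SHA-256 is used as the key here so that collisions are not a practical concern.

```python
import base64
import hashlib


def add_resource(resources: dict, data: bytes) -> str:
    """Store data once in a shared table keyed by its SHA-256; return the key."""
    key = hashlib.sha256(data).hexdigest()
    # setdefault leaves an existing entry alone, so duplicates cost nothing.
    resources.setdefault(key, base64.b64encode(data).decode("ascii"))
    return key


def bundle(images: list, init_images: list) -> dict:
    """Pair each image object with its init image, storing each distinct file once."""
    resources = {}
    out = []
    for image, init_png in zip(images, init_images):
        ref = add_resource(resources, init_png)
        out.append({**image, "orig_ref": ref})  # hypothetical reference field
    return {"resources": resources, "images": out}
```

The same table could, in principle, list a weights file by hash and URL without embedding its bytes, which is the "model as a resource" variant suggested above.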
I can't know what will be invented in the future, but at least with regard to img2img, I don't expect the generated PNG to get problematically large (unless upscaled). An image that is the result of a chain of 100 img2img generations still needs to embed only the 99th image, because that is the only one needed to regenerate it. I don't propose to store all of the chain in the
Yeah, we are suggesting the same thing here. I brought it up in reference to a past conversation somewhere on this repo in which this same question was raised, i.e. if we embed/store an init image as metadata for a result image, should we store that init image's metadata?
I am going crazy. I cannot see this discussion in the GitHub GUI. The only way I can find it is to manually type the full URL. I've also tried to pin it, but it doesn't show up. Does someone understand what's going on here?
I have a proposal for a spec for metadata, laying out goals and a formal spec.
I'm happy to implement this if there's buy-in.
Thoughts?
RFC: Structured metadata
Currently, when generating images from the CLI (but not the web), metadata is stored as a string roughly corresponding to the prompt. That metadata is enough to reproduce the original image... sometimes.
I'd like to:
To that end, I'd like to propose the following spec for metadata.
In this doc, "hash" means "the first 8 characters of the hex-encoded sha-256".
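That definition can be sketched in Python as follows (sd_hash is a hypothetical name). The implementation notes later in this thread say the first implementation switched to the full SHA-256 digest; dropping the slice gives that variant.

```python
import hashlib


def sd_hash(data: bytes) -> str:
    """'Hash' as defined in this draft: first 8 hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:8]
```

For example, sd_hash(b"abc") returns "ba7816bf".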
Data location
Metadata is a JSON string following the "top-level data" schema, stored in an uncompressed PNG tEXt or iTXt chunk named "sd-metadata". (This corresponds to what PIL does already when adding text data: it will choose tEXt or iTXt depending on whether the text contains non-latin-1 characters. I just figure it's worth writing this down.)
Top-level data
The top-level metadata should have the following fields:
model: "stable diffusion"
model_id: a string identifying the model. Must be the model_id field of a Model card. Optional; there is no default value, but consuming applications may infer a value from model_hash if they recognize that value.
model_url: a string giving a URL where the model can be downloaded (if public) or read about (if not). Optional, does not have a default.
model_hash: hash of the weights [precise format TBD depending on implementation feasibility]; see the "model information" section below.
app_id: a string identifying the application consuming the model. It is recommended, but not required, that applications hosted on GitHub use the username/repo_name of the repository in this field; for example, the fork we're on would use lstein/stable-diffusion.
app_version: a string giving the version of the app from app_id. It is recommended, but not required, that projects with numbered versions use a string of the form v1.0, and that projects built from git repos use the short-form git hash of the commit. Optional, defaults to "unknown".
app_url: a string giving the canonical location of the application on the web. Optional, does not have a default.
embeddings_hashes: an array of the hashes of any textual-inversion embeddings in use. Optional, defaults to an empty array.
arch: "cuda", "MPS", or another helpful value indicating the GPU architecture. Optional, defaults to "unknown".
grid: a boolean, whether this was a grid. Optional, defaults to false.
metadata_version: the string "1.0". Optional, defaults to "1.0". Breaking changes to this metadata format should update this field.
And then also one of the following two fields, depending on whether this is a grid:
image: an object in one of the formats specified below
images: an array of such objects
Image data
Every image has the following fields:
type: either "txt2img" or "img2img"
postprocessing: either null, indicating no postprocessing was done, or an arbitrary object representing the postprocessing performed. The spec for this will depend on individual postprocessors, but I'll write something up for the ones we support. Optional, defaults to null.
sampler: one of these samplers
prompt: a nonempty array of { prompt: string, weight: number } pairs. The single-prompt case is [{ prompt: prompt, weight: 1 }]
seed: a seed
variations: an array of { seed: number, weight: number } pairs used to generate variations. Optional, defaults to an empty array.
steps: the number of steps configured to be taken
cfg_scale: the unconditional guidance scale
step_number: the number of steps actually taken. Normally this will be the full number of steps, but for intermediate images it may be less. Optional, defaults to steps (or strength_steps in the case of img2img).
width: the specified width (as a number of pixels). Optional only when this metadata is embedded in an image whose width is the same as this value would be, in which case it defaults to that image's width.
height: the specified height (as a number of pixels). Optional only when this metadata is embedded in an image whose height is the same as this value would be, in which case it defaults to that image's height.
extra: an object containing any necessary additional information to generate this image. Not to be used for other data, like contact information. Optional, defaults to the empty object.
Images of type img2img also have the following fields:
orig_hash: hash of the input image
strength_steps: the configured strength for running img2img (as an integer; as discussed here, that's what it actually is).
Height/width are not stored since you can infer those from the file.
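The optional fields and defaults above can be made concrete with a small normalizer that a consuming application might run before using the metadata. This is a sketch under stated assumptions: the function names are hypothetical, it only checks the few constraints it touches, and for grids it never infers width/height from the file, since a grid's pixel size is not that of any single image in it.

```python
import copy

# Defaults for omitted top-level fields, as listed in the draft above.
TOP_LEVEL_DEFAULTS = {
    "app_version": "unknown",
    "embeddings_hashes": [],
    "arch": "unknown",
    "grid": False,
    "metadata_version": "1.0",
}

# Per-image defaults that do not depend on other fields.
IMAGE_DEFAULTS = {"postprocessing": None, "variations": [], "extra": {}}


def normalize_image(image: dict, embedded_size=None) -> dict:
    """Return a copy of one image object with omitted optional fields filled in."""
    out = copy.deepcopy(IMAGE_DEFAULTS)
    out.update(copy.deepcopy(image))
    if not out.get("prompt"):
        raise ValueError("prompt must be a nonempty array")
    if "step_number" not in out:
        # step_number falls back to strength_steps for img2img, else to steps.
        key = "strength_steps" if out["type"] == "img2img" else "steps"
        out["step_number"] = out[key]
    for axis, index in (("width", 0), ("height", 1)):
        if axis not in out:
            if embedded_size is None:
                raise ValueError(f"{axis} may only be omitted when read from the file")
            out[axis] = embedded_size[index]
    return out


def normalize_metadata(meta: dict, embedded_size=None) -> dict:
    """Fill top-level defaults, then normalize the image (or grid of images)."""
    out = copy.deepcopy(TOP_LEVEL_DEFAULTS)
    out.update(copy.deepcopy(meta))
    if out["grid"]:
        # A grid's pixel size is not any single image's size, so never infer it.
        out["images"] = [normalize_image(img) for img in out["images"]]
    else:
        out["image"] = normalize_image(out["image"], embedded_size)
    return out
```

Deep-copying the default tables keeps one normalized result from sharing mutable lists or dicts with another.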
Thoughts on storing the model information
I am proposing to store a hash of the loaded model, which is a lot faster than reading the file from disk a second time, but that hash won't correspond to the file on disk. Better than nothing, though.
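If the hash is meant to match the file on disk, the cost of that second read can at least be kept to constant memory by streaming the file in fixed-size chunks. A sketch (weights_file_hash is a hypothetical name; per the implementation notes in this thread, the first implementation used the full digest rather than the 8-character prefix):

```python
import hashlib


def weights_file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a weights file on disk in fixed-size chunks (constant memory)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:8]  # the draft's 8-character form
```

The result is identical to hashing the whole file at once; only peak memory use changes, which matters for multi-gigabyte checkpoints.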
Is it worth also storing a hash of the model config? I don't think so, since you're always going to need the original config for a given model weights file.