RFC-3: more dimensions for thee #239
base: main
Automated Review URLs
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/5
PS: I asked @joshmoore whether whimsy was allowed and he said yes, hence the title. (This comes after I realised I couldn't have "RFC-2: dimensional hullabaloo" because @normanrz had taken that number already. 😂)
Full endorsement. While I absolutely recognize the significant challenge that lifting the strict dimensionality model may pose for mapping arbitrary future usage onto legacy code bases that have been built around XYZCT, I fully agree that a true next-generation format is going to have to lift it. I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption. I agree with @jni that concerns around communicating the semantics of specific axes (i.e. formally named "X", "Y" and "Z") are better addressed by additional keys in the axis metadata, such as
For comparison, https://datatracker.ietf.org/doc/html/rfc2549 ("IP over Avian Carriers")
Would you be able/willing to contribute those, perhaps even for a section in the RFC?
Sure, the most direct stories I can share are from implementing writers for data coming off microscopes (code in pymmcore-plus/mda/handlers). There I essentially have a
It's possible that @nclack and/or @aliddell would have opinions here as well, as I know they've spent a fair amount of time thinking about how to map a variety of custom experiment types to the ngff format in the acquire-python schema.
@tlambert03 thanks for the links! I'll add these to the background section, but could you point me to where in the code
would fail? The smoking gun would be:
Maybe it's not as easy as that to define these things compactly, but if it is, I think it would be a worthwhile detail for this RFC's motivation.
A few quick clarifications, @jni:
Re: NGFF readers: cc @manzt - https://github.com/hms-dbmi/vizarr - Any idea how much work it would be to support n-dimensional NGFF data? cc @dgault - https://github.com/ome/ZarrReader/ - Since the OME data model is very much 5D, this is going to take a bit of thought on how to handle n-dimensional NGFF data?
The space restrictions, and all other axis restrictions (other than the requirement that axes have unique names), are removed in #235.
Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools on what to do with >3 space dimensions.
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/request-for-info-ngff-use-cases-that-are-currently-blocked/96641/1
As part of the [proposed implementation][implementation], Davis Bennett has
created pydantic models that validate the proposed schema. These are actually
new additions to the NGFF specification, surfaced pre-existing errors in the
schema, and should prevent new errors from appearing.
I don't think we need this text. Those pydantic models are merely a convenient way to write JSON schema. They don't express anything that's not already written in the prose of the spec. Also, I am planning on removing those models from the PR, because they add an undocumented build step that I don't have the energy to document.
as of aa5c953 those models are gone
@d-v-b I really loved the models! 😭
they will live here https://github.com/janeliascicomp/pydantic-ome-ngff if 0.5 comes out
The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions. I imagine that supporting arbitrary dimension order is a fair bit more work for implementers than supporting n dimensions, so endorsement of this proposal may not signal endorsement of #235?
Regarding advice for partial implementations (e.g., implementations that only support a fixed number of dimensions, or a fixed order), I included the following section in the PR: https://github.com/ome/ngff/pull/235/files#diff-ffe6148e5d9f47acc4337bb319ed4503a810214933e51f5f3e46a227b10e3fcdR565-R580. Please let me know if this guidance is sufficient or if we should say more (and let's have that conversation over in #235 instead of here, so that we can keep synchronized with the actual changes to the spec).

## Overview

OME-NGFF version 0.4 places severe restrictions on the number, names, and types
Suggested change:
OME-NGFF version 0.4 restricts the number, names, and types
of axes that are allowed in the axes metadata. This has had the effect of
limiting the datasets in proprietary formats that *can* be meaningfully
converted to NGFF. It has also prevented some novel datasets from being written
in NGFF format. This RFC only removes restrictions from the specification. An
important consequence is that all valid NGFF datasets would remain valid after
this change.
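To make the "all valid datasets remain valid" claim concrete, here is a minimal Python sketch. It assumes a simplified paraphrase of the 0.4 axes rules (2 to 5 axes, unique names, 2 or 3 axes of type "space", at most one of type "time", at most one channel or custom axis, ordered time, then channel/custom, then space) and, following the discussion around #235, that unique axis names are the only rule left after this change. Both function names are hypothetical and not part of any OME-NGFF library.

```python
from typing import Any

Axis = dict[str, Any]


def valid_v04_axes(axes: list[Axis]) -> bool:
    """Hypothetical, simplified paraphrase of the OME-NGFF 0.4 axes rules."""
    names = [a["name"] for a in axes]
    if len(set(names)) != len(names):
        return False
    if not 2 <= len(axes) <= 5:
        return False
    types = [a.get("type") for a in axes]
    n_space = types.count("space")
    n_time = types.count("time")
    n_other = len(types) - n_space - n_time  # channel, custom, or untyped axis
    if n_space not in (2, 3) or n_time > 1 or n_other > 1:
        return False
    # Ordered by type: time first, then channel/custom, then space.
    rank = [{"time": 0, "space": 2}.get(t, 1) for t in types]
    return rank == sorted(rank)


def valid_relaxed_axes(axes: list[Axis]) -> bool:
    """Relaxed rule assumed here: axis names must be unique, nothing else."""
    names = [a["name"] for a in axes]
    return len(set(names)) == len(names)


tczyx = [
    {"name": "t", "type": "time"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
line_scan = [{"name": "x", "type": "space"}]  # one frame from a line-scanning camera

print(valid_v04_axes(tczyx), valid_relaxed_axes(tczyx))          # True True
print(valid_v04_axes(line_scan), valid_relaxed_axes(line_scan))  # False True
```

Because the relaxed check is the first condition of the strict one, any axes list the strict check accepts is also accepted by the relaxed check. That one-way implication is what makes the change backward compatible.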
I had this formatted as a suggestion but the formatting got in the way. I think it might help to put some text in there like this:
"These restrictions hinder converting some datasets in proprietary formats to NGFF. The axis restrictions have also prevented some novel datasets from being written in the NGFF format. Clearly the data model articulated in version 0.4 of OME-NGFF is not inclusive enough for these datasets. So this RFC proposes removing these axis restrictions, thereby making OME-NGFF more open to the diversity of bioimaging datasets. Because this RFC only removes restrictions from the specification, all valid NGFF datasets remain valid after this change."
basically, the 2-5D restriction is a barrier to researchers who have data that doesn't fit into that model. so we remove the restriction, and make the ome-ngff model more expressive, so that those researchers can use the format.
These restrictions are actively preventing users from converting existing
datasets to NGFF. For example, Zeiss .czi datasets [may contain][czi format
Suggested change:
These restrictions are a barrier preventing users from converting existing
datasets to NGFF. For example, Zeiss .czi datasets [may contain][czi format
I probably need to update the summary at the top, but under "proposal" I write:
If the names are arbitrary, the ordering must also be arbitrary, surely? But I can make it explicit.
## Background
I think it's important to mention here that as a historical fact OME-NGFF is an OME project, and the OME data model is 5D. This I think goes a long way to explaining the inertia for the 5D limit and the other various axis restrictions.
I don't think I need to really get into it. I'm happy to change "unfortunately, 0.4 imposes restrictions..." to "unfortunately, for historical reasons, 0.4 imposes restrictions...".
I don't think the word "unfortunate" fits here -- it's just an explanatory fact that this format was jump-started by OME, and they use a 5D model. This is literally the background for the 5D limit, so it might be helpful to put that here for people who don't know the history.
A draft proposal for [coordinate transformations][trafo spec] already includes
most of the changes proposed here, so we envision that this RFC is compatible
with future plans for the format. The proposal does currently limit the number
of dimensions of type "space" to at most 3, but that limit [could be
removed][space dims comment]. If this RFC is approved, the transformation
specification would need to be updated to reflect this. However, that is an easy
change and there seems to be sufficient support in the community for this idea.
Do we need to talk about that (stalled) PR at all? I don't see why it's relevant here
It's relevant because it speaks to the forward compatibility of this RFC — ie it is in line with existing proposals for the format. That the PR is stalled is not really relevant — it's stalled because of minor details (e.g. array order) that don't have a bearing on this PR. Based on the discussion, other aspects, and certainly the ones relevant to this RFC, have quite broad consensus.
in that case isn't it sufficient to just state that there are no known conflicts with other active proposals?
In my opinion the spec should leave this question undefined. The mapping can be direct (x=x, y=y, z=z), user defined (give users options of how to map axes), or arbitrary (x=foo, y=bar, z=baz). In practice, I think it is not going to be an issue; I am just wary of restricting the number for the same reason that I was wary of restricting the total number of dimensions, which indeed caused problems. If it helps this RFC move forward, I can bring back the "maximum three spatial dimensions" limit from #138, and we can have the discussion in a later RFC. Unlike the other changes in this RFC, the removal of the maximum number of "space" dimensions is purely speculative on my end, and not motivated by a concrete use case. Action requested:
Implementations that do not support some aspect of user data should clearly communicate that to users. Users can then decide which implementation to use, given the data they have stored. We should not try to limit the data that users can store simply because some implementations cannot represent that data. This is a broader issue: as of 0.4, there are lots of OME-NGFF tools that don't support big images on cloud storage (of which I have plenty). Should we change the spec to limit the size or location of images just because some implementations can't load my big ones? I don't think so. So for the same reason, we should not restrict what axes users have just because some implementations are opinionated about axes.
I am in favor of limiting spatial dimensions to 3 because even on an abstract level I have a hard time imagining what a fourth spatial dimension would be. Additional axes can of course be present in the image (e.g. for phases, dim reduction, etc.). That would also be convenient for Webknossos because the 3 spatial dimensions would be used for 3D visualization and all other dimensions would be "slider" dimensions (i.e. we'll have a slider to select the coordinate).
I can imagine scenarios where 4+ spatial dimensions might arise, e.g. imaging that is parametrized over the positions of two independent 2D raster scanners would produce images with 4 spatial axes; if you add a z-actuator then you are at 5. And anyone who wants to use OME-NGFF for visualizing the phase space of a dynamical system might have 4+ spatial dimensions. Broadly speaking, I think it's a mistake to prematurely limit what users can represent, especially because these restrictions are a burden on implementers who must write the validation logic.
It could be a pure mathematical construct. Like I said in #138, maybe you want to represent a 4D Klein bottle for didactic purposes. Or maybe a string theorist wants to save some particle simulations in full 12-dimensional space. 🤷 I'm inclined to agree with @d-v-b's arguments:
I'll wait for a few more votes. I think it's indeed fine to postpone this to a later RFC, and I do think getting the remaining parts of this proposal in is urgent for many applications.
Yeah, I think I agree with @d-v-b and @jni here. As an end-user, if I feel the need to express more than 3 spatial dimensions, I would be absolutely ok with viewer X (Webknossos, etc.) just saying "we only support 3 spatial dimensions; if you have more than three, we will pick three using the following heuristic: ...".
We'd probably just pick the first three. That would be fine. Even better if that is a recommendation in the spec.
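The heuristic floated here (render the first three "space" axes, expose every other axis as a slider) is small enough to sketch. The snippet below is a hypothetical Python illustration of that idea, not Webknossos code and not spec text; it assumes the axes arrive as the list of axis dicts from the multiscales metadata, in array order, and `split_display_axes` is an invented name.

```python
from typing import Any


def split_display_axes(
    axes: list[dict[str, Any]], max_space: int = 3
) -> tuple[list[int], list[int]]:
    """Return (indices rendered as spatial dims, indices exposed as sliders).

    Hypothetical viewer helper: keeps the first `max_space` axes of type
    "space" (in array order) and turns every other axis into a slider.
    """
    space = [i for i, axis in enumerate(axes) if axis.get("type") == "space"]
    rendered = space[:max_space]
    sliders = [i for i in range(len(axes)) if i not in rendered]
    return rendered, sliders


axes = [
    {"name": "t", "type": "time"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
    {"name": "w", "type": "space"},  # a fourth spatial axis
]
print(split_display_axes(axes))  # ([2, 3, 4], [0, 1, 5])
```

A rule this simple can be stated to users in one sentence, which fits the earlier point that implementations should clearly communicate what they do not support.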
I'm in favour of removing as many restrictions as possible. Fewer restrictions generally lead to simpler standards. Why limit how many dimensions can be labelled "spatial"? It's not necessary, and so it shouldn't be done. Software may want to limit how many dimensions it can handle, since that makes the code simpler. But we have plenty of toolboxes/libraries that can handle arrays with an arbitrary number of dimensions. So, the standard shouldn't limit the number. Instead, let the software that wants to have a limit on the number of dimensions simply refuse to read files with too many dimensions. Don't standardize on the lowest common denominator.
I have a similar question to @will-moore's: what exactly does this RFC change in the specification? Is it exactly the same as #235? If so, I think the RFC should be more explicit about its implications.
@ziw-liu I'm not the author of this RFC, but as the author of #235 I can say that in that PR there are no restrictions on the So, if applications previously relied on the
The above example has two axes that are spatial, but they use a different Does this summarize your concern? Because if so, I agree with this concern but I think the actual problem is the I would welcome discussion in #215 on what the axis I am happy to amend #235 to add recommendations for the
That is the intent.
I disagree with @d-v-b and have commented on #215, but to record the objection here and keep a semi-complete record of discussion on the RFC PR itself: a unit does not unambiguously specify a type. For example, "wavelength" and "space" are both measured in meters, to say nothing of "stage position". Therefore, I suggest that we use SHOULD as guidance for the special cases of "space", "time", and "channel". But I don't want to use MUST here because, as mentioned in the discussion above, I think it's ok for software to not support all ome-ngff files. For example, I think ome-ngff should be usable to store Fourier-transformed data (type "spatial-frequency", and data type complex64/complex128), but many viewers won't be able to work with that immediately or ever, and I think that's ok.
Thanks @d-v-b and @jni. For me something like this would be useful:
I would also appreciate a similar recommendation for visualization implementations about choosing 'special dimensions' (e.g. first or last 1
But they shift the burden to implementations, which then reduces interoperability because loose specs have ambiguities. I am not against removing restrictions, but then there should be strict guidance on what implementations must do, so probably not SHOULD but MUST.
Personally I would benefit from some concrete examples of implementations that rely on the current axis restrictions, so that we can better appreciate what impact these proposed changes would have, and potentially how to mitigate those impacts. |
@d-v-b My own implementation relies on the order t, c, z, y, x. This has the advantage of having clear semantics and this is not just useful for visualization but also for computing. What I worry about is that by making the specs too flexible, this ends up with different implementations eventually leading to variant formats just like happened with TIFF. I'd prefer specs that cover 90% of the use cases in a clear and unambiguous way. But maybe that's an issue of scope, i.e. what is NGFF supposed to cover? I haven't encountered microscopy images that don't fit the current dimension pattern so maybe having concrete examples (e.g. from papers) of microscopy images with more than 5 axes would be useful to understand the need and help reason about it.
Our implementation of Webknossos expects that there are 2-3 space axes. We don't rely on ordering and all other axes can be arbitrary.
@d-v-b Here's a list of implementations I'm aware of that rely on current axis restrictions. In general these tools handle 2D data (or a stack of 2D planes) so they expect the last 2 dimensions to be
I disagree that allowing flexible axes will cause fragmentation. I think it's the opposite actually. @jkh1, if your use case relies on a strict TCZYX model, then your primary concern should be whether, given a dataset, you can unambiguously find those 5 axes in the dataset (and then, you can transpose them as needed to fit your required dimension ordering), not whether the specification technically allows for someone else to do something you're not interested in doing. (It's the restriction of those other use cases that causes fragmentation.) I absolutely agree (and I think we all do?) that inasmuch as a dataset does have a standard 3-space + 1-time + 1-channel dimensional model, the spec should make it unambiguous how to find that. I think it already does that.
I agree that use cases not covered will use something else and hence also lead to fragmentation, but I think preserving interoperability for the greater number of cases should have priority. I am not against changing the model; my primary concern is about preserving the unambiguous semantics of the axes when this is done, and maybe the issue is that it isn't clear (at least to me) that this will be the case. The new specs should include the current one as a valid subset and also be semantically unambiguous. This probably means standardizing the vocabulary and defining what implementations should do with what they don't understand.
Here is a quick rundown of imaging modalities that I think would have trouble fitting into the
The new spec does include the current one as a valid subset, and I would argue that the current spec is actually semantically very ambiguous, because it doesn't define what the different types of axes mean. I think contributions to improve this would be welcome.
Another important example, on the lower end of the dimensionality spectrum: the output of a line-scanning camera is a 1-dimensional array. As OME-NGFF 0.4 cannot represent 1D data, the format cannot represent a single frame of a line-scanning camera image. NGFF users with such data would have to pad their data with fake dimensions, which is data mangling and a very bad user experience.
Another example I've been working with lately: electron backscatter diffraction (EBSD), which stores a 2D diffraction pattern for each pixel of a material. And the summarised data is still xy+(3D angle) or xy+(quaternion). (The latter could be stored as "channel" but that's a little bit of an abuse of the spec, imho.) My reading of the discussion above is "loosen restrictions, but offer guidance with SHOULD as to how tczyx SHOULD be stored". Imho, though, we should drop the order requirement: it is not hard to transpose an array, and there are good reasons (e.g. performance) why, during acquisition, one might want to store bytes in TZCYX order. Of course, that could be done in some "transitional" format, but I think it would be super nice for everyone if that was also a valid OME-NGFF file!
Strongly agree that order shouldn't matter. It's trivial to create transposed views of when reading.
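As a rough illustration of "find the axes, then transpose" for a reader that wants a fixed TCZYX layout, here is a NumPy sketch. It assumes the axes can be identified by the single-letter names "t", "c", "z", "y", "x" (the relaxed spec does not require those names, so a real reader might also consult the axis `type`), and `as_tczyx` is a hypothetical name. Axes the reader does not understand raise an error instead of being silently dropped, and requested axes that are absent are added with length 1.

```python
import numpy as np


def as_tczyx(data: np.ndarray, axis_names: list[str], order: str = "tczyx") -> np.ndarray:
    """Return a view of `data` with its axes rearranged into `order`.

    Hypothetical reader helper: axes are matched by name, requested axes that
    are absent become length-1 axes, and unrecognized axes raise ValueError so
    the limitation is reported to the user rather than hidden.
    """
    wanted = list(order)  # single-letter names, e.g. ["t", "c", "z", "y", "x"]
    if data.ndim != len(axis_names):
        raise ValueError(f"array has {data.ndim} dims but {len(axis_names)} axis names")
    unsupported = [n for n in axis_names if n not in wanted]
    if unsupported:
        raise ValueError(f"this reader only supports {order!r} axes, got {unsupported}")
    perm = [axis_names.index(n) for n in wanted if n in axis_names]
    view = np.transpose(data, perm)  # no copy: transpose returns a view
    for position, name in enumerate(wanted):
        if name not in axis_names:
            view = np.expand_dims(view, axis=position)  # also a view
    return view


# Example: data stored as ZYXC (e.g. in acquisition order), read as TCZYX.
zyxc = np.zeros((4, 5, 6, 2), dtype=np.uint16)
out = as_tczyx(zyxc, ["z", "y", "x", "c"])
print(out.shape)  # (1, 2, 4, 5, 6)
```

Since both `np.transpose` and `np.expand_dims` return views, supporting arbitrary on-disk order costs such a reader no extra memory.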
This pull request has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/python-bioformats-not-able-to-correctly-open-an-image/96600/14
Hi all, very important discussion. For us (from a microscope vendor perspective) the old TCZYX 5D approach is already one of the biggest limitations, and obviously we hope that OME-NGFF will allow for arbitrary dimensions that can be identified and read easily. If this is not the case, there will again be various workarounds etc., which in my opinion will defeat the idea behind OME-NGFF. For us this would again mean that OME-NGFF is just another format to which we need to figure out how to convert our CZI files. But if it supports many dimensions etc., it becomes a valid option to be used even by vendors.
Thanks for the input @sebi06! Would you be happy to be listed as an endorser of the RFC? (If so, please 👍 the original post at the top.) I use libczi as an example in the RFC, but if it were endorsed by you directly that example would probably hold more weight. 😊
To make sure we have consensus, I'm opening this RFC in the style of RFC-2. (I'm aware that RFC-1 has some pending issues to be resolved, but when consensus is possible 🤞 this is a useful way to document the history of past decisions.) Please add a thumbs up if you want to be listed as an endorser. Please reply if you have concerns.
@d-v-b @joshmoore @normanrz @bogovicj @will-moore @ziw-liu @tlambert03
Please add pings for authors of libraries implementing ome-ngff readers and writers, as the main effect here is not on existing files but on implementations that may implement too-restrictive a spec.
My goal is to get this and #235 merged before the upcoming 0.5 release. 🤞 (I think that is being targeted for late June/early July? @joshmoore?)
Review URL: https://ngff--239.org.readthedocs.build/rfc/3/