
RFC-3: more dimensions for thee #239

Open · wants to merge 4 commits into main
Conversation

jni

@jni jni commented May 21, 2024

To make sure we have consensus, I'm opening this RFC in the style of RFC-2. (I'm aware that RFC-1 has some pending issues to be resolved, but when consensus is possible 🤞 this is a useful way to document the history of past decisions.) Please add a thumbs up if you want to be listed as an endorser. Please reply if you have concerns.

@d-v-b @joshmoore @normanrz @bogovicj @will-moore @ziw-liu @tlambert03

Please add pings for authors of libraries implementing ome-ngff readers and writers, as the main effect here is not on existing files but on implementations that may implement too-restrictive a spec.

My goal is to get this and #235 merged before the upcoming 0.5 release. 🤞 (I think that is being targeted for late June/early July? @joshmoore?)

Review URL: https://ngff--239.org.readthedocs.build/rfc/3/

Contributor

github-actions bot commented May 21, 2024

Automated Review URLs

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome-ngff-update-postponing-transforms-previously-v0-5/95617/5

@jni
Author

jni commented May 21, 2024

PS: I asked @joshmoore whether whimsy was allowed and he said yes, hence the title. (This comes after I realised I couldn't have "RFC-2: dimensional hullabaloo" because @normanrz had taken that number already. 😂)

@jni jni changed the title RFC-3: More dimensions for thee RFC-3: more dimensions for thee May 21, 2024
@tlambert03

Full endorsement. While I absolutely recognize the significant challenge that lifting the strict dimensionality model may pose for mapping arbitrary future usage onto legacy code bases built around XYZCT, I fully agree that a true next-generation format is going to have to lift it. I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption.

I agree with @jni that concerns around communicating the semantics of specific axes (i.e. formally named "X", "Y" and "Z") are better addressed by additional keys in the axis metadata, such as "type" and "space".

@joshmoore
Member

jni commented 4 minutes ago
I asked @joshmoore whether whimsy was allowed and he said yes, hence the title.

For comparison, https://datatracker.ietf.org/doc/html/rfc2549 ("IP over Avian Carriers")

@joshmoore
Member

tlambert03 commented 1 minute ago
I have personally experienced a number of use cases and applications where the current restrictions have led me to delay adopting ngff in my own work, and this RFC would allow me to more enthusiastically consider adoption.

Would you be able/willing to contribute those, perhaps even for a section in the RFC?

@tlambert03

tlambert03 commented May 21, 2024

Would you be able/willing to contribute those, perhaps even for a section in the RFC?

Sure, the most direct stories I can share are from implementing writers for data coming off microscopes (code in pymmcore-plus/mda/handlers). There I essentially have a class OMEZarrWriter(_5DWriterBase) that is able to accept acquisition definitions that explicitly adhere to the 5D model, but which will simply fail otherwise. As a result, I find myself using a more general zarr format more often, and have kind of punted for now on outputting data to something that I know users will be able to open in a variety of downstream applications (not that I love that either).

@tlambert03

it's possible that @nclack and/or @aliddell would have opinions here as well, as I know they've spent a fair amount of time thinking about how to map a variety of custom experiment types to the ngff format in the acquire-python schema

@jni
Author

jni commented May 22, 2024

@tlambert03 thanks for the links! I'll add these to the background section, but could you point me to where in the code

is able to accept acquisition definitions that explicitly adhere to the 5D model, but which will simply fail otherwise.

would fail? The smoking gun would be:

  • an example acquisition definition that cannot be handled by ome-ngff v0.4
  • the line(s) of code that would attempt to match the acquisition to the ome-ngff handler and fail

Maybe it's not as easy as that to define these things compactly, but if it is, I think it would be worthwhile detail for this RFC's motivation.

@joshmoore
Member

joshmoore commented May 22, 2024

A few quick clarifications, @jni:

  • can we change @d-v-b's role to "spec updates"? In my mind, the "implementation" work will require (if necessary) changes to the implementations themselves whereas remove axis restrictions #235 takes care of "S1. ... AUTHORS ... update the spec"
  • Are you looking to clarify whether or not to remove the 3 space dimensions restriction in this RFC or is this something for a future RFC?
  • Perhaps let's change "Reviewers" to "proposed reviewers" and drop the "Your name here" before we merge, but 👍 for reaching out to interested parties at this stage.
  • Thinking about a future compatibility matrix (see Add initial compatibility matrix #59) would it make sense to define designations for the support provided by implementations?

@will-moore
Member

Re: NGFF readers:

cc @manzt - https://github.com/hms-dbmi/vizarr - Any idea how much work it would be to support n-dimensional NGFF data?

cc @dgault - https://github.com/ome/ZarrReader/ - Since the OME data model is very much 5D, this is going to take a bit of thought on how to handle n-dimensional NGFF data?

@d-v-b
Contributor

d-v-b commented May 22, 2024

  • Are you looking to clarify whether or not to remove the 3 space dimensions restriction in this RFC or is this something for a future RFC?

The space restrictions, and all other axis restrictions (other than the requirement that axes have unique names) are removed in #235

@normanrz
Contributor

normanrz commented May 22, 2024

Re: NGFF readers:

Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools on what to do with >3 space dimensions.

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/request-for-info-ngff-use-cases-that-are-currently-blocked/96641/1

Comment on lines +141 to +144
As part of the [proposed implementation][implementation], Davis Bennett has
created pydantic models that validate the proposed schema. These are
new additions to the NGFF specification; they surfaced pre-existing errors in the
schema, and should prevent new errors from appearing.
Contributor

I don't think we need this text. Those pydantic models are merely a convenient way to write JSON schema. They don't express anything that's not already written in the prose of the spec. Also, I am planning on removing those models from the PR, because they add an undocumented build step that I don't have the energy to document.

Contributor

as of aa5c953 those models are gone

Author

@d-v-b I really loved the models! 😭

Contributor

they will live here https://github.com/janeliascicomp/pydantic-ome-ngff if 0.5 comes out

@will-moore
Copy link
Member

The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions.

I imagine that supporting arbitrary dimension order is a fair bit more work for implementers than n-dimensions, so endorsement of this proposal may not signal endorsement of #235?

@d-v-b
Contributor

d-v-b commented May 22, 2024

Regarding advice for partial implementations (e.g., implementations that only support a fixed number of dimensions, or a fixed order), I included the following section in the PR: https://github.com/ome/ngff/pull/235/files#diff-ffe6148e5d9f47acc4337bb319ed4503a810214933e51f5f3e46a227b10e3fcdR565-R580. Please let me know if this guidance is sufficient or if we should say more (and let's have that conversation over in #235 instead of here, so that we can keep synchronized with the actual changes to the spec).


## Overview

OME-NGFF version 0.4 places severe restrictions on the number, names, and types
Contributor

Suggested change
OME-NGFF version 0.4 places severe restrictions on the number, names, and types
OME-NGFF version 0.4 restricts the number, names, and types

Comment on lines +24 to +29
of axes that are allowed in the axes metadata. This has had the effect of
limiting the datasets in proprietary formats that *can* be meaningfully
converted to NGFF. It has also prevented some novel datasets from being written
in NGFF format. This RFC only removes restrictions from the specification. An
important consequence is that all valid NGFF datasets would remain valid after
this change.
Contributor

I had this formatted as a suggestion but the formatting got in the way. I think it might help to put some text in there like this:

"These restrictions hinder converting some datasets in proprietary formats to NGFF. The axis restrictions have also prevented some novel datasets from being written in the NGFF format. Clearly the data model articulated in version 0.4 of OME-NGFF is not inclusive enough for these datasets. So this RFC proposes removing these axis restrictions, thereby making OME-NGFF more open to the diversity of bioimaging datasets. Because this RFC only removes restrictions from the specification, all valid NGFF datasets remain valid after this change."

basically, the 2-5D restriction is a barrier to researchers who have data that doesn't fit into that model. so we remove the restriction, and make the ome-ngff model more expressive, so that those researchers can use the format.

Comment on lines +58 to +59
These restrictions are actively preventing users from converting existing
datasets to NGFF. For example, Zeiss .czi datasets [may contain][czi format
Contributor

Suggested change
These restrictions are actively preventing users from converting existing
datasets to NGFF. For example, Zeiss .czi datasets [may contain][czi format
These restrictions are a barrier preventing users from converting existing
datasets to NGFF. For example, Zeiss .czi datasets [may contain][czi format

@jni
Author

jni commented May 22, 2024

@will-moore

The PR at #235 mentioned above seems to go a bit further than this RFC in that it removes restrictions on ordering of dimensions, whereas this proposal only mentions removing the restriction on the number of dimensions.

I probably need to update the summary at the top, but under "proposal" I write:

Proposal

This document proposes removing any restrictions on the number of dimensions
stored in NGFF arrays. Additionally, it removes restrictions on the names and
types of included dimensions.

If the names are arbitrary, the ordering must also be arbitrary, surely? But I can make it explicit.

important consequence is that all valid NGFF datasets would remain valid after
this change.

## Background
Contributor

@d-v-b d-v-b May 22, 2024

I think it's important to mention here that as a historical fact OME-NGFF is an OME project, and the OME data model is 5D. This I think goes a long way to explaining the inertia for the 5D limit and the other various axis restrictions.

Author

I don't think I need to really get into it. I'm happy to change "unfortunately, 0.4 imposes restrictions..." to "unfortunately, for historical reasons, 0.4 imposes restrictions...".

Contributor

I don't think the word "unfortunate" fits here -- it's just an explanatory fact that this format was jump-started by OME, and they use a 5D model. This is literally the background for the 5D limit, so it might be helpful to put that here for people who don't know the history.

Comment on lines +111 to +117
A draft proposal for [coordinate transformations][trafo spec] already includes
most of the changes proposed here, so we envision that this RFC is compatible
with future plans for the format. The proposal does currently limit the number
of dimensions of type "space" to at most 3, but that limit [could be
removed][space dims comment]. If this RFC is approved, the transformation
specification would need to be updated to reflect this. However, that is an easy
change and there seems to be sufficient support in the community for this idea.
Contributor

Do we need to talk about that (stalled) PR at all? I don't see why it's relevant here

Author

It's relevant because it speaks to the forward compatibility of this RFC — ie it is in line with existing proposals for the format. That the PR is stalled is not really relevant — it's stalled because of minor details (e.g. array order) that don't have a bearing on this PR. Based on the discussion, other aspects, and certainly the ones relevant to this RFC, have quite broad consensus.

Contributor

in that case isn't it sufficient to just state that there are no known conflicts with other active proposals?

@jni
Author

jni commented May 22, 2024

@normanrz

Webknossos already supports an arbitrary number of dimensions. However, it assumes that there are only 3 space dimensions to map to xyz. I think the spec should provide guidance to visualization tools what to do with >3 space dimensions.

in my opinion the spec should leave this question undefined. The mapping can be direct (x=x, y=y, z=z), user defined (give users options of how to map axes), or arbitrary (x=foo, y=bar, z=baz). In practice, I think it is not going to be an issue, I am just wary of restricting the number for the same reason that I was wary of restricting the total number of dimensions, which indeed caused problems.

If it helps this RFC move forward, I can bring back the "maximum three spatial dimensions" limit from #138, and we can have the discussion in a later RFC. Unlike the other changes in this RFC, the removal of the maximum number of "space" dimensions is purely speculative on my end, and not motivated by a concrete use case.

Action requested:

  • if you would like me to limit the number of spatial dimensions to a maximum of 3, please 👍 this post.
  • if you would like the number of spatial dimensions to remain arbitrary (string theory ftw!), 👎 this post, and ideally please provide suggestions below for what guidance you would offer implementers.

@d-v-b
Contributor

d-v-b commented May 22, 2024

  • if you would like the number of spatial dimensions to remain arbitrary (string theory ftw!), 👎 this post, and ideally please provide suggestions below for what guidance you would offer implementers.

Implementations that do not support some aspect of user data should clearly communicate that to users. Users can then decide which implementation to use, given the data they have stored. We should not try to limit the data that users can store simply because some implementations cannot represent that data.

This is a broader issue: as of 0.4, there are lots of OME-NGFF tools that don't support big images on cloud storage (of which I have plenty). Should we change the spec to limit the size, or location of images, just because some implementations can't load my big ones? I don't think so. So for the same reason, we should not restrict what axes users have, just because some implementations are opinionated about axes.

@normanrz
Contributor

If it helps this RFC move forward, I can bring back the "maximum three spatial dimensions" limit from #138, and we can have the discussion in a later RFC. Unlike the other changes in this RFC, the removal of the maximum number of "space" dimensions is purely speculative on my end, and not motivated by a concrete use case.

I am in favor of limiting spatial dimensions to 3 because even on an abstract level I have a hard time imagining what a fourth spatial dimension would be. Additional axes can of course be present in the image (e.g. for phases, dim reduction etc). That would also be convenient for Webknossos because the 3 spatial dimensions would be used for 3D visualization and all other dimensions would be "slider" dimensions (i.e. we'll have a slider to select the coordinate).

@d-v-b
Contributor

d-v-b commented May 22, 2024

I am in favor of limiting spatial dimensions to 3 because even on an abstract level I have a hard time imagining what a fourth spatial dimension would be. Additional axes can of course be present in the image (e.g. for phases, dim reduction etc). That would also be convenient for Webknossos because the 3 spatial dimensions would be used for 3D visualization and all other dimensions would be "slider" dimensions (i.e. we'll have a slider to select the coordinate).

I can imagine scenarios where 4+ spatial dimensions might arise: e.g. imaging that is parametrized over the positions of two independent 2D raster scanners would produce images with 4 spatial axes, and if you add a z-actuator then you are at 5. And anyone who wants to use OME-NGFF for visualizing the phase space of a dynamical system might have 4+ spatial dimensions. Broadly speaking, I think it's a mistake to prematurely limit what users can represent, especially because these restrictions are a burden on implementers, who must write the validation logic.

@jni
Author

jni commented May 22, 2024

I am in favor of limiting spatial dimensions to 3 because even on an abstract level I have a hard time imagining what a fourth spatial dimensions would be.

It could be a pure mathematical construct. Like I said in #138, maybe you want to represent a 4D Klein bottle for didactic purposes. Or maybe a string theorist wants to save some particle simulations in full 12-dimensional space. 🤷

I'm inclined to agree with @d-v-b's arguments:

  • saving data is not only done to visualise it. So just because it's hard to visualise doesn't mean that we shouldn't allow saving it.
  • in practice, nearly all implementations are partial in one way or another. So, as long as implementations are aware of the limitations and can inform the user, it's ok to not fully cover all files that the spec theoretically allows. If a user tries to view a 12D string in webknossos, it's ok for webknossos to say, nope! I'm also ok with saying, (for example) "implementations SHOULD preferentially display the last 2 (for 2D viewers) or 3 (for 3D viewers) spatial dimensions as an image/volume and scroll through the remaining dimensions."

I'll wait for a few more votes — I think it's indeed fine to postpone for a later RFC, and I do think getting the remaining parts of this proposal in is urgent for many applications.
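The "display the last 2 or 3 spatial dimensions, scroll through the rest" guidance floated above could be sketched roughly like this. This is purely illustrative, not spec text: the function name and the assumption that spatial axes are marked with "type": "space" are mine.

```python
def pick_display_axes(axes, max_spatial=3):
    """Split axis indices into displayed spatial axes (the last
    `max_spatial` axes of type "space") and remaining "slider" axes."""
    spatial = [i for i, ax in enumerate(axes) if ax.get("type") == "space"]
    shown = spatial[-max_spatial:] if spatial else []
    sliders = [i for i in range(len(axes)) if i not in shown]
    return shown, sliders

# A hypothetical dataset with a time axis and four spatial axes.
axes = [
    {"name": "t", "type": "time"},
    {"name": "w", "type": "space"},  # the extra spatial dimension
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
shown, sliders = pick_display_axes(axes)
# shown == [2, 3, 4] (z, y, x); sliders == [0, 1] (t and w get sliders)
```

A 2D viewer would call this with max_spatial=2; degenerate cases (no spatial axes at all) simply put everything on sliders.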

@tlambert03

yeah, i think i agree with @d-v-b and @jni here. As an end-user, if I feel the need to express more than 3 spatial dimensions, I would be absolutely ok with viewer X (Webknossos, etc...) just saying "we only support 3 spatial dimensions, if you have more than three, we will pick three using the following heuristic: ...".

@normanrz
Contributor

"we only support 3 spatial dimensions, if you have more than three, we will pick three using the following heuristic: ...".

We'd probably just pick the first three. That would be fine. Even better if that were a recommendation in the spec.
Anyway, it would be great to hear feedback from others in the tool developer community.

@crisluengo

I'm in favour of removing as many restrictions as possible. Fewer restrictions generally lead to simpler standards. Why limit how many dimensions can be labelled "spatial"? It's not necessary, and so it shouldn't be done.

Software may want to limit how many dimensions it can handle; that makes code simpler. But we have plenty of toolboxes/libraries that can handle arrays with an arbitrary number of dimensions. So, the standard shouldn't limit the number. Instead, let software that wants a limit on the number of dimensions simply refuse to read files with too many dimensions. Don't standardize on the lowest common denominator.

@ziw-liu

ziw-liu commented May 25, 2024

I have a similar question to @will-moore's about what the exact change to the specification with this RFC is. Is it exactly the same as #235? If so, I think the RFC should be more explicit about its implications.
In addition to the changes to the number and ordering, #235 also removes all restrictions on the type of dimensions. Does this mean that space, time, and channel types can now assume arbitrary names? If this is the case, then I'm confused about the discussion around the number of spatial dimensions, since the spatial dimensions can no longer be detected with a fixed type to begin with. And I think it is a regression in the expressiveness of the metadata, since sharing dimension types is no longer standardized and reliable.

@d-v-b
Contributor

d-v-b commented May 25, 2024

Does this mean that space, time, and channel types can now assume arbitrary names? If this is the case, then I'm confused about the discussion around the number of spatial dimensions, since the spatial dimensions can no longer be detected with a fixed type to begin with. And I think it is a regression in the expressiveness of the metadata, since sharing dimension types is no longer standardized and reliable.

@ziw-liu I'm not the author of this RFC, but as the author of #235 I can say that in that PR there are no restrictions on the type field for elements of axes.

So, if applications previously relied on the "type": "space" mapping to find spatial axes, they no longer can, because there is no requirement that there be axes with "type": "space". For example, someone might save their data like this:

axes: [
  {"name": "foo", "type": "spatial", "unit": "meter"},
  {"name": "bar", "type": "space-like", "unit": "meter"},
  {"name": "baz", "type": "duration", "unit": "second"}
]

The above example has two axes that are spatial, but they use a different type attribute. Without standardizing the type field somewhere, consuming applications cannot use the type attribute to detect which axes are spatial.
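The failure mode can be made concrete with a small sketch (illustrative only; the exact-string matching is an assumption about how a naive reader might behave, not anything the spec prescribes): a reader that matches the literal string "space" finds nothing in the example above.

```python
def spatial_axes(axes):
    """Naively return the names of axes whose type is exactly "space"."""
    return [ax["name"] for ax in axes if ax.get("type") == "space"]

# The axes metadata from the example above, with nonstandard types.
axes = [
    {"name": "foo", "type": "spatial", "unit": "meter"},
    {"name": "bar", "type": "space-like", "unit": "meter"},
    {"name": "baz", "type": "duration", "unit": "second"},
]

print(spatial_axes(axes))  # prints [] -- both spatial axes are missed
```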

Does this summarize your concern?

Because if so, I agree with this concern but I think the actual problem is the type field itself, which is simply not well defined or motivated in the spec. As I mentioned in #215, the unit field alone seems to convey what "type" an axis has, which renders the type field of questionable utility.

I would welcome discussion in #215 on what the axis type field is actually for, because it would improve the spec to describe why it exists, how it is different from the unit field, and how applications should use this information.

I am happy to amend #235 to add recommendations for the type field, but I worry that if we wrote those recommendations today we would produce text like "if unit is a unit of length like "meter", then the type field should be "space"", which doesn't address an actual purpose for the type field.

@jni
Author

jni commented May 27, 2024

Is it exactly the same as #235?

That is the intent.

Does this mean that space, time, and channel types can now assume arbitrary names?

I agree with this concern but I think the actual problem is the type field itself,

I disagree with @d-v-b and have commented on #215, but to record the objection here and keep a semi-complete record of discussion on the RFC PR itself: a unit does not unambiguously specify a type, for example "wavelength" and "space" are both measured in meters, to say nothing of "stage position", for example.

Therefore, I suggest that we use SHOULD as guidance for the special cases of "space", "time", and "channel". But I don't want to use MUST here because, as mentioned in the discussion above, I think it's ok for software to not support all ome-ngff files. For example, I think ome-ngff should be usable to store Fourier-transformed data (type "spatial-frequency", and data type complex64/complex128), but many viewers won't be able to work with that immediately or ever, and I think that's ok.

@ziw-liu

ziw-liu commented May 27, 2024

Thanks @d-v-b and @jni. For me something like this would be useful:

The "type" field MAY be any string.
...
Implementations SHOULD use values "space", "time", "channel" for the "type" field when applicable.

I would also appreciate a similar recommendation for visualization implementations about choosing 'special dimensions' (e.g. first or last 1 time and 3 space axes) from multiple dimensions with the same type.
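A permissive check along the lines suggested above might look like the following sketch. The recommended vocabulary and the warning format are assumptions based on the MAY/SHOULD wording in this comment, not spec text:

```python
# Assumed recommended vocabulary, per the suggested SHOULD wording above.
RECOMMENDED_TYPES = {"space", "time", "channel"}

def check_axis_types(axes):
    """Accept any string "type" (MAY), but flag values outside the
    recommended vocabulary (SHOULD) so writers can reconsider them."""
    return [
        f"axis {ax['name']!r} has nonstandard type {ax['type']!r}"
        for ax in axes
        if ax.get("type") is not None and ax["type"] not in RECOMMENDED_TYPES
    ]

axes = [
    {"name": "t", "type": "time"},
    {"name": "p", "type": "polarization"},  # allowed, but flagged
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
print(check_axis_types(axes))  # one warning, for axis 'p'
```

A validator built this way stays lenient (nothing is rejected) while still nudging writers toward the shared vocabulary.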

@jkh1

jkh1 commented Jun 3, 2024

Fewer restrictions generally leads to simpler standards.

but they shift the burden to implementations, which then reduces interoperability, because loose specs have ambiguities. I am not against removing restrictions, but then there should be strict guidance on what implementations must do; so probably not SHOULD but MUST.

@d-v-b
Contributor

d-v-b commented Jun 3, 2024

Personally I would benefit from some concrete examples of implementations that rely on the current axis restrictions, so that we can better appreciate what impact these proposed changes would have, and potentially how to mitigate those impacts.

@jkh1

jkh1 commented Jun 4, 2024

@d-v-b My own implementation relies on the order t, c, z, y, x. This has the advantage of having clear semantics and this is not just useful for visualization but also for computing. What I worry about is that by making the specs too flexible, this ends up with different implementations eventually leading to variant formats just like happened with TIFF. I'd prefer specs that cover 90% of the use cases in a clear and unambiguous way. But maybe that's an issue of scope, i.e. what is NGFF supposed to cover? I haven't encountered microscopy images that don't fit the current dimension pattern so maybe having concrete examples (e.g. from papers) of microscopy images with more than 5 axes would be useful to understand the need and help reason about it.

@normanrz
Contributor

normanrz commented Jun 4, 2024

Our implementation of Webknossos expects that there are 2-3 space axes. We don't rely on ordering and all other axes can be arbitrary.
While I can see that there is a desire to make the axes more flexible, @jkh1's fragmentation arguments also resonate with me. To cover the 90% case, we could have a clear recommendation for how to structure the axes if you have tczyx (or a subset). All other configurations would also be allowed but not favored.

@will-moore
Member

@d-v-b Here's a list of implementations I'm aware of that rely on current axis restrictions. In general these tools handle 2D data (or a stack of 2D planes) so they expect the last 2 dimensions to be (YX) so that the data doesn't have to be transformed for 2D viewing

  • https://github.com/ome/ome-zarr-py and https://github.com/ome/napari-ome-zarr
    When reading data (and possibly passing it to napari) it probably won't be too hard to handle reduced axes restrictions since the axes metadata can be passed to napari to handle. The more hard-coded axis behaviour (e.g. downsampling in 2D) is really used when writing data, so this won't be broken by fewer restrictions.

  • https://github.com/hms-dbmi/vizarr
    Not so familiar with all of vizarr: I don't see any limit on number of dimensions, but it expects spatial dims last and sliders or channel for the others.

  • https://github.com/ome/ZarrReader/
    Don't know this code, but this is a Bio-Formats reader so it wants to output OME model data (e.g. for OMERO). Since OMERO only handles up to 5D data, I don't know how this will deal with extra dimensions.

  • https://github.com/ome/ome-ngff-validator
    As above, it does some very simple loading of 2D chunks for display, using the last 2 dimensions.

@tlambert03

tlambert03 commented Jun 4, 2024

I disagree that allowing flexible axes will cause fragmentation; I think it's the opposite, actually. @jkh1, if your use case relies on a strict TCZYX model, then your primary concern should be whether, given a dataset, you can unambiguously find those 5 axes in the dataset (and then you can transpose them as needed to fit your required dimension ordering), not whether the specification technically allows someone else to do something you're not interested in doing. (It's the restriction of those other use cases that causes fragmentation.)

I absolutely agree (and I think we all do?) that inasmuch as a dataset does have a standard 3-space + 1-time + 1-channel dimensional model, then the spec should make it unambiguous how to find that. I think it already does that.

@jkh1

jkh1 commented Jun 4, 2024

I agree that use cases not covered will use something else, hence also lead to fragmentation, but I think preserving interoperability for the greater number of cases should have priority. I am not against changing the model; my primary concern is about preserving the unambiguous semantics of the axes when this is done, and maybe the issue is that it isn't clear (at least to me) that this will be the case. The new spec should include the current one as a valid subset and also be semantically unambiguous. This probably means standardizing the vocabulary and defining what implementations should do with what they don't understand.

@d-v-b
Contributor

d-v-b commented Jun 4, 2024

But maybe that's an issue of scope, i.e. what is NGFF supposed to cover? I haven't encountered microscopy images that don't fit the current dimension pattern so maybe having concrete examples (e.g. from papers) of microscopy images with more than 5 axes would be useful to understand the need and help reason about it.

Here is a quick rundown of imaging modalities that I think would have trouble fitting into the [t,c,z,y,x] model:

  • Imaging where the instrument captures a diffraction pattern rather than a focused image

    • e.g. textbook X-ray diffraction or ptychographic techniques, or MRI. I don't know what axis type reciprocal space should have (not least because the axis.type field doesn't really have a solid foundation), but I think we can agree that the lateral extent of a diffraction pattern is a different kind of thing than the lateral extent of a focused image.
  • polarization imaging

    • It seems fair to represent the polarization angle of incident light as a dimension of a dataset, if the polarization angle was varied in steps during imaging
    • It may also be desirable to represent the polarization angle of detected light alongside the intensity of that light, and an additional array axis would be a conventional way of doing this. Maybe you think this is what the "channel" axis type is for, but suppose someone does a conventional multichannel acquisition AND polarization imaging, e.g. measuring polarization responses of different wavelengths. In this case, the polarization channel must be separated from the wavelength.
  • Phase contrast imaging

    • Like polarization imaging, phase contrast imaging can record a conventional "intensity" measure alongside an optical delay measurement (the phase). Same argument as above re: using the channel axis for phase.
  • Imaging during X, where X is some other measurement or parameter that is stepped during imaging

    • X could be pressure, or temperature, or some other experimental variable that the experimenter is co-varying with image acquisition. E.g., If someone acquired a stack of data where they imaged the same sample at [T=100, T=101, T=102... T=200K], then it seems reasonable for them to save their data with T as an axis, similarly pressure, or anything else.
    • X can be a mass spectrometer, in the case of MALDI, which gives spatially-resolved mass spectra alongside conventional microscopy images. I don't think it would be odd to set aside an array axis for the spectrum.
  • Simulations

    • If someone wants to simulate 4D space and save the results in OME-NGFF, why should we stop them?
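For illustration, if the axis restrictions were loosened, a combined multichannel + polarization acquisition like the one described above might declare its axes as follows. This is a hypothetical sketch: the "angle" type and "degree" unit are not defined by the current spec.

```json
{
  "axes": [
    {"name": "t", "type": "time", "unit": "second"},
    {"name": "c", "type": "channel"},
    {"name": "pol", "type": "angle", "unit": "degree"},
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"}
  ]
}
```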

The new spec should include the current one as a valid subset and also be semantically unambiguous. This probably means standardizing the vocabulary and defining what implementations should do with what they don't understand.

The new spec does include the current one as a valid subset, and I would argue that the current spec is actually semantically very ambiguous, because it doesn't define what the different types of axes mean. I think contributions to improve this would be welcome.

@d-v-b
Contributor

d-v-b commented Jun 4, 2024

another important example, on the lower end of the dimensionality spectrum: the output of a line-scanning camera is a 1-dimensional array. Since OME-NGFF 0.4 cannot represent 1D data, the format cannot represent even a single frame from a line-scanning camera. NGFF users with such data would have to pad their data with fake dimensions, which is data mangling and a very bad user experience.
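Under relaxed rules, a single line-scan frame could be stored with a one-entry axes list, e.g. (a hypothetical sketch in the style of the v0.4 metadata):

```json
{
  "axes": [
    {"name": "x", "type": "space", "unit": "micrometer"}
  ]
}
```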

@jni
Author

jni commented Jun 4, 2024

Another example I've been working with lately: electron backscatter diffraction (EBSD), which stores a 2D diffraction pattern for each pixel of a material. And the summarised data is still xy+(3D angle) or xy+(quaternion). (The latter could be stored as "channel" but that's a little bit of an abuse of the spec, imho.)

My reading of the discussion above is "loosen restrictions, but offer guidance with SHOULD as to how tczyx SHOULD be stored".

imho, though, we should drop the order requirement — it is not hard to transpose an array, and there are good reasons (e.g. performance) why e.g. during acquisition, one might want to store bytes in TZCYX order. Of course, that could be done in some "transitional" format, but I think it would be super nice for everyone if that was also a valid OME-NGFF file!
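As a small illustration that the order requirement costs readers essentially nothing: in NumPy (and in most array libraries that back NGFF readers), reordering axes produces a constant-time view rather than a copy, so a file stored in TZCYX acquisition order can be presented to consumers in TCZYX order for free:

```python
import numpy as np

# array stored on disk in acquisition order t, z, c, y, x
tzcyx = np.zeros((2, 5, 3, 64, 64))

# present it to consumers in canonical t, c, z, y, x order;
# this is a constant-time view, no bytes are moved
tczyx = tzcyx.transpose(0, 2, 1, 3, 4)

assert tczyx.shape == (2, 3, 5, 64, 64)
assert tczyx.base is tzcyx  # shares memory with the original
```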

@tlambert03

Strongly agree that order shouldn't matter. It's trivial to create transposed views of the data when reading

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/python-bioformats-not-able-to-correctly-open-an-image/96600/14

@sebi06

sebi06 commented Jun 5, 2024

Hi all, very important discussion. For us (from a microscope vendor perspective) the old TCZYX 5D approach is already one of the biggest limitations, and obviously we hope that OME-NGFF will allow for arbitrary dimensions that can be identified and read easily.

If this is not the case, there will again be various workarounds, which will in my opinion defeat the idea behind OME-NGFF. For us this would again mean that OME-NGFF is just another format that we need to figure out how to convert our CZI format to. But if it supports many dimensions etc., it becomes a valid option even for vendors.

@jni
Author

jni commented Jun 5, 2024

Thanks for the input @sebi06! Would you be happy to be listed as an endorser of the RFC? (If so, please 👍 the original post at the top.) I use libczi as an example in the RFC, but if it were endorsed by you directly that example would probably hold more weight. 😊
