load_stac: scale_and_offset #503

Open
clausmichele opened this issue May 8, 2024 · 7 comments

@clausmichele
Member

clausmichele commented May 8, 2024

Proposed Process ID: load_stac
Proposed Parameter Name: scale_and_offset
Optional: yes, default: False

Context

With the new Sentinel-2 processing baseline, an offset has been introduced in addition to the scale factor, which was already present. Previously, the conversion from digital numbers (DN, the actual values in the S2 files) to reflectance only required the scale factor, so for many applications using DN directly gave the same result (in many indices the scale factor cancels out and can be neglected).
Now both have to be applied in order to obtain meaningful results.
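
To make the point concrete, here is a minimal sketch with invented DN values. The scale of 0.0001 and the offset of -0.1 are assumptions about what the newer baseline uses; check the actual metadata of your data before relying on them.

```python
# Illustrative values only: the DNs are invented, and SCALE/OFFSET are
# assumed values (reflectance = DN * 0.0001 - 0.1), not read from real metadata.
SCALE = 0.0001
OFFSET = -0.1


def ndvi(nir, red):
    """Normalized difference of two band values."""
    return (nir - red) / (nir + red)


def to_reflectance(dn, scale=SCALE, offset=OFFSET):
    """Convert a digital number to reflectance using scale and offset."""
    return dn * scale + offset


nir_dn, red_dn = 4000, 2000

# Scale only: the factor cancels in the ratio, so raw DN gives the same index.
ndvi_dn = ndvi(nir_dn, red_dn)
ndvi_scale_only = ndvi(nir_dn * SCALE, red_dn * SCALE)

# Scale and offset: the offset does not cancel, so raw DN gives a different index.
ndvi_reflectance = ndvi(to_reflectance(nir_dn), to_reflectance(red_dn))

print(ndvi_dn, ndvi_scale_only, ndvi_reflectance)
```

With these numbers the index computed on raw DN is about 0.33, while the index computed on offset-corrected reflectance is about 0.5.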

Some GitHub issues discussing this topic:
Element84/earth-search#9
Element84/earth-search#23
opendatacube/odc-stac#55

@jdries @dthiex How do you manage this for the SENTINEL2_L2A collections?

Description

If scale_and_offset is True, the scale and offset are applied automatically. They should be available in the raster:bands metadata of the raster extension.
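
A rough sketch of the intended behaviour, in plain Python rather than any backend's actual code. The helper name apply_scale_and_offset, the sample metadata, and the choice to turn nodata values into NaN are all assumptions made for illustration.

```python
import math


def apply_scale_and_offset(values, band_metadata, scale_and_offset=False):
    """Return the values, converted with the band's scale/offset if requested.

    `band_metadata` is one entry of an asset's `raster:bands` list.
    Missing fields fall back to scale=1 and offset=0, i.e. no change.
    """
    if not scale_and_offset:
        # Default: hand back the stored values untouched.
        return list(values)
    scale = band_metadata.get("scale", 1)
    offset = band_metadata.get("offset", 0)
    nodata = band_metadata.get("nodata")
    # Assumption: nodata pixels become NaN instead of being rescaled.
    return [
        math.nan if nodata is not None and v == nodata else v * scale + offset
        for v in values
    ]


# Hypothetical metadata fragment, shaped like a raster:bands entry.
band = {"data_type": "uint16", "scale": 0.0001, "offset": -0.1, "nodata": 0}

raw = apply_scale_and_offset([0, 1000, 2000], band)
converted = apply_scale_and_offset([0, 1000, 2000], band, scale_and_offset=True)
```

Note that the converted values are floats even though the stored data type is an integer type.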

Data Type

boolean

Additional changes

@dthiex
Contributor

dthiex commented May 14, 2024

In SH itself we follow this approach (we have a harmonization parameter which, if set to true, already compensates for the offset).

In openEO SH I believe we don't support setting the parameter, so we apply the default, meaning we request harmonized DN, so DN = 10000 * Reflectance still holds.
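
As an illustration of what harmonizing DN means, here is a minimal sketch. harmonize_dn is a hypothetical helper, not Sentinel Hub's implementation; the DN offset of -1000 and the optional clamping of negative results to zero are assumptions.

```python
def harmonize_dn(dn, dn_offset=-1000, clamp=True):
    """Shift a DN so that DN = 10000 * reflectance holds again.

    `dn_offset` is the offset expressed in DN units (assumed value).
    With `clamp=True`, negative results are set to zero.
    """
    shifted = dn + dn_offset
    return max(shifted, 0) if clamp else shifted


# A DN of 1500 in offset-carrying data corresponds to a harmonized DN of 500,
# i.e. a reflectance of 0.05 under the plain 0.0001 scale factor.
harmonized = harmonize_dn(1500)
```

After this step, the usual scale factor alone is enough to get reflectance, which is why indices computed on harmonized DN behave as they did before the offset was introduced.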

My feeling, though, is that this should not be part of the load_stac process, as it's very specific to the case "I load L2A raw DNs but I want to get Reflectance values".

@jdries
Contributor

jdries commented May 14, 2024

We currently also configure the behaviour per collection, but most of them require the user to apply the scaling explicitly, which is annoying.
For Sentinel-2 we make sure to convert to the 'standard' scaling factor of 0.0001, to avoid issues with the new processing baseline.
It would be nice to have a generic solution for load_collection as well.

@soxofaan
Member

yeah, it would be nice if this could be addressed the same way in both load_collection and load_stac.

Side note: I wonder whether there is a more generic or future-proof parameter name than scale_and_offset, so that we are not limited to just scale and offset transforms. For example, in the SH link from Daniel I see clamping of negative values.

@m-mohr
Member

m-mohr commented May 16, 2024

For load_collection the original idea was to have this information in metadata and then apply it automatically during data loading. Is this done? I think I'd assume the same in load_stac by default, if the metadata is given.

@clausmichele
Member Author

> For load_collection the original idea was to have this information in metadata and then apply it automatically during data loading. Is this done? I think I'd assume the same in load_stac by default, if the metadata is given.

Currently it is not specifically mentioned in the load_stac description. We would have to specify that, if the raster extension is available and provides scale and/or offset values, we apply them. However, I would prefer being able to switch this on/off depending on the use case, since applying it automatically often changes the data type (e.g. from uint8 to float32 or float64) and requires much more space and resources.
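
For a sense of the cost, here is a back-of-the-envelope sketch using only the standard library. The raster size (10980 x 10980 pixels, 3 bands) is just an example, and raster_bytes is a hypothetical helper that counts uncompressed in-memory size only.

```python
from array import array

# Bytes per value, taken from the standard library's array type codes:
# "B" = unsigned 8-bit integer, "f" = 32-bit float, "d" = 64-bit float.
ITEMSIZE = {
    "uint8": array("B").itemsize,
    "float32": array("f").itemsize,
    "float64": array("d").itemsize,
}


def raster_bytes(width, height, bands, data_type):
    """Uncompressed in-memory size of a raster, in bytes."""
    return width * height * bands * ITEMSIZE[data_type]


for data_type in ("uint8", "float32", "float64"):
    size_mib = raster_bytes(10980, 10980, 3, data_type) / 2**20
    print(f"{data_type}: {size_mib:.0f} MiB")
```

The same pixels take 4 times the memory as float32 and 8 times as float64 compared to uint8, before any processing has happened.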

For the load_collection process, as @m-mohr mentions, it is enough for me too to document it in the metadata, to keep the process as simple and efficient as possible.

@soxofaan
Member

> For load_collection the original idea was to have this information in metadata and then apply it automatically during data loading. Is this done?

You mean "apply automatically" by client or backend?

If it is to be done automatically by the backend, what is the point of exposing this as collection metadata? Or worse: you even risk the user/client doing the normalization again because of a misunderstanding about whether they are getting raw DN values or physical values.

In any case, in the VITO backend we don't automatically normalize/harmonize for memory/performance reasons (e.g. if the raw data is uint8, we want to have the option to keep that type when it's not necessary to convert to more memory-heavy floats/doubles). For example if you download SENTINEL2_L2A (B02, B03, B04) without processing, you get values roughly in the [0-10k] range, instead of reflectances in the [0-1] range. This is obviously a basic behavior we can not change suddenly. Backend-side auto-normalization should be an opt-in feature, e.g. with the proposed scale_and_offset parameter.

It could also be a client feature (opt-in again) to automatically add an apply node that does the rescaling based on collection metadata.
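
A minimal sketch of what such a client-side step could produce, written against the plain process graph structure rather than any particular client library. with_rescale_node is a hypothetical helper, the STAC URL is a placeholder, and the scale/offset values are assumed to have been read from metadata beforehand.

```python
import copy


def with_rescale_node(process_graph, source_node, scale, offset, node_id="rescale"):
    """Return a copy of the graph with an `apply` node computing x * scale + offset.

    Assumes `source_node` is the current result node of the graph;
    the new node takes over as the result node.
    """
    graph = copy.deepcopy(process_graph)
    # Only one node may be flagged as the result, so clear the old flag.
    for node in graph.values():
        node.pop("result", None)
    graph[node_id] = {
        "process_id": "apply",
        "arguments": {
            "data": {"from_node": source_node},
            "process": {
                "process_graph": {
                    "multiply1": {
                        "process_id": "multiply",
                        "arguments": {"x": {"from_parameter": "x"}, "y": scale},
                    },
                    "add1": {
                        "process_id": "add",
                        "arguments": {"x": {"from_node": "multiply1"}, "y": offset},
                        "result": True,
                    },
                }
            },
        },
        "result": True,
    }
    return graph


# Placeholder URL, for illustration only.
original = {
    "load1": {
        "process_id": "load_stac",
        "arguments": {"url": "https://example.com/stac/collections/hypothetical"},
        "result": True,
    }
}

rescaled = with_rescale_node(original, "load1", scale=0.0001, offset=-0.1)
```

The original graph is left untouched, so a client could offer this as an opt-in step right before submitting the graph to the backend.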

@clausmichele
Member Author

clausmichele commented May 17, 2024

The load_collection discussion is a bit off topic in my opinion, let's focus on load_stac!

@soxofaan doing it client side seems more like a workaround to me, since it wouldn't be documented in the openEO processes docs and wouldn't be available in the same way in the other clients.

Anyway, if we can't agree on having an additional parameter, we should at least document what the default behaviour of load_stac is concerning scale and offset (which could also be embedded in the GeoTIFF metadata, not only in the STAC metadata).


5 participants