Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow setting tiff scale + offset + custom metadata #317

Open
jdries opened this issue Aug 23, 2024 · 12 comments · May be fixed by Open-EO/openeo-geopyspark-driver#885
Open

allow setting tiff scale + offset + custom metadata #317

jdries opened this issue Aug 23, 2024 · 12 comments · May be fixed by Open-EO/openeo-geopyspark-driver#885
Assignees

Comments

@jdries
Copy link
Contributor

jdries commented Aug 23, 2024

Add a format option to set tiff metadata. Some of these are regular tiff tags, others will have to be encoded as gdal band metadata.

The options in geotrellis are limited, but there are very basic tifftools available in linux:
https://linux.die.net/man/1/tiffset
The idea is to be able to set metadata tags without touching the rest of the file, avoiding a full rewrite. Ideally this is done right after writing the tiff in geotrellis.

Format option will have to be passed through via save_result.

@VictorVerhaert
Copy link

example of custom metadata: https://github.com/VITO-RS-Vegetation/lcfm-production/issues/18

@jdries
Copy link
Contributor Author

jdries commented Aug 23, 2024

For tiffset, we would need to add libtiff-tools to the container.

@JorisCod
Copy link

JorisCod commented Sep 5, 2024

The current metadata, visible through gdalinfo, is:

Metadata:
PROCESSING_SOFTWARE=0.39.0a1
AREA_OR_POINT=Area

both can stay, but what we are looking for is:

Metadata:
AREA_OR_POINT=Area
bands=['s2-B02-p10', 's2-B02-p25']
copyright=LCFM project 2020 / Contains modified Copernicus Sentinel data (2020) processed by LCFM consortium
creation_time=2024-07-19 00:35:04.693848
license=CC-BY 4.0 - https://creativecommons.org/licenses/by/4.0/
product_crs=EPSG:32629
product_grid=Sentinel-2 UTM tiling grid
product_tile=29TNE
product_type=LSF monthly median composite for band B02
reference=TODO
time_end=2020-02-29T23:59:59Z
time_start=2020-02-01T00:00:00Z
title=LCFM Monthly Land Surface Features (LSF-MONTHLY) product at 10m resolution for year 2020
version=v002-satio

So this would be something quite flexible. We are setting the metadata through rasterio (python) and passing a dictionary:

        if metadata:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                dst.update_tags(**metadata)

        if bands_names:
            with rasterio.open(final_vrt_fn, 'r+') as dst:
                for i, b in enumerate(bands_names):
                    dst.set_band_description(i + 1, b)

Additionally, there is a difference in how we set the band description and how OpenEO does it:
OpenEO:
Band 2 Block=64x64 Type=Int16, ColorInterp=Undefined
NoData Value=-32768
Overviews: 1033x1542, 517x771, 259x386, 130x193, 65x97, 33x49
Metadata:
DESCRIPTION=B02_P25

We:
Band 17 Block=1024x1024 Type=UInt16, ColorInterp=Undefined
Description = s2-B08-p25
NoData Value=65535
Offset: 0.0031999999191612, Scale:8.16687767161827e-06

The description in OpenEO is at metadata level, which makes that the band names are not displayed in the symbology in Qgis.

Relatedly, as you see in the output, there's an offset and scale on every band. In principle, the offset and scale can be different per band.

@JorisCod
Copy link

Just checking whether this issue is already planned?
If it's too extensive, can the scaling and offset be done first and the metadata in a separate issue?

@JorisCod
Copy link

We set the scale and offset through gdal_translate:
https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-a_scale
but there are likely other options.

@bossie
Copy link
Collaborator

bossie commented Sep 24, 2024

Regardless of how we put them in the GeoTiff, will the user also provide values for scale/offset in the format options?

@bossie
Copy link
Collaborator

bossie commented Sep 24, 2024

I could get this to work:

tiffset -s 42112 '<GDALMetadata>
  <Item name="PROCESSING_SOFTWARE">0.40.1a1</Item>
  <Item name="DESCRIPTION" sample="0">red</Item>
  <Item name="SCALE" sample="0" role="scale">1.23</Item>
  <Item name="OFFSET" sample="0" role="offset">4.56</Item>
</GDALMetadata>' test_load_stac_datacube_parameters.tif

where:

  • 42112 is the code for the GDAL metadata tag;
  • sample points to the band number;
  • role is necessary to get GDAL to treat SCALE/OFFSET as actual scale/offset (not supported by Geotrellis).

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
The spec demands that TIFF tiles have a tile size that is a multiple of 16. We write GeoTIFFs
with a tile size that is equal to the TileLayerRDD's tile size so a tile size of 4 will produce GeoTiffs
that are not compliant. Some tools cope, like gdalinfo (with a warning), but some will simply fail,
like tiffset.

Open-EO/openeo-geotrellis-extensions#317
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Sep 26, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Sep 26, 2024
@bossie
Copy link
Collaborator

bossie commented Sep 30, 2024

Runs locally but tests in Jenkins fail with a segmentation fault:

subprocess.CalledProcessError: Command '['tiffset', '-s', '42112', '0.40.1a1TileRowTileCol', '/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-885/pytest-tmp/pytest-of-jenkins/pytest-0/test_separate_asset_per_band_l0/openEO_2021-06-05Z_TileRow.tif']' died with <Signals.SIGSEGV: 11>.

@bossie
Copy link
Collaborator

bossie commented Oct 2, 2024

Possible method to get core dumps within the container: https://stackoverflow.com/a/72048923.

@jdries
Copy link
Contributor Author

jdries commented Oct 10, 2024

The segfault can be reproduced in kubernetes, but it does seem to depend on the exact tag and value, as I was able to set for instance the description tag.

dmesg entry for segfaults is not so helpfull:

[782319.609931] tiffset[2919160]: segfault at 6d ip 00007f1e26e96895 sp 00007fff6675e9b8 error 4 in libc-2.28.so[7f1e26d36000+1bc000]
[782319.612686] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f
[782340.241434] tiffset[2919401]: segfault at 6d ip 00007f95ff2b1895 sp 00007fff7ca95a68 error 4 in libc-2.28.so[7f95ff151000+1bc000]
[782340.242934] Code: c7 80 00 00 00 48 81 fa 80 00 00 00 77 bc c5 fe 7f 29 c5 fe 7f 71 e0 c5 fe 7f 79 c0 c5 7e 7f 41 a0 c4 c1 7e 7f 23 c5 f8 77 c3 <c5> fe 6f 26 c5 fe 6f 6e 20 c5 fe 6f 76 40 c5 fe 6f 7e 60 c5 7e 6f

@jdries
Copy link
Contributor Author

jdries commented Oct 10, 2024

Found a potential solution: you can also set tag values via a file, and that seems to avoid the segfault.
This worked for me:
tiffset -sf 42112 tag.txt ESA_WorldCover_10m_2021_v200_S55W071_S2RGBNIR_v2.tif

@bossie
Copy link
Collaborator

bossie commented Oct 10, 2024

Oh dear.

That does seem to work. 👍

bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 10, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 10, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver that referenced this issue Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants