RasterDataset: add control over resampling algorithm #2015

adamjstewart · 2024-04-20T09:03:57Z

By default, rasterio.merge.merge uses "nearest" as its resampling algorithm. This is fast and works well for masks where you don't want to interpolate two categorical classes, but results in resampling artifacts for floating point images. This PR adds an attribute to control the resampling algorithm used. It defaults to "bilinear" for float (usually images) and "nearest" for int (usually masks).

I chose to base this on dtype instead of is_image because there are times when you want to interpolate floating point pixelwise regression masks.
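A minimal sketch of the dtype-based default described above. It uses only the standard library: `Resampling` is a two-member stand-in for `rasterio.enums.Resampling`, the dtype strings stand in for the corresponding torch dtypes, and `RasterLike`, `MaskLike`, and `RegressionMaskLike` are hypothetical classes made up for illustration, not the actual torchgeo implementation.

```python
from enum import Enum


class Resampling(Enum):
    """Stand-in for rasterio.enums.Resampling (only two members shown)."""

    nearest = 0
    bilinear = 1


# String stand-ins for the floating point torch dtypes (assumption for this sketch).
FLOAT_DTYPES = frozenset({"float16", "bfloat16", "float32", "float64"})


class RasterLike:
    """Hypothetical dataset stand-in with a dtype and a derived default."""

    dtype = "float32"

    @property
    def resampling(self) -> Resampling:
        # Float data (usually images) is interpolated; everything else
        # (usually integer masks) keeps nearest-neighbor resampling.
        if self.dtype in FLOAT_DTYPES:
            return Resampling.bilinear
        return Resampling.nearest


class MaskLike(RasterLike):
    """Categorical mask: integer dtype, so classes are never blended."""

    dtype = "int64"


class RegressionMaskLike(RasterLike):
    """Pixelwise regression target: float dtype, so it is interpolated."""

    dtype = "float32"


print(RasterLike().resampling, MaskLike().resampling, RegressionMaskLike().resampling)
```

Keying the default on dtype rather than on an image/mask flag is what lets the float regression mask pick up bilinear resampling without any extra configuration.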

@robmarkcole can you upload before and after plots from the example you showed me?

Closes #2012

adamjstewart · 2024-04-20T09:05:00Z

torchgeo/datasets/geo.py

+        .. versionadded:: 0.6
+        """
+        # Based on torch.is_floating_point
+        if self.dtype in [torch.float64, torch.float32, torch.float16, torch.bfloat16]:


torch.double == torch.float64, and the tests are designed to confirm that both work.

adamjstewart · 2024-04-20T09:26:03Z

Doesn't seem to have as big of an impact on I/O performance as I expected:

         raw (random)  raw (grid)  preprocessed (random)  preprocessed (grid)
before   17.350        10.984      9.9158                 4.6496
after    18.706        12.371      9.5933                 4.5972

adamjstewart · 2024-04-20T09:30:48Z

This approach doesn't work well for datasets like L7 Irish or L8 Biome where a single RasterDataset is used for both image and mask. I think the solution is to instead subclass IntersectionDataset like I did in #1972.

DimitrisMantas · 2024-04-22T08:21:43Z

I think cubic interpolation is not such a good idea because the corresponding convolution kernel has negative weights, meaning that the output is not guaranteed to stay within the range of the input data.

This can really mess up any normalization or scaling applied to the data before the resampling step.

Here’s a minimal example of what I’m talking about:

import numpy as np
import rasterio
from rasterio.enums import Resampling

# Random values drawn from [0, 1), cast to match the dtype of the file below.
dst_data = np.random.rand(512, 512).astype(np.float32)

dst: rasterio.io.DatasetWriter
with rasterio.open(
    "temp.tif", mode="w", count=1, dtype=np.float32, width=512, height=512
) as dst:
    dst.write(dst_data, indexes=1)

src: rasterio.io.DatasetReader
with rasterio.open("temp.tif") as src:
    # Upsample by 2x in each dimension with the cubic kernel.
    src_data = src.read(
        out_shape=(src.count, int(src.height * 2), int(src.width * 2)),
        resampling=Resampling.cubic,
    )
    # Expected to fail: cubic resampling can push values outside [0, 1).
    assert (
        src_data.min() >= 0 and src_data.max() < 1
    ), "The cubic kernel has negative weights!"

This can also cause issues with no-data values; there may be huge inaccuracies in the final product depending on where they land in relation to the kernel.

I think bilinear interpolation is fine for 99.9% of cases.
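The overshoot can also be seen without rasterio. The following is a minimal 1-D sketch, assuming the Keys cubic convolution kernel with `a = -0.5` (a commonly used value; whether the cubic mode used by rasterio matches it exactly is an assumption here). The helper names `keys_cubic_weight`, `cubic_interp`, and `linear_interp` are made up for illustration.

```python
def keys_cubic_weight(x: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_interp(samples: list[float], t: float) -> float:
    """Interpolate between samples[1] and samples[2] at fraction t in [0, 1]."""
    distances = (1.0 + t, t, 1.0 - t, 2.0 - t)  # distance to each of the 4 taps
    return sum(keys_cubic_weight(d) * s for d, s in zip(distances, samples))


def linear_interp(samples: list[float], t: float) -> float:
    """1-D analogue of bilinear: only the two inner samples contribute."""
    return (1.0 - t) * samples[1] + t * samples[2]


# Weights at the midpoint between two pixels: the outer taps are negative.
weights = [keys_cubic_weight(d) for d in (1.5, 0.5, 0.5, 1.5)]
print(weights)  # [-0.0625, 0.5625, 0.5625, -0.0625]

# Inputs are all within [0, 1], but the cubic results are not.
print(cubic_interp([0.0, 1.0, 1.0, 0.0], 0.5))   # 1.125
print(cubic_interp([1.0, 0.0, 0.0, 1.0], 0.5))   # -0.125
print(linear_interp([0.0, 1.0, 1.0, 0.0], 0.5))  # 1.0
```

Because the linear weights are non-negative and sum to one, the interpolated value is always a convex combination of its neighbors and cannot leave their range; the cubic weights also sum to one, but the negative outer taps allow over- and undershoot.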

robmarkcole · 2024-04-22T09:03:54Z

This pair shows the original issue, addressed in this PR.

This was created with a random sampler, checking for it, but in general the results are improved. There is, however, some kind of streak artefact:

adamjstewart · 2024-04-22T09:34:19Z

You can play around with other enums and see which looks best: https://rasterio.readthedocs.io/en/stable/api/rasterio.enums.html#rasterio.enums.Resampling

I agree with @DimitrisMantas that we should pick a safe/simple/fast default.

DimitrisMantas · 2024-04-22T11:51:10Z

I think this is more or less the one-liner for each method:

Nearest/Mode: Good for masks; not suitable for continuous fields (i.e., images)
Bilinear: Suitable for continuous data; works best with smooth-ish fields (e.g., DEMs & DSMs)
Cubic/Cubic Spline/Lanczos: These are all the same as far as we are concerned; they are a bit unpredictable.
Average: It sounds a bit like bilinear, but it's probably a kernel.
Gauss: Produces a probably undesirable smoothing effect when upsampling images.
Min/Max/etc.: They are probably useful in certain regression tasks, but I can't think of a potential application at the moment.
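To illustrate the first two entries, here is a small 1-D sketch of why interpolation is unsuitable for categorical masks. The helpers `sample_nearest` and `sample_linear` are hypothetical and written only for this example; they are not rasterio functions.

```python
def sample_nearest(values: list[int], x: float) -> int:
    """Return the value of the pixel closest to fractional index x."""
    return values[min(int(x + 0.5), len(values) - 1)]


def sample_linear(values: list[int], x: float) -> float:
    """Linearly interpolate between the two pixels surrounding index x."""
    i = min(int(x), len(values) - 1)
    j = min(i + 1, len(values) - 1)
    t = x - i
    return (1.0 - t) * values[i] + t * values[j]


# A mask with two classes, 1 and 3, upsampled at half-pixel steps.
labels = [1, 1, 3, 3]
positions = [k / 2 for k in range(2 * len(labels) - 1)]  # 0.0, 0.5, ..., 3.0

nearest = [sample_nearest(labels, x) for x in positions]
linear = [sample_linear(labels, x) for x in positions]

print(nearest)  # [1, 1, 1, 3, 3, 3, 3]
print(linear)   # [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
```

Nearest-neighbor only ever returns classes that exist in the input, while linear interpolation produces 2.0 at the class boundary, a label that appears nowhere in the original mask.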

DimitrisMantas · 2024-04-22T11:52:38Z

By the way, Lanczos is technically "the best" of the bunch...

isaaccorley · 2024-04-22T14:24:42Z

Agree with ^. It should be NN resampling for masks and Bilinear for all else by default. If a user wants something specific they can override.

adamjstewart added this to the 0.6.0 milestone Apr 20, 2024

github-actions bot added the documentation, datasets, and testing labels Apr 20, 2024

adamjstewart mentioned this pull request Apr 27, 2024

L7 Irish: convert to IntersectionDataset #2034

Merged

adamjstewart force-pushed the datasets/raster-resampling branch from 264313f to 1a07164 Compare May 3, 2024 13:52

adamjstewart mentioned this pull request May 13, 2024

L8 Biome: convert to IntersectionDataset #2058

Merged

adamjstewart added 3 commits May 13, 2024 16:23

RasterDataset: add control over resampling algorithm

2cfc090

Fix type hints

7d93582

cubic -> bilinear

48fb147

adamjstewart force-pushed the datasets/raster-resampling branch from 1a07164 to 48fb147 Compare May 13, 2024 14:23

Ruff: single quotes

2d4a060

adamjstewart merged commit 25fb9cc into microsoft:main May 13, 2024
17 checks passed

adamjstewart deleted the datasets/raster-resampling branch May 13, 2024 15:00
