RasterDataset: add control over resampling algorithm #2015
.. versionadded:: 0.6
"""
# Based on torch.is_floating_point
if self.dtype in [torch.float64, torch.float32, torch.float16, torch.bfloat16]:
torch.double == torch.float64, and the tests are designed to confirm that both work.
Doesn't seem to have as big of an impact on I/O performance as I expected:
This approach doesn't work well for datasets like L7 Irish or L8 Biome where a single
I think cubic interpolation is not such a good idea because the corresponding convolution kernel has negative weights, meaning that the output data range is not guaranteed. This can really mess up any normalization or scaling applied to the data before the resampling step. Here’s a minimal example of what I’m talking about:

import numpy as np
import rasterio

dst_data = data = np.random.rand(512, 512)

dst: rasterio.io.DatasetWriter
with rasterio.open(
    "temp.tif", mode="w", count=1, dtype=np.float32, width=512, height=512
) as dst:
    dst.write(dst_data, indexes=1)

src: rasterio.io.DatasetReader
with rasterio.open("temp.tif") as src:
    src_data = src.read(
        out_shape=(src.count, int(src.height * 2), int(src.width * 2)),
        resampling=rasterio.enums.Resampling.cubic,
    )

assert (
    src_data.min() >= 0 and src_data.max() < 1
), "The cubic kernel has negative weights!"

This can also cause issues with no-data values; there may be huge inaccuracies in the final product depending on where they land in relation to the kernel. I think bilinear interpolation is fine for 99.9% of cases.
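To see where the overshoot comes from without touching any files, here is a minimal pure-Python sketch in one dimension. It assumes the Keys cubic convolution kernel with a = -0.5, which is a common choice; whether GDAL/rasterio uses exactly this parameter is an assumption, and all helper names below are hypothetical.

```python
def cubic_weight(x: float, a: float = -0.5) -> float:
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common choice)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def linear_weight(x: float) -> float:
    """Triangle kernel used by (bi)linear interpolation; never negative."""
    return max(0.0, 1.0 - abs(x))


def interpolate(samples, position, weight_fn, support):
    """Weighted sum of integer-indexed samples around a fractional position."""
    total = 0.0
    for i, value in enumerate(samples):
        if abs(position - i) < support:
            total += value * weight_fn(position - i)
    return total


# All samples lie in [0, 1]; interpolate halfway between the two middle ones.
samples = [0.0, 1.0, 1.0, 0.0]
weights = [cubic_weight(1.5 - i) for i in range(4)]
cubic_value = interpolate(samples, 1.5, cubic_weight, support=2)
linear_value = interpolate(samples, 1.5, linear_weight, support=1)

print(weights)       # the two outer taps are negative, yet the weights sum to 1
print(cubic_value)   # exceeds the input maximum of 1.0
print(linear_value)  # stays within the input range
```

Because the linear kernel's weights are non-negative and sum to one, its output is a convex combination of the inputs and cannot leave their range; the cubic kernel gives up that guarantee in exchange for sharper results.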
You can play around with other enums and see which looks best: https://rasterio.readthedocs.io/en/stable/api/rasterio.enums.html#rasterio.enums.Resampling I agree with @DimitrisMantas that we should pick a safe/simple/fast default.
I think this is more or less the one-liner for each method:
By the way, Lanczos is technically "the best" of the bunch...
Agree with ^. It should be NN resampling for masks and Bilinear for all else by default. If a user wants something specific they can override.
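A small sketch of why nearest-neighbour is the safer default for masks: linearly interpolating between two class labels can produce a value that reads as a third, unrelated class. This is a 1-D pure-Python illustration, not the rasterio implementation; the class IDs and helper names are hypothetical.

```python
def upsample_nearest(row, factor):
    """Repeat each sample `factor` times (nearest-neighbour upsampling)."""
    return [value for value in row for _ in range(factor)]


def upsample_linear_midpoints(row):
    """Insert the linear midpoint between each pair of neighbours."""
    out = []
    for left, right in zip(row, row[1:]):
        out.extend([left, (left + right) / 2])
    out.append(row[-1])
    return out


# Hypothetical class IDs: 1 = water, 2 = cloud, 3 = forest.
mask_row = [1, 1, 3, 3]

nearest = upsample_nearest(mask_row, 2)
linear = upsample_linear_midpoints(mask_row)

print(nearest)  # only labels 1 and 3 appear
print(linear)   # the 1 -> 3 boundary yields 2.0, which reads as "cloud"
```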
By default, rasterio.merge.merge uses "nearest" as its resampling algorithm. This is fast and works well for masks where you don't want to interpolate two categorical classes, but results in resampling artifacts for floating point images. This PR adds an attribute to control the resampling algorithm used. It defaults to "bilinear" for float (usually images) and "nearest" for int (usually masks).

I chose to base this on dtype instead of is_image because there are times when you want to interpolate floating point pixelwise regression masks.

@robmarkcole can you upload before and after plots from the example you showed me?
Closes #2012
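
The dtype-based default described above could be sketched roughly as follows. This is not the code from the PR: the `Resampling` enum here is a two-member stand-in for `rasterio.enums.Resampling`, dtypes are identified by name rather than as torch dtypes, and `default_resampling` / `pick_resampling` are hypothetical helpers used only for illustration.

```python
from enum import Enum
from typing import Optional


class Resampling(Enum):
    """Stand-in for rasterio.enums.Resampling (only two members shown)."""

    nearest = "nearest"
    bilinear = "bilinear"


# Assumption: dtypes are given by name; the PR itself checks torch dtypes.
FLOAT_DTYPES = {"float64", "float32", "float16", "bfloat16"}


def default_resampling(dtype: str) -> Resampling:
    """Bilinear for floating point data, nearest for everything else."""
    if dtype in FLOAT_DTYPES:
        return Resampling.bilinear
    return Resampling.nearest


def pick_resampling(dtype: str, override: Optional[Resampling] = None) -> Resampling:
    """Use the caller's explicit choice if given, else the dtype default."""
    return override if override is not None else default_resampling(dtype)


image_choice = pick_resampling("float32")
mask_choice = pick_resampling("int64")
forced_choice = pick_resampling("float32", override=Resampling.nearest)

print(image_choice, mask_choice, forced_choice)
```

Keying the default on dtype rather than on an image/mask flag means a floating point regression mask is interpolated by default, while a user who wants different behaviour can still pass an explicit override.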