Querying the expression parser with geometry #226
I think I'll have to see an example input and output here. It's not clear to me whether this is on a tile level, or whether you want to query raw GTiff data at the pixel level, and how you would expect this to work with different zoom levels.
For my use case it would always be at or near the native resolution of the GTiff file. Maybe this wouldn't be solved by terracotta directly... I'm not sure what is possible... and maybe native resolution doesn't make sense if there are multiple resolutions for a given compute query. Tiling might be interesting in this case, and it is the more general case. I'm not against working at the tile level per se; I'd just rather not duplicate effort. For single-band non-compute extraction I do this:

import pandas as pd
import pyproj
import rasterio
import shapely.wkt
from io import StringIO
from flask import make_response
from rasterio.mask import raster_geometry_mask
from shapely.geometry import Point
from shapely.ops import transform as shapely_transform
from terracotta import get_driver, get_settings
from terracotta.server.flask_api import convert_exceptions
from utils import key_to_filename
prj4326_3857 = pyproj.Transformer.from_crs("epsg:4326", "epsg:3857", always_xy=True)
prj3857_4326 = pyproj.Transformer.from_crs("epsg:3857", "epsg:4326", always_xy=True)
# http://localhost:5000/extract/ds1/POLYGON((19.19036 -33.674068,19.75616 -33.674068,19.75616 -33.43029,19.19036 -33.43029,19.19036 -33.674068))
# http://localhost:5000/extract/ds1/POINT(18.3863 -33.92021)
def register(server):
    @server.route("/extract/<path:keys>/<string:wkt>", methods=["GET"])
    @convert_exceptions
    def extract(keys: str, wkt: str):
        keys = keys.split("/")
        settings = get_settings()
        driver = get_driver(settings.DRIVER_PATH, provider=settings.DRIVER_PROVIDER)
        with driver.connect():
            dataset = key_to_filename(driver, keys)
            result = get_bbox_data(wkt, dataset)
            return send_as_csv(result)


def send_as_csv(data):
    str_io = StringIO()
    data.to_csv(str_io, index=False)
    output = make_response(str_io.getvalue())
    output.headers["Content-Disposition"] = "attachment; filename=data_download.csv"
    output.headers["Content-type"] = "text/csv"
    return output


def get_bbox_data(wkt, dataset):
    geom = shapely_transform(prj4326_3857.transform, shapely.wkt.loads(wkt))
    ds_key, ds_path = dataset
    with rasterio.open(ds_path) as src:
        return raster2xy(src, geom, ds_key[0])
def raster2xy(src, geom, ds_name):
    _mask, _affine, win = raster_geometry_mask(src, [geom], crop=True, all_touched=True)
    arr_2D = src.read(1, window=win)
    nodata = src.nodata

    px_list = list()
    for row_index, row in enumerate(arr_2D):
        for col_index, pixel_value in enumerate(row):
            if pixel_value == nodata:
                continue
            # rasterio's xy() takes (row, col) in full-raster coordinates,
            # so offset the window-relative indices by the window origin
            easting, northing = src.xy(win.row_off + row_index, win.col_off + col_index)
            try:
                px_list.append(
                    {
                        "wkt": shapely.wkt.dumps(
                            shapely_transform(
                                prj3857_4326.transform, Point(easting, northing)
                            ),
                            rounding_precision=5,
                        ),
                        ds_name: pixel_value,
                    }
                )
            except ValueError:
                # skip nan values
                continue

    return pd.DataFrame(px_list)
I see 2 main problems with this approach:
Now for compute expression support, I guess all it would take would be to read multiple datasets and pass them to the expression evaluator. This is bound to get ugly. Honestly, I would probably try to use geotiff.js to read the COG data on the client side, directly from your data storage. You can still use Terracotta for indexing and tiling, but when it comes to reading raw data, I think we're straying a bit far from the intended use case.
Doing more to the COGs registered in Terracotta than just PNG tile rendering is definitely relevant. There are several features that could be worth considering:
I think any functionality that: A) takes a well-defined user input to work on the GeoTIFF, and B) returns a condensed / reduced / small response derived from the GeoTIFF, would make sense to add to Terracotta. Getting the values for all pixels within the polygon as CSV, like you are considering @chapmanjacobd, is perhaps a bit too far from the second criterion I listed - that we should not transfer raw data from the COG in storage, but reduce it server-side to something web-friendly. As for the valid concern you voiced, @dionhaefner, that large requests could stall the server - I think that could be prevented by validating the complexity of the polygon (-> limits the cost of the mask calculation) and then validating the number of affected pixels by summing over the mask (-> limits the amount of data returned / processed).
I'm not against adding a route mapping
With lots of validation on the passed bbox. In this proposal, all raw pixel data would be aligned with the XYZ grid, so there would be no way to read the unprojected raw data as in the COG. But this also allows for efficient usage of overviews, means we don't have to inform the user of which CRS the data is in, and ensures that tiles and raw data match.
Ohhhh, very cool. I hadn't even thought about that. It sounds like working at the tile level would completely solve my use case. Even if terracotta didn't do exactly what I wanted, it would be a lot more accessible with the above routes as examples.
Nice list, @dionhaefner! Seeing this in clear API specs makes the cases quite tangible. Do we have to limit ourselves to bbox, or could we take any polygon (within the URL length limit, of course)? Then a bbox would just be a simple case of that. Especially for zonal statistics, users (like me, at least) will often want to be more specific than just a box, and animations look nice when masked to the polygon rather than just cropped to a box in WGS or Web Mercator coordinates. Another consideration for returning untiled PNGs: if a user wants to display that PNG on a map, they need to know where to place the image. We have experimented with such an API - computing zonal statistics and returning cropped PNGs (as base64-encoded strings in JSON) together with anchor points: https://soilmoisturepro.z6.web.core.windows.net/ Unless we make the image fit exactly the bbox the user asks for, we need to provide that information somehow.
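For the placement problem with untiled PNGs, the response could bundle the image with its anchor bounds, roughly like the soil moisture experiment mentioned above. A minimal sketch; the field names are invented for illustration:

```python
import base64
import json


def png_response(png_bytes, bounds):
    """Bundle a cropped PNG with its geographic anchor as JSON.

    bounds is (west, south, east, north) of the actual image extent, so the
    client knows exactly where to place the image overlay on the map.
    """
    return json.dumps(
        {
            # PNG bytes are not valid JSON, so ship them base64-encoded
            "image": base64.b64encode(png_bytes).decode("ascii"),
            "bounds": {
                "west": bounds[0],
                "south": bounds[1],
                "east": bounds[2],
                "north": bounds[3],
            },
        }
    )
```

Most web map libraries can consume exactly this shape (a data URI plus corner coordinates) as an image overlay.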
Yes I meant arbitrary geometry.
Yeah, I'm not sure if that might be taking it too far. Your soil moisture example would also work with many small rasters, right? If you want to put stuff on a map, then tiling should be what you want. On the other hand, in my proposed API all output would be aligned with the XYZ grid, so if you really need this you can always compute the nearest pixel to your bbox on that grid in the frontend. Or you could ignore the issue and assume that origin == upper left corner of your bbox (with some sub-pixel inaccuracy).
Thinking more about this, I think it will be too slow to compute geometry operations per tile... here is some half-baked code if someone wants to try further:

import contextlib
import functools
import warnings
from concurrent.futures import Future
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple, TypeVar, Union
from typing import BinaryIO  # typing.io is deprecated; BinaryIO lives in typing
import geopandas as gpd
import IPython
import mercantile
import numpy as np
import terracotta
from rich import inspect
from terracotta import exceptions, get_driver, get_settings, image, update_settings, xyz
from terracotta.drivers.base import Driver, requires_connection
from terracotta.drivers.raster_base import submit_to_executor
from terracotta.handlers.compute import Number
from refresh import DB_PATH
import pyproj
from shapely.ops import transform
prj4326_3857 = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:3857")
v = gpd.read_file("geoBoundaries-IND-ADM2.topojson")
g4326 = v[v["shapeName"] == "Tenkasi"].geometry.values[0]
IPython.embed()
g = transform(prj4326_3857.transform, g4326)
def _get_raster_tile(
    cls,
    path: str,
    *,
    reprojection_method: str,
    resampling_method: str,
    tile_bounds: Tuple[float, float, float, float] = None,
    tile_size: Tuple[int, int] = (256, 256),
    preserve_values: bool = False,
) -> np.ma.MaskedArray:
    """Load a raster dataset from a file through rasterio.

    Heavily inspired by mapbox/rio-tiler
    """
    import rasterio
    from affine import Affine
    from rasterio import transform, warp, windows
    from rasterio.vrt import WarpedVRT

    dst_bounds: Tuple[float, float, float, float]

    if preserve_values:
        reproject_enum = resampling_enum = cls._get_resampling_enum("nearest")
    else:
        reproject_enum = cls._get_resampling_enum(reprojection_method)
        resampling_enum = cls._get_resampling_enum(resampling_method)

    with contextlib.ExitStack() as es:
        es.enter_context(rasterio.Env(**cls._RIO_ENV_KEYS))
        try:
            src = es.enter_context(rasterio.open(path))
        except OSError:
            raise IOError("error while reading file {}".format(path))

        # compute bounds in target CRS
        dst_bounds = warp.transform_bounds(src.crs, cls._TARGET_CRS, *src.bounds)

        if tile_bounds is None:
            tile_bounds = dst_bounds

        # prevent loads of very sparse data
        cover_ratio = (
            (dst_bounds[2] - dst_bounds[0])
            / (tile_bounds[2] - tile_bounds[0])
            * (dst_bounds[3] - dst_bounds[1])
            / (tile_bounds[3] - tile_bounds[1])
        )
        if cover_ratio < 0.01:
            raise exceptions.TileOutOfBoundsError("dataset covers less than 1% of tile")

        # compute suggested resolution in target CRS
        dst_transform, _, _ = warp.calculate_default_transform(
            src.crs, cls._TARGET_CRS, src.width, src.height, *src.bounds
        )
        dst_res = (abs(dst_transform.a), abs(dst_transform.e))

        # make sure VRT resolves the entire tile
        tile_transform = transform.from_bounds(*tile_bounds, *tile_size)
        tile_res = (abs(tile_transform.a), abs(tile_transform.e))
        if tile_res[0] < dst_res[0] or tile_res[1] < dst_res[1]:
            dst_res = tile_res
            resampling_enum = cls._get_resampling_enum("nearest")

        # pad tile bounds to prevent interpolation artefacts
        num_pad_pixels = 2

        # compute tile VRT shape and transform
        dst_width = max(1, round((tile_bounds[2] - tile_bounds[0]) / dst_res[0]))
        dst_height = max(1, round((tile_bounds[3] - tile_bounds[1]) / dst_res[1]))
        vrt_transform = transform.from_bounds(
            *tile_bounds, width=dst_width, height=dst_height
        ) * Affine.translation(-num_pad_pixels, -num_pad_pixels)
        vrt_height, vrt_width = (
            dst_height + 2 * num_pad_pixels,
            dst_width + 2 * num_pad_pixels,
        )

        # remove padding in output
        out_window = windows.Window(
            col_off=num_pad_pixels,
            row_off=num_pad_pixels,
            width=dst_width,
            height=dst_height,
        )

        # construct VRT
        vrt = es.enter_context(
            WarpedVRT(
                src,
                crs=cls._TARGET_CRS,
                resampling=reproject_enum,
                transform=vrt_transform,
                width=vrt_width,
                height=vrt_height,
                add_alpha=not cls._has_alpha_band(src),
            )
        )

        # read data
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", message="invalid value encountered.*")
            tile_data = vrt.read(
                1, resampling=resampling_enum, window=out_window, out_shape=tile_size
            )

            # assemble alpha mask
            mask_idx = vrt.count
            mask = vrt.read(mask_idx, window=out_window, out_shape=tile_size) == 0

            if src.nodata is not None:
                mask |= tile_data == src.nodata

            IPython.embed()  # debugging: inspect the tile before masking

            return np.ma.masked_array(tile_data, mask=mask)
# return type has to be Any until mypy supports conditional return types
@requires_connection
def get_raster_tile(
    self,
    keys: Union[Sequence[str], Mapping[str, str]],
    *,
    tile_bounds: Sequence[float] = None,
    tile_size: Sequence[int] = None,
    preserve_values: bool = False,
    asynchronous: bool = False,
) -> Any:
    # This wrapper handles cache interaction and asynchronous tile retrieval.
    # The real work is done in _get_raster_tile.
    future: Future[np.ma.MaskedArray]
    result: np.ma.MaskedArray

    settings = get_settings()
    key_tuple = tuple(self._key_dict_to_sequence(keys))
    datasets = self.get_datasets(dict(zip(self.key_names, key_tuple)))
    assert len(datasets) == 1
    path = datasets[key_tuple]

    if tile_size is None:
        tile_size = settings.DEFAULT_TILE_SIZE

    # make sure all arguments are hashable
    kwargs = dict(
        path=path,
        tile_bounds=tuple(tile_bounds) if tile_bounds else None,
        tile_size=tuple(tile_size),
        preserve_values=preserve_values,
        reprojection_method=settings.REPROJECTION_METHOD,
        resampling_method=settings.RESAMPLING_METHOD,
    )
    cache_key = hash(tuple(kwargs.items()))

    try:
        with self._cache_lock:
            result = self._raster_cache[cache_key]
    except KeyError:
        pass
    else:
        if asynchronous:
            # wrap result in a future
            future = Future()
            future.set_result(result)
            return future
        else:
            return result

    retrieve_tile = functools.partial(_get_raster_tile, **kwargs)
    future = submit_to_executor(retrieve_tile)

    def cache_callback(future: Future) -> None:
        # insert result into global cache if execution was successful
        if future.exception() is None:
            self._add_to_cache(cache_key, future.result())

    if asynchronous:
        future.add_done_callback(cache_callback)
        return future
    else:
        result = future.result()
        cache_callback(future)
        return result
def get_tile_data(
    driver: Driver,
    keys: Union[Sequence[str], Mapping[str, str]],
    tile_xyz: Tuple[int, int, int] = None,
    *,
    tile_size: Tuple[int, int] = (256, 256),
    preserve_values: bool = False,
    asynchronous: bool = False,
) -> Any:
    """Retrieve raster image from driver for given XYZ tile and keys"""
    if tile_xyz is None:
        # read whole dataset
        return get_raster_tile(
            keys,
            tile_size=tile_size,
            preserve_values=preserve_values,
            asynchronous=asynchronous,
        )

    # determine bounds for given tile
    metadata = driver.get_metadata(keys)
    wgs_bounds = metadata["bounds"]
    tile_x, tile_y, tile_z = tile_xyz
    if not xyz.tile_exists(wgs_bounds, tile_x, tile_y, tile_z):
        raise exceptions.TileOutOfBoundsError(
            f"Tile {tile_z}/{tile_x}/{tile_y} is outside image bounds"
        )

    mercator_tile = mercantile.Tile(x=tile_x, y=tile_y, z=tile_z)
    target_bounds = mercantile.xy_bounds(mercator_tile)
    return get_raster_tile(
        keys,
        tile_bounds=target_bounds,
        tile_size=tile_size,
        preserve_values=preserve_values,
        asynchronous=asynchronous,
    )
def compute(
    expression: str,
    some_keys: Sequence[str],
    operand_keys: Mapping[str, str],
    stretch_range: Tuple[Number, Number],
    tile_xyz: Tuple[int, int, int] = None,
    *,
    colormap: str = None,
    tile_size: Tuple[int, int] = None,
) -> BinaryIO:
    """Return singleband image computed from one or more images as PNG

    Expects a Python expression that returns a NumPy array. Operands in
    the expression are replaced by the images with keys as defined by
    some_keys (all but the last key) and operand_keys (last key).

    Contrary to singleband and rgb handlers, stretch_range must be given.

    Example:

        >>> operands = {
        ...     'v1': 'B08',
        ...     'v2': 'B04'
        ... }
        >>> compute('v1 * v2', ['S2', '20171101'], operands, [0, 1000])
        <binary image containing product of bands 4 and 8>

    """
    import rasterio.mask

    from terracotta.expressions import evaluate_expression

    if not stretch_range[1] > stretch_range[0]:
        raise exceptions.InvalidArgumentsError(
            "Upper stretch bounds must be larger than lower bounds"
        )

    settings = get_settings()

    if tile_size is None:
        tile_size_ = settings.DEFAULT_TILE_SIZE
    else:
        tile_size_ = tile_size

    driver = get_driver(settings.DRIVER_PATH, provider=settings.DRIVER_PROVIDER)

    with driver.connect():
        key_names = driver.key_names
        if len(some_keys) != len(key_names) - 1:
            raise exceptions.InvalidArgumentsError(
                "must specify all keys except last one"
            )

        def get_band_future(band_key: str) -> Future:
            band_keys = (*some_keys, band_key)
            return get_tile_data(
                driver,
                band_keys,
                tile_xyz=tile_xyz,
                tile_size=tile_size_,
                asynchronous=True,
            )

        futures = {var: get_band_future(key) for var, key in operand_keys.items()}
        operand_data = {var: future.result() for var, future in futures.items()}

        # in my mind this would be the ideal spot to do final processing
        # (if the processing is quicker than reading data...) since the xyz
        # data can be cached, but we are missing the spatial context of the
        # data at this point
        r = operand_data["v1"]
        # NOTE: rasterio.mask.mask expects an open dataset and an iterable of
        # shapes, not a masked array and a single geometry; this call is part
        # of what is still half-baked here
        zonal_extract, _transform = rasterio.mask.mask(r, g, crop=True, filled=False)
        IPython.embed()

    try:
        out = evaluate_expression(expression, operand_data)
    except ValueError as exc:
        # make sure error message gets propagated
        raise exceptions.InvalidArgumentsError(
            f"error while executing expression: {exc!s}"
        )

    out = image.to_uint8(out, *stretch_range)
    return image.array_to_png(out, colormap=colormap)
if __name__ == "__main__":
    # from server import app
    # app.run(port=5000, host="localhost", threaded=False, debug=True)
    update_settings(
        DRIVER_PATH=DB_PATH,
        REPROJECTION_METHOD="nearest",
        DEFAULT_TILE_SIZE=[512, 512],
        # DEBUG=True,
    )
    db = terracotta.get_driver(DB_PATH, provider="sqlite")
    with db.connect():
        inspect(
            compute(
                "v1 * 1",
                [],
                {"v1": "India-v16-2020"},
                [0, 100],
            )
        )
If I wanted to leverage some of the existing expression parsing of the compute route to create a route that produces a CSV extract from compute-route parameters plus a GeoJSON, what would be the best way to go about that?
Similar to #220, but #220 is about XYZ input and vector tile output, whereas this ticket is about bbox extent or shapely geometry input, with output of the desired terracotta function as an ndarray (and affine transform?), OR querying an evaluate_expression output image as a VRT (across multiple tiles).