This is a PR that is part of TorchGeo Timeseries support (#2382 - Return time series). The goal is to provide an interface that allows for the following options for querying:
sample = dataset[bbox]
sample = dataset[[bbox1, bbox2, bbox3]]
sample = dataset[([dataset1_bbox1], [dataset2_bbox1, dataset2_bbox2])]
The idea is that anything within a single bbox gets merged into a single raster, and that any query with multiple bboxes is stacked along the time dimension. Querying with a tuple of (iterable) bboxes would split the subqueries across different datasets.
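To make the merge-then-stack behaviour concrete, here is a minimal pure-Python sketch of the first two query forms. It is not the code in this PR: `TimeBox`, `FILES`, `merge_single_bbox` and `query` are hypothetical names, only the time extent of a bbox is modelled, and "first valid pixel wins" is an assumed merge rule.

```python
from typing import NamedTuple


class TimeBox(NamedTuple):
    """Hypothetical stand-in for a bbox; only the time extent is modelled."""

    mint: int
    maxt: int


# Toy "files": (timestamp, 2x2 raster), with None marking nodata pixels.
FILES = [
    (1, [[1, None], [None, None]]),
    (2, [[9, 2], [None, 4]]),
    (5, [[5, 5], [5, 5]]),
]


def merge_single_bbox(box):
    """Merge every file inside one box into a single raster.

    Assumption: the first valid (non-None) pixel wins.
    """
    merged = [[None, None], [None, None]]
    for timestamp, raster in FILES:
        if box.mint <= timestamp <= box.maxt:
            for i, row in enumerate(raster):
                for j, value in enumerate(row):
                    if merged[i][j] is None:
                        merged[i][j] = value
    return merged


def query(q):
    """Dispatch on the query form: one box, or a list of boxes."""
    if isinstance(q, TimeBox):
        # Single bbox: everything inside it collapses into one raster.
        return merge_single_bbox(q)
    if isinstance(q, list):
        # Several bboxes: one merged raster per box, stacked along time.
        return [merge_single_bbox(box) for box in q]
    raise TypeError(f"unsupported query type: {type(q).__name__}")


single = query(TimeBox(1, 2))
series = query([TimeBox(1, 2), TimeBox(5, 5)])
```

In this toy setup `single` is one 2x2 raster, while `series` has a leading time axis of length 2. The tuple form would route each inner list to its own dataset and is left out of the sketch.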
As long as this PR is in draft mode, I will keep test.ipynb for anyone interested in helping to try out the new method.
Current status: The first two ways of querying have been implemented for the RasterDataset and tested with a small example.
What has changed:
- __init__ and __get_item__ methods.
- try_set_metadata, _get_bounds, _compile_and_check_filename_regex and _populate_index are examples of this.
- __merge_single_bbox, where the biggest difference is that I keep track of a dataframe for all regex metadata instead of lists and dicts. The main reason is that this makes it easy to group filepaths per band, as well as to keep track of which dates went into which merged raster. More about that later.
- __merge_query. We need to agree on whether to use a time dimension of size 1 for non-temporal datasets.
- [[bbox1_t1, bbox1_t2, ...], [bbox2_t1]]. I chose datetime format for now since that worked well for me in practice, but this can be converted to any format by the transforms. I was thinking that instead of a list of dates, maybe we could return a date range or something similar, but that mostly depends on the downstream use.
- filename_glob has been relaxed, so that all files/bands end up in the dataset index.
- nodata_value has been added, and drop_nodata has been added to the init of the class. Setting the value to True will ignore any merged raster that contains nodata values. This came from using the class in practice with Sentinel2 and seeing that some timestamps contained black (parts of) imagery, since some Sentinel tiles are not square. In theory, this could become a separate PR, but I chose to add it here because the effect of nodata pixels becomes more pronounced with time series.
What is still left to do: