Custom disaster-based train/test splits for xView2 dataset #2416

burakekim · 2024-11-18T16:12:37Z

XView2DistShift is a subclass of XView2 designed to modify the original train/test splits. Similar to EuroSATSpatial #2074, this class enables domain adaptation and out-of-distribution (OOD) detection experiments.

From the docstring:

This class allows for the selection of particular disasters to be used as the training set (in-domain) and test set (out-of-domain). The dataset can be split according to the disaster names specified by the user, enabling the model to train on one disaster type and evaluate on a different, out-of-domain disaster. The goal is to test the generalization ability of models trained on one disaster to perform on another.
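For readers skimming the thread, here is a minimal sketch of the idea described in the docstring. It is not the code in this PR: `disaster_of` and `split_by_disaster` are hypothetical helpers, and the sketch assumes file names of the form `<disaster>_<id>_<pre|post>_disaster.png`.

```python
from typing import Iterable


def disaster_of(filename: str) -> str:
    """Return the disaster name, assuming '<disaster>_<id>_<pre|post>_disaster.png'."""
    return filename.split("_", 1)[0]


def split_by_disaster(
    filenames: Iterable[str],
    id_disasters: Iterable[str],
    ood_disasters: Iterable[str],
) -> tuple[list[str], list[str]]:
    """Split file names into in-domain (train) and out-of-domain (test) lists.

    Hypothetical helper for illustration only; files belonging to disasters
    named in neither list are dropped.
    """
    id_set, ood_set = set(id_disasters), set(ood_disasters)
    overlap = id_set & ood_set
    if overlap:
        # A disaster in both sets would leak test data into training.
        raise ValueError(f"disasters used for both train and test: {sorted(overlap)}")
    names = list(filenames)
    train = sorted(f for f in names if disaster_of(f) in id_set)
    test = sorted(f for f in names if disaster_of(f) in ood_set)
    return train, test


# Made-up file names that follow the assumed naming pattern.
files = [
    "hurricane-harvey_00000001_pre_disaster.png",
    "hurricane-harvey_00000001_post_disaster.png",
    "guatemala-volcano_00000003_pre_disaster.png",
    "mexico-earthquake_00000007_post_disaster.png",
]
train, test = split_by_disaster(files, ["hurricane-harvey"], ["guatemala-volcano"])
```

Rejecting overlapping disaster lists up front keeps the in-domain and out-of-domain sets disjoint, which is the property the OOD evaluation relies on.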

TODO: test coverage

adamjstewart · 2024-11-18T19:35:41Z

We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now? Will there be any corresponding citations for these new splits?

It would be nice to move more of the shared code into the XView2 base class so that the only thing that needs to be changed in this subclass is the URLs. How different are these datasets?

burakekim · 2024-11-18T20:29:41Z

> We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now?

Spatial refers to the type of distribution shift revealed by the splits when they are rearranged. XView2 consists of multiple disasters, and the distribution shift is determined by the user's choice. The user can select any disaster as the training set and another as the test set, which introduces varying types of distribution shifts. These range from near-distribution to far-distribution shifts, depending on how different the disasters in the splits are. Here, the difference is not limited to spatial factors but also includes temporal and contextual differences. That is why Spatial would be a misleading name for XView2.

One alternative could be standardizing the naming for these subset datasets with a suffix like OOD or DistShift. What do you think?

> It would be nice to move more of the shared code into the XView2 base class so that the only thing that needs to be changed in this subclass is the URLs. How different are these datasets?

They are basically the same dataset but with different splits. XView2DistShift allows users to select specific disasters for training and testing sets.

Are you suggesting we curate the filenames for all disasters as HF links and dynamically load them as training or testing sets based on input? This approach would spare us _initialize_files and _load_split_files_by_disaster_and_type, but not __getitem__ and __len__.
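To make the question concrete, here is a rough sketch of what such a manifest-driven approach might look like. Everything in it is an assumption for illustration: `MANIFEST`, `ManifestSplit`, the sample stems, and the `_pre_disaster.png` / `_post_disaster.png` suffixes are hypothetical and not part of this PR or of the XView2 base class.

```python
from typing import Sequence

# Hypothetical curated manifest mapping each disaster to its sample stems.
# In practice it might be loaded from a hosted file instead of hard-coded.
MANIFEST: dict[str, list[str]] = {
    "hurricane-harvey": ["hurricane-harvey_00000001", "hurricane-harvey_00000002"],
    "guatemala-volcano": ["guatemala-volcano_00000003"],
}


class ManifestSplit:
    """Illustrative dataset-like class built from a curated manifest.

    File discovery is replaced by a lookup, but __getitem__ and __len__
    still have to be written.
    """

    def __init__(self, manifest: dict[str, list[str]], disasters: Sequence[str]) -> None:
        unknown = [d for d in disasters if d not in manifest]
        if unknown:
            raise ValueError(f"unknown disasters: {unknown}")
        # Flatten the selected disasters into one list of (disaster, stem) pairs.
        self.samples = [(d, stem) for d in disasters for stem in manifest[d]]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> dict[str, str]:
        disaster, stem = self.samples[index]
        # Return file names only; real code would load and return the imagery.
        return {
            "disaster": disaster,
            "pre": f"{stem}_pre_disaster.png",
            "post": f"{stem}_post_disaster.png",
        }


train_ds = ManifestSplit(MANIFEST, ["hurricane-harvey"])
test_ds = ManifestSplit(MANIFEST, ["guatemala-volcano"])
```

With this shape, the train and test sets differ only in the list of disaster names passed to the constructor.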

burakekim and others added 5 commits April 9, 2024 15:15

minor typo in custom_raster_dataset.ipynb

da2399b

Merge branch 'main' of https://github.com/burakekim/torchgeo

9791f12

xview2distshift dataset

62919bf

test xview2

5985f44

formatting

a23344e

github-actions bot added the documentation, datasets, and testing labels Nov 18, 2024

adamjstewart modified the milestones: 0.6.2, 0.7.0 Nov 18, 2024
