Custom disaster-based train/test splits for xView2 dataset #2416

Open
wants to merge 5 commits into
base: main

burakekim
Contributor

cc: @calebrob6

XView2DistShift is a subclass of XView2 designed to modify the original train/test splits. Similar to EuroSATSpatial #2074, this class enables domain adaptation and out-of-distribution (OOD) detection experiments.

From the docstring:

This class allows for the selection of particular disasters to be used as the training set (in-domain) and test set (out-of-domain). The dataset can be split according to the disaster names specified by the user, enabling the model to train on one disaster type and evaluate on a different, out-of-domain disaster. The goal is to test the generalization ability of models trained on one disaster to perform on another.
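A minimal sketch of the splitting idea described in the docstring, not the code in this PR. It assumes each file name starts with the disaster name followed by an underscore (e.g. `hurricane-harvey_00000001_pre_disaster.png`); the function names and arguments are hypothetical.

```python
from collections import defaultdict


def group_by_disaster(filenames):
    """Group file names by the disaster prefix before the first underscore.

    Assumption: names look like '<disaster>_<id>_<pre|post>_disaster.png'.
    """
    groups = defaultdict(list)
    for name in filenames:
        disaster = name.split("_", 1)[0]
        groups[disaster].append(name)
    return dict(groups)


def disaster_split(filenames, id_disasters, ood_disasters):
    """Return (train, test) file lists for in-domain and out-of-domain disasters."""
    overlap = set(id_disasters) & set(ood_disasters)
    if overlap:
        raise ValueError(f"Disasters used for both train and test: {sorted(overlap)}")

    groups = group_by_disaster(filenames)
    missing = (set(id_disasters) | set(ood_disasters)) - groups.keys()
    if missing:
        raise ValueError(f"Unknown disasters: {sorted(missing)}")

    train = sorted(f for d in id_disasters for f in groups[d])
    test = sorted(f for d in ood_disasters for f in groups[d])
    return train, test


files = [
    "hurricane-harvey_00000001_pre_disaster.png",
    "hurricane-harvey_00000001_post_disaster.png",
    "guatemala-volcano_00000000_pre_disaster.png",
]
train, test = disaster_split(files, ["hurricane-harvey"], ["guatemala-volcano"])
print(len(train), len(test))  # 2 1
```

Rejecting overlapping disaster names up front keeps the test set strictly out-of-domain, which is the point of the split.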

TODO: test coverage

@github-actions bot added the documentation (Improvements or additions to documentation), datasets (Geospatial or benchmark datasets), and testing (Continuous integration testing) labels on Nov 18, 2024
@adamjstewart
Collaborator

We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now? Will there be any corresponding citations for these new splits?

It would be nice to move more of the shared code in the XView2 base class so that the only thing that needs to be changed in this subclass is the URLs. How different are these datasets?

@adamjstewart modified the milestones: 0.6.2, 0.7.0 on Nov 18, 2024
@burakekim
Contributor Author

burakekim commented Nov 18, 2024

> We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now?

Spatial refers to the type of distribution shift revealed by the splits when they are rearranged. XView2 consists of multiple disasters, and the distribution shift is determined by the user's choice. The user can select any disaster as the training set and another as the test set, which introduces varying types of distribution shift. These range from near-distribution to far-distribution shifts, depending on how different the disasters in the splits are. Here, the difference is not limited to spatial factors but also includes temporal and contextual differences. That is why Spatial would be a misleading name for XView2.

One alternative could be standardizing the naming for these subset datasets with a suffix like OOD or DistShift. What do you think?

> It would be nice to move more of the shared code in the XView2 base class so that the only thing that needs to be changed in this subclass is the URLs. How different are these datasets?

They are basically the same dataset but with different splits. XView2DistShift allows users to select specific disasters for training and testing sets.

Are you suggesting we curate the filenames for all disasters as HF links and dynamically load them as training or testing sets based on input? This approach would save us from _initialize_files and _load_split_files_by_disaster_and_type, but not from __getitem__ and __len__.
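To make the question concrete, here is a rough sketch of what a curated per-disaster manifest could look like. Everything in it is hypothetical: the `ManifestSplit` class, the manifest layout, and the file names are illustrations, not the XView2 base class or the code in this PR. It only handles file paths; a real dataset would load the images and masks in `__getitem__`.

```python
class ManifestSplit:
    """Select pre/post image pairs for the chosen disasters from a curated manifest.

    Assumption: `manifest` maps a disaster name to a list of
    (pre_image_path, post_image_path) tuples prepared ahead of time.
    """

    def __init__(self, manifest, disasters):
        unknown = set(disasters) - manifest.keys()
        if unknown:
            raise KeyError(f"Disasters not in manifest: {sorted(unknown)}")
        # Flatten the selected disasters into one list of pairs.
        self.pairs = [pair for d in disasters for pair in manifest[d]]

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        pre, post = self.pairs[index]
        # Placeholder: return paths only instead of loaded data.
        return {"pre": pre, "post": post}


manifest = {
    "hurricane-harvey": [
        (
            "hurricane-harvey_00000001_pre_disaster.png",
            "hurricane-harvey_00000001_post_disaster.png",
        ),
    ],
    "guatemala-volcano": [
        (
            "guatemala-volcano_00000000_pre_disaster.png",
            "guatemala-volcano_00000000_post_disaster.png",
        ),
    ],
}

train = ManifestSplit(manifest, ["hurricane-harvey"])
test = ManifestSplit(manifest, ["guatemala-volcano"])
```

With a manifest like this, the file discovery helpers are replaced by a lookup, while indexing and length still have to be defined on the subclass or inherited from the base class.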
