Adam J. Stewart edited this page Jun 25, 2021 · 15 revisions

Datasets

There are many different ways in which we can classify our datasets. This classification allows us to create abstract base classes to ensure a uniform API for all subclasses.

Benchmark vs. Generic

  1. Benchmark: contains both images and targets (e.g. COWC, VHR-10, CV4A Kenya)
  2. Generic: contains only images or targets (e.g. Landsat, Sentinel, CDL, Chesapeake)

The problem with this classification is that we want to be able to combine two "generic" datasets to get a single "benchmark" dataset. For example, we need a way for users to specify an image source (e.g. Landsat, Sentinel) and a target source (e.g. CDL, Chesapeake). It isn't yet clear how one would do this.
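One possible shape for this combination, as a rough sketch: wrap an image source and a target source in a single dataset that merges their samples. All names here (`ImageDataset`, `TargetDataset`, `CombinedDataset`) are hypothetical, not an actual API.

```python
from torch.utils.data import Dataset


class ImageDataset(Dataset):
    """Hypothetical "generic" dataset yielding only images (e.g. Landsat)."""

    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return {"image": self.images[index]}


class TargetDataset(Dataset):
    """Hypothetical "generic" dataset yielding only targets (e.g. CDL)."""

    def __init__(self, targets):
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        return {"target": self.targets[index]}


class CombinedDataset(Dataset):
    """Pairs an image source with a target source, benchmark-style."""

    def __init__(self, image_ds, target_ds):
        assert len(image_ds) == len(target_ds)
        self.image_ds = image_ds
        self.target_ds = target_ds

    def __len__(self):
        return len(self.image_ds)

    def __getitem__(self, index):
        # Merge the two sample dicts into one {"image": ..., "target": ...}
        sample = dict(self.image_ds[index])
        sample.update(self.target_ds[index])
        return sample
```

The hard part this sketch glosses over is alignment: integer indices only work if both sources cover the same footprints in the same order, which is exactly why spatial indexing comes up below.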

Image vs. Target

  1. Image: contains raw images
  2. Target: contains ground truth targets

This makes it easy to combine "image" and "target" datasets into a single supervised learning problem, but what about datasets that contain both images and targets? Do we want to allow people to swap image or target sources in these kinds of datasets?

Chip vs. Tile vs. Region

  1. Chip: pre-defined chips/patches (e.g. COWC, VHR-10, DOTA)
  2. Tile: possibly-overlapping tiles we need to sample chips/patches from (e.g. Landsat, Sentinel, CV4A Kenya)
  3. Region: static maps of stitched-together data (e.g. CDL, Chesapeake Land Cover, static Google Earth imagery)

Again, we need to be able to combine datasets from different categories into a single data loader.
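The bridge between these categories is chip sampling: a tile (or a region, treated as one very large tile) can be reduced to fixed-size chips before it reaches the data loader. A minimal sketch of random chip sampling, assuming tiles are `(channels, height, width)` tensors (`sample_chip` is a hypothetical helper, not an existing function):

```python
import torch


def sample_chip(tile, chip_size, generator=None):
    """Randomly crop a (channels, height, width) tile into a square chip.

    A "region" dataset can be treated as a single large tile; a "chip"
    dataset skips this step entirely because its patches are pre-defined.
    """
    _, height, width = tile.shape
    assert height >= chip_size and width >= chip_size
    # Pick the top-left corner uniformly at random
    y = torch.randint(0, height - chip_size + 1, (1,), generator=generator).item()
    x = torch.randint(0, width - chip_size + 1, (1,), generator=generator).item()
    return tile[:, y : y + chip_size, x : x + chip_size]
```

With something like this in place, all three categories yield same-sized chips and can feed one data loader.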

Geo vs. Non-Geo

  1. Geospatial: contains lat/lon information; time is optional (e.g. Landsat, CDL, etc.)
  2. Non-geospatial: no lat/lon information, only pre-defined images/targets or train-test splits

Any kind of geospatial dataset can be combined with another. It doesn't matter whether they use chips, tiles, or regions, as long as we have the lat/lon info.

Non-geospatial datasets cannot be combined with each other.
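The geospatial case could be sketched as a base class that is indexed by a bounding box rather than an integer, where combining two datasets restricts queries to their spatial overlap. This is only an illustration of the idea; the class, its `(minx, miny, maxx, maxy)` bounds convention, and the `&` operator are all assumptions, not a settled API.

```python
class GeoDataset:
    """Hypothetical base class: indexed by a lat/lon bounding box, not an int."""

    def __init__(self, bounds):
        # (minx, miny, maxx, maxy) extent of the data
        self.bounds = bounds

    def __getitem__(self, query):
        # Real subclasses would read raster/vector data inside `query`;
        # here we just echo the query box back.
        return {"bounds": query}

    def __and__(self, other):
        """Combine two geospatial datasets: valid queries are their overlap."""
        minx = max(self.bounds[0], other.bounds[0])
        miny = max(self.bounds[1], other.bounds[1])
        maxx = min(self.bounds[2], other.bounds[2])
        maxy = min(self.bounds[3], other.bounds[3])
        assert minx < maxx and miny < maxy, "datasets do not overlap"
        return GeoDataset((minx, miny, maxx, maxy))
```

Because every geospatial dataset answers bounding-box queries, chip, tile, and region layouts all compose the same way; non-geospatial datasets have no such shared index, which is why they can't be combined.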

Transforms

Torchvision uses PIL, which isn't compatible with multi-spectral imagery. Although some of our imagery isn't multi-spectral, we don't want to have to implement the same transforms for every possible data structure. Instead, we should probably standardize on torch Tensors. This also has the benefit that transforms can be run on the GPU. Does this mean we need to use nn.Module? See https://discuss.pytorch.org/t/state-of-the-art-for-torchvision-datasets-transforms-models-design/123625 for discussion on this.
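As a sketch of what an `nn.Module`-based transform might look like, here is a per-channel normalization that works on any number of spectral bands (the `Normalize` name and its parameters are illustrative, not an existing TorchGeo API):

```python
import torch
from torch import nn


class Normalize(nn.Module):
    """Per-channel normalization for multi-spectral tensors of shape (C, H, W).

    Because this is an nn.Module operating on plain Tensors, it can be
    moved to the GPU with .to(device), composed with nn.Sequential, and
    isn't limited to the 3 RGB bands that PIL supports.
    """

    def __init__(self, mean, std):
        super().__init__()
        # Buffers move with the module to whatever device it lives on
        self.register_buffer("mean", torch.tensor(mean).view(-1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(-1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std
```

For example, a 6-band Landsat-style tensor passes through unchanged in shape, which a PIL-based transform could not handle.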

Models

Radiant Earth and the Planetary Computer are planning to distribute pre-trained models; we should support these.
