
Geosampler prechipping #2300

Open: wants to merge 34 commits into main
Conversation

sfalkena
Contributor

I think it makes sense to be able to interact with the chips of each sampler. I therefore propose moving chip creation into the __init__ of each sampler and keeping track of the chips in a GeoDataFrame. This has the following benefits:

  • The chips can be saved to file and visualized, either in GIS software or while developing.
  • The chips can be filtered with other geometry, to remove chips that are not of interest. This is particularly useful during inference to limit the number of chips. Example: to run a building detection model, I can instantiate a grid sampler, filter my chips with a shapefile of urban areas, and run inference only on the intersection of my chips with the urban areas. See the function filter_chips.
  • The samples can be split. I have added a function that splits the samples across multiple workers. See the function set_worker_split.
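The filtering idea in the second bullet can be sketched without any GIS dependencies. This is a hypothetical pure-Python stand-in for the PR's GeoDataFrame-based filter_chips, representing chips as (minx, miny, maxx, maxy) tuples:

```python
def boxes_intersect(a, b):
    """Axis-aligned bounding-box intersection test for (minx, miny, maxx, maxy)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def filter_chips(chips, aoi):
    """Keep only the chips that intersect the area of interest."""
    return [chip for chip in chips if boxes_intersect(chip, aoi)]

chips = [(0, 0, 10, 10), (20, 20, 30, 30), (5, 5, 15, 15)]
urban_area = (0, 0, 12, 12)  # hypothetical urban-area bounding box
print(filter_chips(chips, urban_area))  # [(0, 0, 10, 10), (5, 5, 15, 15)]
```

In the real proposal the same operation would be a spatial join or clip between the chip GeoDataFrame and an arbitrary polygon layer rather than a box test.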

An abstract method needs to be implemented for every sampler:

    @abc.abstractmethod
    def get_chips(self) -> GeoDataFrame:
        """Determine the chips (sample windows) for this sampler."""
        ...

Let me know what you think!
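For illustration, a concrete get_chips for a grid sampler might enumerate fixed-size windows over the dataset bounds. This sketch returns plain tuples instead of a GeoDataFrame, and all names and parameters are hypothetical:

```python
def get_chips(bounds, size, stride):
    """Enumerate grid chips over bounds = (minx, miny, maxx, maxy).

    Each chip is a (minx, miny, maxx, maxy) window of side `size`,
    stepped by `stride` in both directions.
    """
    minx, miny, maxx, maxy = bounds
    chips = []
    y = miny
    while y + size <= maxy:
        x = minx
        while x + size <= maxx:
            chips.append((x, y, x + size, y + size))
            x += stride
        y += stride
    return chips

print(len(get_chips((0, 0, 100, 100), size=50, stride=50)))  # 4
```

In the PR the equivalent method would wrap these windows in shapely geometries and return them as a GeoDataFrame so they can be saved, filtered, and split.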

@github-actions github-actions bot added testing Continuous integration testing samplers Samplers for indexing datasets labels Sep 17, 2024
@sfalkena
Contributor Author

@sfalkena please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"


@microsoft-github-policy-service agree company="Shell"

@sfalkena
Contributor Author

I still need to add geopandas to the requirements. Do you use a dependency resolver to pick the right version, or do you simply take the version that is installed for Python 3.10?

@adamjstewart
Collaborator

The downside of this approach is that every epoch will contain the exact same set of images, whereas the current implementation samples different locations during each epoch. Not a deal breaker, but something to weigh against the proposed benefits:

It is possible to save the chips to file and to visualize them either in GIS software, or while developing.

You could also just call dataset.plot(sample) and visualize them without export. Are you seeing any issues with this? Would like to simplify this if so.

It allows to filter the chips using other geometry, to remove chips that are not of interest.

This is a valid advantage, and something we would like to support (one way or another).

It allows to split the samples.

Is there something wrong with our builtin splitters? The problem with your splitter is that there's no guarantee that two samples do not overlap across train and test.

P.S. Missing dependencies on geopandas and tqdm. Would need to be added to pyproject.toml, requirements/required.txt, and requirements/min-reqs.old. I usually use a binary search on PyPI versions until I find the minimum version that works for us.
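The version-hunting step described above can be sketched as a standard binary search over sorted releases. Here works and the version list are hypothetical stand-ins for "pin the package to version X and run the test suite":

```python
def min_working_version(versions, works):
    """Binary search a sorted version list for the first version that passes.

    Assumes the predicate is monotone: once a version works,
    all later versions work too.
    """
    lo, hi = 0, len(versions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            hi = mid  # this version works; the answer is here or earlier
        else:
            lo = mid + 1  # too old; look later
    return versions[lo]

versions = ["0.10", "0.11", "0.12", "0.13", "0.14"]
print(min_working_version(versions, lambda v: v >= "0.12"))  # 0.12
```

With n releases this takes about log2(n) install-and-test cycles instead of n.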

@sfalkena
Contributor Author

The downside of this approach is that every epoch will contain the exact same set of images, whereas the current implementation samples different locations during each epoch. Not a deal breaker, but something to weigh against the proposed benefits:

Agreed, that's a good point. Maybe I could add a refresh_samples function to the random sampler that draws a new set of chips? It could be called manually every epoch.

You could also just call dataset.plot(sample) and visualize them without export. Are you seeing any issues with this? Would like to simplify this if so.

I do use this for visualizing a single sample, but more to inspect the underlying data than the location of the sample on a map. The point of saving the chips is to get a feel for the complete set of samples that are generated. Especially in combination with multiple filtering steps during inference, I like that I can double-check the areas that will be inferred.
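The "save chips to file for GIS inspection" workflow can be sketched even without the geopandas dependency, by writing the chip boxes as a GeoJSON FeatureCollection (which QGIS and most GIS tools open directly). The function name and tuple representation are hypothetical:

```python
import json
import os
import tempfile

def chips_to_geojson(chips, path):
    """Write (minx, miny, maxx, maxy) chips as GeoJSON polygons for GIS viewing."""
    features = []
    for minx, miny, maxx, maxy in chips:
        # closed ring: first and last coordinate are identical
        ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {},
        })
    with open(path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)

# demo: write one chip and read it back
path = os.path.join(tempfile.gettempdir(), "chips_demo.geojson")
chips_to_geojson([(0, 0, 10, 10)], path)
with open(path) as f:
    print(json.load(f)["features"][0]["geometry"]["type"])  # Polygon
```

With the actual GeoDataFrame from the PR, the same export would presumably be a one-liner via the GeoDataFrame's own file-writing support.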

Is there something wrong with our builtin splitters? The problem with your splitter is that there's no guarantee that two samples do not overlap across train and test.

Nothing is wrong with the existing splitters, although they split datasets rather than individual samples. The existing functions are preferable to what I have implemented; I added this functionality mainly for inference, so that I can split my inference area across multiple workers. In theory I could also assign a dataset per worker, but I like that I currently have one dataset and one sampler.

On the P.S.: I'm curious whether you have a particular reason not to use something like Poetry to handle package compatibility?

@adamjstewart
Collaborator

I'm curious whether you have a particular reason not to use something like Poetry to handle package compatibility?

I haven't seen a good reason to use it. Someone asks me why I'm not using flit, poetry, hatchling, etc. a couple of times per year, but so far I haven't seen any features they add that setuptools doesn't have. Let's discuss that in a separate discussion.

@github-actions github-actions bot added the dependencies Packaging and dependencies label Sep 18, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 18, 2024
@sfalkena
Contributor Author

All checks are finally passing :) I added the refresh_samples method too. Please let me know if you want to see any other changes.

@adamjstewart
Collaborator

Will refresh_samples be run automatically when using a DataLoader/DataModule? This doesn't really seem like a good solution to the problem.

@sfalkena
Contributor Author

sfalkena commented Sep 18, 2024

I have now overridden the __iter__ method specifically for the RandomGeoSampler, so indeed chips get refreshed every epoch. Additionally, added a tutorial notebook on how to visualize the samples.
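The "refresh in __iter__" approach can be sketched in isolation: since a DataLoader calls iter(sampler) once per epoch, re-drawing the chips there refreshes them automatically. This is a minimal toy illustration, not the PR's actual code:

```python
class RefreshingSampler:
    """Sketch: re-draw chips at the start of every epoch via __iter__."""

    def __init__(self, chips_fn):
        self.chips_fn = chips_fn  # callable returning a fresh list of chips
        self.chips = chips_fn()

    def __iter__(self):
        # A DataLoader calls iter() once per epoch, so chips refresh per epoch.
        self.chips = self.chips_fn()
        return iter(self.chips)

import itertools
counter = itertools.count(1)
sampler = RefreshingSampler(lambda: [next(counter)])
print(list(sampler), list(sampler))  # two iterations yield different "chips"
```

This keeps the GeoDataFrame bookkeeping (saving, filtering, splitting) while preserving the current behavior of sampling new random locations every epoch.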

@sfalkena
Contributor Author

One additional thing: in order to pass the notebook tests, I changed the workflow file to install the checked-out repo. By default, torchgeo is installed in one of the first cells of each notebook, but that installs torchgeo from main, so no notebook would pass as part of a PR. Do you agree with installing torchgeo explicitly this way?

@adamjstewart
Collaborator

One additional thing: in order to pass the notebook tests, I changed the workflow file to install the checked-out repo. By default, torchgeo is installed in one of the first cells of each notebook, but that installs torchgeo from main, so no notebook would pass as part of a PR. Do you agree with installing torchgeo explicitly this way?

Can you open a separate PR for this? I would like to get it in soon, as it's also needed by @calebrob6 in #1897. Note that we need to keep -r requirements.txt to ensure that the versions are fixed. I also don't think -e is needed; it should just be "." on its own.

@sfalkena
Contributor Author

See #2306

@github-actions github-actions bot added the datamodules PyTorch Lightning datamodules label Sep 23, 2024
@github-actions github-actions bot removed the datamodules PyTorch Lightning datamodules label Sep 23, 2024
@adamjstewart
Collaborator

I have not forgotten about this PR, just haven't had time to properly review it.
