Geosampler prechipping #2300
base: main
Conversation
@microsoft-github-policy-service agree company="Shell"
I still need to add geopandas to the requirements. Do you use any dependency resolver for picking the right version, or do you simply use the version that is installed for Python 3.10?
The downside of this approach is that every epoch will contain the exact same set of images, whereas the current implementation samples different locations during each epoch. Not a deal breaker, but something to weigh against the proposed benefits:
You could also just call
This is a valid advantage, and something we would like to support (one way or another).
Is there something wrong with our builtin splitters? The problem with your splitter is that there's no guarantee that two samples do not overlap across train and test. P.S. Missing dependencies on geopandas and tqdm. Would need to be added to
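For reference, a minimal sketch of using one of the builtin splitters, which assign disjoint grid cells to each split so that samples cannot overlap across train and test. The dataset path is a placeholder, and exact signatures may vary between torchgeo versions:

```python
# Hedged sketch, not from this PR: torchgeo's builtin splitters assign disjoint
# grid cells to each split, so train and test samples cannot overlap.
from torchgeo.datasets import NAIP, random_grid_cell_assignment

dataset = NAIP("data/naip")  # placeholder path
train_ds, test_ds = random_grid_cell_assignment(dataset, [0.8, 0.2], grid_size=6)
```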
Agree with that, and that is a good point. Maybe I could add a
I indeed use this for visualizing a single sample, but more to inspect the underlying data rather than the location of that sample on a map. The point of saving the chips is to get a feeling for the complete set of samples that are generated. Especially in combination with multiple filtering steps during inference, I like the fact that I can double-check the areas that get inferred.
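As a rough illustration of that inspection workflow (not the PR's API): if the chips end up in a GeoDataFrame, the full sampling pattern can be written out and opened in a GIS tool. The bounds and CRS below are made-up placeholders:

```python
# Hedged sketch: dump chip bounding boxes to a GeoPackage for visual inspection
# (e.g. in QGIS). The bounds and CRS are placeholders, not real data.
import geopandas as gpd
from shapely.geometry import box

bounds = [(0, 0, 256, 256), (256, 0, 512, 256)]  # (minx, miny, maxx, maxy) per chip
chips = gpd.GeoDataFrame(geometry=[box(*b) for b in bounds], crs="EPSG:32631")
chips.to_file("chips.gpkg", driver="GPKG")
```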
Nothing wrong with the existing splitters, although they split datasets rather than the individual samples. The existing functions are preferable to what I have implemented; I added this functionality more for inference, so that I can split my inference area across multiple workers. In theory I could also assign a dataset per worker, but I like the fact that right now I have one dataset and one sampler. On the P.S.: curious whether you have any particular reason not to go for something like Poetry to handle package compatibility?
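To make the worker-splitting idea concrete, a minimal sketch follows. The function name and the round-robin strategy are assumptions, not necessarily what the PR's split method does:

```python
# Hedged sketch: give each inference worker a disjoint subset of the chips,
# assuming the chips are stored in a GeoDataFrame.
import geopandas as gpd


def worker_subset(chips: gpd.GeoDataFrame, worker_id: int, num_workers: int) -> gpd.GeoDataFrame:
    # Round-robin assignment: worker i keeps rows i, i + num_workers, ...
    return chips.iloc[worker_id::num_workers]
```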
Haven't seen a good reason to use it. Someone asks me why I'm not using flit, poetry, hatchling, etc. a couple of times per year, but so far I haven't seen any features they add that setuptools doesn't have. Let's discuss that in a separate discussion, though.
So finally all checks are passing :) I added the
Will
I have now overridden the
One additional thing: in order to pass the notebook tests, I changed the workflow file to install the checked-out repo. By default, torchgeo is installed in one of the first cells of each notebook, but that only installs torchgeo from main, so no notebook test would pass as part of a PR. Do you agree with installing torchgeo explicitly in my current way?
Can you open a separate PR for this? I would like to get this in soon, as it's also needed by @calebrob6 in #1897. Note that we need to keep
See #2306
I have not forgotten about this PR, just haven't had time to properly review it. |
I think it makes sense to be able to interact with the chips of each sampler. Therefore, I want to propose moving the creation of the chips into the __init__ of each sampler and keeping track of the chips using a GeoDataFrame. This leads to the following benefits:

- Chips can be filtered with filter_chips.
- Chips can be split across workers with set_worker_split.

An abstract method needs to be implemented for every sampler.
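A rough sketch of what this structure could look like, based on my reading of the proposal rather than the PR's actual code; every name except filter_chips and set_worker_split is an assumption:

```python
# Hedged sketch of the proposed sampler structure: chips are generated once in
# __init__, stored in a GeoDataFrame, and can then be filtered or split.
import abc
from collections.abc import Iterator

import geopandas as gpd
import pandas as pd


class PreChippedSampler(abc.ABC):  # hypothetical base class name
    def __init__(self) -> None:
        # Generate all chips up front and keep them in a GeoDataFrame.
        self.chips: gpd.GeoDataFrame = self.get_chips()

    @abc.abstractmethod
    def get_chips(self) -> gpd.GeoDataFrame:
        """Return one row per chip (geometry plus any metadata)."""

    def filter_chips(self, mask: pd.Series) -> None:
        # Keep only the chips selected by a boolean mask.
        self.chips = self.chips[mask]

    def set_worker_split(self, worker_id: int, num_workers: int) -> None:
        # Keep only the chips assigned to this worker (round-robin here).
        self.chips = self.chips.iloc[worker_id::num_workers]

    def __iter__(self) -> Iterator[pd.Series]:
        # Yield one chip (row) at a time.
        for _, chip in self.chips.iterrows():
            yield chip

    def __len__(self) -> int:
        return len(self.chips)
```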
Let me know what you think!