RandomGeoSampler vs RandomBatchGeoSampler #1751
-
Hi again. Considering the dataset I'm using, how should I define it in TorchGeo? So far, this is my definition:
The prints of image_set and gt_masks are these:
And for some reason, after making the intersection dataset, the labels have their CRS converted:
Should I use RandomGeoSampler or RandomBatchGeoSampler, and why?
Replies: 6 comments 9 replies
-
Hi @lcoandrade, not sure if you were aware, but we do have a dataset and datamodule already available for the Inria AIL dataset. See
-
Hi, @isaaccorley. Thanks to @adamjstewart, I know that. I'm using it as a custom dataset because I'm running a comparative study between TorchGeo and Raster Vision with my students. I already showed them how to use a custom dataset in RV, and now I'll show them how to do the same work in TG. The idea is to use our own custom dataset later. I'm asking because I get far fewer training batches than validation batches, which leads to poor results after training, as you can see here. Why is this happening?
-
Another thing is bothering me. While training, I'm getting:
-
When I switch to GridGeoSampler for training, I get 15k batches, which is expected.
-
See Figure 2 from our paper. Your datasets have different CRSs. If you don't reproject them to the same CRS, you won't be able to align them. Luckily, TorchGeo reprojects them for you automatically when you merge the two datasets. You can do this implicitly with `dataset_a & dataset_b` (which warps b to a), or explicitly with `dataset_b = MyRasterMask(..., crs=...)`.
See Figure 3a from our paper. RandomGeoSampler samples a random patch from a random file for each sample in each mini-batch. RandomBatchGeoSampler works very similarly, but instead draws all random patches in a given mini-batch from the same random file. The result is that GDAL's LRU cache is more likely to be hit for larger mini-batches, smaller files, or larger block sizes. You should find that RandomBatchGeoSampler is slightly faster for file I/O than RandomGeoSampler, although that may not matter if your I/O is fast and your GPU is slow.
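To make the caching argument concrete, here is a minimal standard-library sketch of the two sampling patterns. It does not use TorchGeo or GDAL: the function names, the uniform choice of files, and the toy LRU cache keyed on whole file names are all assumptions made for illustration (GDAL's real cache works on raster blocks, and the real samplers pick patches, not just files).

```python
import random
from collections import OrderedDict


def per_sample_batches(files, batch_size, n_batches, rng):
    """RandomGeoSampler-like pattern: each sample picks its own random file."""
    return [[rng.choice(files) for _ in range(batch_size)]
            for _ in range(n_batches)]


def per_batch_batches(files, batch_size, n_batches, rng):
    """RandomBatchGeoSampler-like pattern: one random file per mini-batch."""
    batches = []
    for _ in range(n_batches):
        chosen = rng.choice(files)  # picked once, reused for the whole batch
        batches.append([chosen] * batch_size)
    return batches


def cache_hit_rate(batches, cache_size):
    """Fraction of reads served by a toy LRU cache keyed on file name."""
    cache = OrderedDict()
    hits = total = 0
    for batch in batches:
        for name in batch:
            total += 1
            if name in cache:
                hits += 1
                cache.move_to_end(name)  # mark as most recently used
            else:
                cache[name] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits / total


# Hypothetical setup: 50 files, mini-batches of 16, room for 4 files in cache.
files = [f"tile_{i}.tif" for i in range(50)]
rng = random.Random(0)
rate_per_sample = cache_hit_rate(per_sample_batches(files, 16, 200, rng), 4)
rate_per_batch = cache_hit_rate(per_batch_batches(files, 16, 200, rng), 4)
print(f"per-sample file choice: {rate_per_sample:.2f} hit rate")
print(f"per-batch file choice:  {rate_per_batch:.2f} hit rate")
```

In this toy model, the per-batch pattern misses at most once per mini-batch, so its hit rate grows with batch size, which matches the intuition above. It says nothing about absolute speed on real data.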
-
Many thanks to @isaaccorley and @adamjstewart for the kind and precise answers. Now I understand better how TG works. I got many answers from this discussion.