add instance segmentation support #854
Sounds good! In the future we might want to generalize this to handle more than one category, but this is a good simplifying assumption to get started. I would try to get each command working in the order in which they run (chip, train, etc.). Getting some debug chips made at the beginning of the training step will be a good sanity check.

One slightly tricky issue you need to deal with when working with instance-based methods (like object detection and instance segmentation) is how to handle instances that straddle chip boundaries during prediction. In RV, we use a sliding window to be able to make predictions over large images. For object detection, we use a sliding window with 50% overlap and make the chip size greater than the instance size. This ensures that each instance is glimpsed in its entirety by some window. We then do a de-duplication step to remove any instances that were predicted by multiple windows. However, we recently realized that you can train a model on small chips (e.g. 200x200) and do a forward pass (during prediction) on large chips (1000x1000+) without any drop in accuracy (sometimes it's actually more accurate due to the added context). It might be possible to just make predictions on whole 2048x2048 images, and then you don't need to worry about this issue.
Thanks for your feedback! Interesting point about instances straddling the chip boundaries; straddles will be common for my problem, as the instances (fields) are large relative to the chip and scene size. Did you have prior knowledge of how big your instances might be, and set the chip size accordingly? At this point I have almost gotten the model to take a training step. I added code to build an (image, target) input.
Added the rv
Hi @lewfish,

This brought up a design question: what level of pre-processing should rv expect for instance segmentation? At this point, for the fields data I have fed it raster labels of shape (1, img_sz, img_sz), where each instance has a unique integer id > 0. This won't be ideal: while it is convenient to pass in a single-channel label and subsequently break it into (nb_features, img_sz, img_sz), encoding multiple instances from multiple classes would get confusing. I imagine eventually it would be ideal to pass rv a vector source and have labels rasterized with instance-aware class labels using rasterio? With COCO, I have been stacking masks as (nb_features, img_sz_x, img_sz_y), which lets me create a separate mask for each instance and label it with the COCO category integer. This just necessitates some modification of the chipping code, where in several places I sum the masks over axis=2 to identify empty locations, to save in process_scene_data, etc. Hopefully getting instance segmentation to run on COCO in rv will help me see where I can generalize the code.

Cheers,
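To make the single-channel convention concrete, here's roughly what I mean (a sketch with made-up helper names; with the instance axis first, the empty-chip check sums over axis=0 rather than axis=2):

```python
import numpy as np


def instance_raster_to_masks(label_img):
    # Split a (H, W) instance-id raster (background 0, each instance a
    # unique integer > 0) into a (num_instances, H, W) stack of binary masks.
    ids = np.unique(label_img)
    ids = ids[ids > 0]  # drop background
    if len(ids) == 0:
        return np.zeros((0,) + label_img.shape, dtype=np.uint8), ids
    return np.stack([label_img == i for i in ids]).astype(np.uint8), ids


def chip_is_empty(masks):
    # A chip is empty if no instance mask has any pixels set.
    return masks.size == 0 or masks.sum(axis=0).max() == 0
```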
Roughly yes. I think I've only ever used object detection for buildings and cars. |
You said you were trying to mimic the semantic segmentation functionality, but now that I think about it, the object detection code is probably more relevant to what you're doing, although you probably realize that. Also, in case you haven't seen it, this tutorial is helpful: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
Yes, that sounds right.
Getting things working on a well-known, non-geospatial dataset first is a good idea, and something I've done in the past. Just keep in mind that COCO is a really big dataset, and it takes a lot of GPU time to train a model on it, so you might want to try something smaller.
Sidenote which may be of interest: since COCO images are large, most people use an 8-GPU machine where each GPU has a batch size of 2. But an alternative approach called SNIPER breaks the large images into small chips first, and can then train with a large batch size on a single GPU if desired. It's the first time I've seen this sliding-window-style approach applied in a non-GIS setting: https://arxiv.org/pdf/1805.09300.pdf
Thanks for your thoughts, @lewfish!
I have spent the majority of my time studying the semantic segmentation-related rv code; it will be necessary for me to study the relevant object detection functionality as well.
Yes, I've read lots of torchvision tutorials and discussion forums and found them quite useful. The multimodal learning implementation of Mask R-CNN has also been informative, as the algorithm is relatively well exposed in PyTorch. I've gotten rv to evaluate on COCO images using the COCO pretrained model. I want to set up the label output so I can visualize it and ensure that the output is sane. Then I'll figure out why I sometimes get empty evaluations (advice?), and why one of the five losses goes to inf (advice?), and then try to transfer the pretrained backbone to my agricultural fields dataset.
Only after I convince myself that I can transfer the COCO backbone to my agriculture problem on 3 channels will I worry about training anything with fine-tuning or from scratch, let alone multispectral. I've started discussions with colleagues at NASA who are working on getting me an account with NASA Earth Exchange, which has a 32 GB GPU setup. I also have AWS money on an upcoming grant for 2020. Right now I have an RTX 2080 8GB in my research machine. Hopefully I can tap your expertise when it comes time for real training on a powerful machine.
Hi. Any update on implementing instance segmentation in rv?
It's on a list of potential new features, but I don't think it will be implemented soon. What application were you thinking of using it for? It would require a lot of code to be written, but it should be relatively straightforward, since it should just require adding various subclasses following the pattern of semantic segmentation and object detection. If you or someone else were interested in taking it on, we would be happy to provide guidance.
Here we're working on digitising solar panels from orthophotos. The panel annotations we received were originally drawn close to each other. Since semantic segmentation reads all adjacent annotations as one label, we found it isn't possible to digitise each panel separately. So we added a buffer between them by updating the annotations to cover only the inner part of each panel. Even so, it's still challenging for the trained model to distinguish and digitise each panel: some panels are distinguished, but many aren't. After reading up on instance segmentation, it seems like it could solve this, since the inference output can be produced per panel.
That is a very good offer, @lewfish! I still have a lot to learn since I'm new to the ML/DL field, but I am happy to contribute. Could you please guide me?
Similar application as @ammarsdc. I would gladly contribute as well, given some guidance. |
Here is a rough list of the tasks that would need to be completed to add instance segmentation to RV. I've made some very rough estimates about how many days each task would take assuming you are proficient with Python and PyTorch, have some experience working on large codebases, and things go relatively smoothly. This would be a lot of work! I would start with the first part about reading and visualizing a dataset. That should give you a better idea of the approach to take and how much work the whole thing will take. It would also be a good contribution to RV even if you weren't able to complete the whole thing. Where I've listed the name of a class, the idea would be to extend it for instance segmentation, and I've linked to the corresponding class for semantic segmentation to give you an idea of what that would look like (although object detection would also be relevant). @AdeelH Please chime in if you have anything to add, and feel free to directly edit this list.
Hi Y'all,
I attempted to create an instance segmentation implementation back in 2020.
Perhaps my repo could help in some small way.
https://github.com/dgketchum/raster-vision/tree/instance_seg
Cheers,
On Tue, Dec 6, 2022 at 9:38 AM Lewis Fishgold wrote:
- Read and visualize instance segmentation dataset [10]
  - Understand the Mask R-CNN [torchvision tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) [1]
  - [Labels](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/data/label/semantic_segmentation_labels.py) [3]
  - [LabelSource](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/data/label_source/semantic_segmentation_label_source.py) [2]
  - [Dataset](https://github.com/azavea/raster-vision/blob/master/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/semantic_segmentation_dataset.py) [2]
  - [Visualizer](https://github.com/azavea/raster-vision/blob/master/rastervision_pytorch_learner/rastervision/pytorch_learner/dataset/visualizer/semantic_segmentation_visualizer.py) and update the [Visualizer notebook](https://github.com/azavea/raster-vision/blob/master/docs/usage/tutorials/visualize_data_samples.ipynb) [2]
- Implement full pipeline [9]
  - [LabelStore](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/data/label_store/semantic_segmentation_label_store.py) [2]
  - [Evaluation](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/evaluation/semantic_segmentation_evaluation.py) / [Evaluator](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/evaluation/semantic_segmentation_evaluator.py) [2]
  - [Learner](https://github.com/azavea/raster-vision/blob/master/rastervision_pytorch_learner/rastervision/pytorch_learner/semantic_segmentation_learner.py) [2]
  - [Pipeline](https://github.com/azavea/raster-vision/blob/master/rastervision_core/rastervision/core/rv_pipeline/semantic_segmentation.py) [3]
- Docs and tests will need to be updated to accept a PR [12]
  - Update tutorials [2]
  - Pipeline example [2]
  - Update documentation [2]
  - Integration test [3]
  - Unit tests [3]
--
David Ketchum
Hydrologist, PhD Candidate
Montana DNRC
College of Forestry and Conservation, University of Montana
Missoula, MT
Thanks @lewfish. What would be the best way to share our progress on that checklist?
You can make a PR with your work in progress code and get feedback if you'd like. There's a draft mode you can put a PR in. Any higher-level conceptual discussion can go in this issue. |
It would be helpful to have access to torchvision.models.detection.maskrcnn_resnet50_fpn via raster-vision.
Hi @lewfish
I've been slowly working through the code in my fork, adding modules that mimic the semantic segmentation functionality.
At this point I'm planning on using 2048x2048 NAIP images as the training source, using the RGB bands. I have 2048x2048 labels where I've rasterized the overlying vector data (agricultural fields), each feature a new 'instance' with an integer instance label from 1 to number_features; background is 0.
I think within RV I'll split the instance-encoded image into binary masks, one for each feature, and get the bounding box of each, as they do in the torchvision tutorial. Then I'll prepare the (image, {boxes, labels, masks, id, area}) tuples for the DataLoader in data.build_databunch(), as expected by torchvision.models.detection.maskrcnn_resnet50_fpn.
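Concretely, the target construction might look something like this (single-category sketch; `build_target` is just an illustrative name, and the dict keys follow the torchvision tutorial). I add 1 to the max pixel coordinates so single-pixel instances don't produce zero-area (degenerate) boxes:

```python
import numpy as np
import torch


def build_target(label_img, category_id=1, image_id=0):
    # label_img: (H, W) instance-id raster, background 0, instances 1..N.
    ids = np.unique(label_img)
    ids = ids[ids > 0]  # drop background
    masks, boxes, areas = [], [], []
    for i in ids:
        m = label_img == i
        ys, xs = np.where(m)
        # [xmin, ymin, xmax, ymax]; the +1 keeps boxes non-degenerate.
        boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])
        areas.append(int(m.sum()))
        masks.append(m)
    return {
        'boxes': torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4),
        'labels': torch.full((len(ids),), category_id, dtype=torch.int64),
        'masks': torch.as_tensor(np.array(masks), dtype=torch.uint8),
        'image_id': torch.tensor([image_id]),
        'area': torch.as_tensor(areas, dtype=torch.float32),
        'iscrowd': torch.zeros((len(ids),), dtype=torch.int64),
    }
```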
Hope this seems reasonable; if so, I'll just leave a running commentary on what I've done in this issue thread. Any input from you will be greatly appreciated.
This is by far the most sophisticated project I've touched. Fun to learn though.