Architecture for Human Segmentation? #10
It would be a good idea to start with a DeepLab model for the full person. Try the sample 21-class model trained on PASCAL/COCO from the TensorFlow website, which already contains the person class. If you want to train on your own data, use a high-resolution 513x513 input and a depth multiplier of 1 (or more) for the highest accuracy. This may increase the overall inference time; but since you are striving for highest accuracy, you need to trade off some speed. The Supervisely dataset does not have proper masks for objects connected to the person. Try removing such data from Supervisely and use the rest in combination with the PASCAL/COCO person datasets. Sometimes the inclusion of connected objects is ambiguous (i.e., is it a connected object, or an object behind the person that is partially occluded by it?). In any case, you need to ensure you have a sufficient number of images (with/without connected objects) for training, as per your specific use case. I have not tried any other techniques for including the connected objects.
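One heuristic for the "connected objects" problem discussed above can be sketched in code. This is not from the thread, just an illustration under stated assumptions: given a per-pixel class map from a 21-class PASCAL VOC model (where the person class index is 15, and 0 is background), build a binary person mask and also merge in any connected region of another class that is 4-adjacent to the person, as a rough proxy for "held" objects. The function name, the adjacency rule, and the use of plain Python lists are all assumptions for the sketch:

```python
from collections import deque

PERSON = 15  # person class index in the 21-class PASCAL VOC label map

def person_mask_with_held_objects(classmap, person_id=PERSON):
    """classmap: 2D list of ints (per-pixel class IDs). Returns a 2D 0/1 mask.

    The mask contains all person pixels, plus every connected region of a
    non-background, non-person class that touches the person (4-adjacency).
    """
    h, w = len(classmap), len(classmap[0])
    mask = [[1 if classmap[y][x] == person_id else 0 for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = classmap[y][x]
            if c in (0, person_id) or seen[y][x]:
                continue
            # Flood-fill the connected region of class c containing (y, x),
            # noting whether any of its pixels neighbors a person pixel.
            region, touches_person = [], False
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        if classmap[ny][nx] == person_id:
                            touches_person = True
                        elif classmap[ny][nx] == c and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if touches_person:  # region borders the person: merge it in
                for ry, rx in region:
                    mask[ry][rx] = 1
    return mask
```

Note this will also merge occluded background objects that happen to touch the person, which is exactly the ambiguity mentioned above; it is a post-processing heuristic, not a learned solution.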
Many thanks for the detailed answer,
I had a few questions: A) What is the license of your amazing library? Thanks!
Thanks for the amazing library!
I am looking to implement high-quality semantic segmentation on a mobile device for human cutout (full body).
What architecture/encoder would be a good choice for the task at hand? MobileNetV2, MobileNetV3, DeepLabV3+, ShuffleNet, PortraitNet, SINet... There are so many, it's confusing...
https://github.com/qubvel/segmentation_models.pytorch
I want the highest accuracy, rather than the smallest or fastest model.
In the final output mask, how can I even get the objects that a person is holding, say a cup, a purse, a tennis racquet, a balloon, a toy, a magazine. It could be just about anything.
I am very much perplexed by this problem.
For training the human segmentation, I was planning to use the Supervisely Person dataset. If I am not mistaken, the Supervisely dataset doesn't contain masks for objects that the person might be holding. Would a dataset like Supervisely therefore be unfit for the job? Or do we need to train on a dataset with more labels than just "person"?
Ideally, if an object is lying off to the side, it is OK for it to be left out of the mask. But if the person is holding the object, it should definitely be included in the final mask.
How can this be achieved?
Thanks!