Architecture for Human Segmentation? #10

Open
InternetMaster1 opened this issue Apr 29, 2020 · 2 comments

InternetMaster1 commented Apr 29, 2020

Thanks for the amazing library!

I am looking to implement high-quality semantic segmentation on a mobile device for human cutout (full body).

  1. Architecture?

What architecture/encoder would be a good choice for this task? MobileNetV2, MobileNetV3, DeepLabV3+, ShuffleNet, PortraitNet, SINet... There are so many that it's confusing.
https://github.com/qubvel/segmentation_models.pytorch

I want the highest accuracy, rather than the smallest or fastest model.

  2. Objects held by Person?

How can the final output mask also include the objects that a person is holding, say a cup, a purse, a tennis racquet, a balloon, a toy, or a magazine? It could be just about anything.

I am very much perplexed by this problem.

For training human segmentation, I was planning to use the Supervisely Person dataset. If I am not mistaken, the Supervisely dataset doesn't contain masks for objects that the person might be holding. Would a dataset like Supervisely therefore be unfit for the job? Or do we need to train on a dataset with more labels than just "person"?

Ideally, if an object is lying off to the side, it is OK if it does not appear in the mask. But if the person is holding the object, it should definitely appear in the final mask.

How can this be achieved?

Thanks!

@anilsathyan7
Owner

It would be a good idea to start with a DeepLab model for full-person segmentation. Try out the sample 21-class model trained with pascal_coco on the TensorFlow website, which already contains the person class.
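As a rough illustration of using a 21-class model for person cutout: assuming the model output has already been reduced to a per-pixel class-index map (argmax over classes), and that the classes follow the standard Pascal VOC ordering where index 15 is `person`, the person mask is a simple equality test. The `person_mask` helper and the nested-list representation are illustrative only and not part of any library.

```python
# Sketch: turn a 21-class Pascal VOC label map into a binary person mask.
# Assumption: labels follow the standard VOC ordering (0 = background, 15 = person).
PASCAL_VOC_PERSON = 15


def person_mask(label_map, person_index=PASCAL_VOC_PERSON):
    """Return a mask of the same shape with 1 where the pixel is 'person', else 0."""
    return [[1 if label == person_index else 0 for label in row]
            for row in label_map]


# Tiny 2x3 label map: 0 = background, 15 = person, 5 = bottle.
labels = [[0, 15, 15],
          [5, 15, 0]]
mask = person_mask(labels)
```

Note that in this scheme a held object such as a bottle carries its own label, so it is dropped from the person mask unless it is handled separately.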

If you want to train on your own data, use a high-resolution 513x513 input and a depth multiplier of 1 (or more) for the highest accuracy. This may increase the overall inference time, but since you are aiming for the highest accuracy, you will need to trade away some speed.
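To get a feel for the speed side of this trade-off, here is a back-of-the-envelope estimate. It assumes the MobileNet-style rule of thumb that compute scales roughly with the square of the depth multiplier and with the number of input pixels; real latency on a phone will differ. The `relative_cost` helper is hypothetical and only compares a configuration against the 513x513, depth-multiplier-1 setting mentioned above.

```python
def relative_cost(input_size, depth_multiplier, base_size=513, base_multiplier=1.0):
    """Approximate compute relative to the baseline (1.0 means the same cost).

    Assumption: cost ~ (number of pixels) * (depth multiplier)^2.
    """
    pixel_ratio = (input_size / base_size) ** 2
    width_ratio = (depth_multiplier / base_multiplier) ** 2
    return pixel_ratio * width_ratio


baseline = relative_cost(513, 1.0)   # the high-accuracy setting
small = relative_cost(257, 0.5)      # a smaller, faster setting for comparison
```

Under these assumptions the 257x257, depth-multiplier-0.5 configuration costs roughly 6% of the baseline, which gives a sense of how much speed the high-accuracy setting gives up.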

The Supervisely dataset does not have proper masks for objects connected to the person. Try removing such images from Supervisely and using the rest in combination with the Pascal/COCO person datasets. It is sometimes ambiguous whether a connected object should be included (i.e., is it a connected object, or an object in the background partially occluded by the person?). In any case, you need to ensure you have a sufficient number of images (with and without connected objects) for training, as your specific use case requires. I have not tried any other techniques for including connected objects.
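The filtering step described above could be sketched as follows. This is only an outline under stated assumptions: the `holds_object` and `object_in_mask` tags are hypothetical (none of the datasets ship them, so they would have to be added by manual review), and the file paths and the `Sample`, `usable`, `build_training_set`, and `held_object_fraction` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_path: str
    source: str            # e.g. "supervisely", "coco", "custom"
    holds_object: bool     # hypothetical manual tag: person holds something
    object_in_mask: bool   # hypothetical manual tag: mask covers that object


def usable(sample):
    """Drop only images where a held object is missing from the mask."""
    return not sample.holds_object or sample.object_in_mask


def build_training_set(*datasets):
    """Merge several datasets, keeping only samples with consistent masks."""
    return [s for dataset in datasets for s in dataset if usable(s)]


def held_object_fraction(samples):
    """Share of samples with a held object, to check the with/without balance."""
    if not samples:
        return 0.0
    return sum(s.holds_object for s in samples) / len(samples)


supervisely = [Sample("sv/001.png", "supervisely", False, False),
               Sample("sv/002.png", "supervisely", True, False)]  # filtered out
coco = [Sample("coco/001.jpg", "coco", False, False)]
custom = [Sample("custom/001.jpg", "custom", True, True)]

train = build_training_set(supervisely, coco, custom)
fraction = held_object_fraction(train)
```

Checking `fraction` after merging is one simple way to see whether the combined set has enough examples with connected objects for the use case at hand.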

@InternetMaster1
Author

Many thanks for the detailed answer.

  1. DeepLab Model
    Do you mean the DeepLabV3+ variant? And what about the model, say MobileNetV2, MobileNetV3, ResNet50, PortraitNet, etc.? Are you aware of any chart comparing the accuracy/speed of all these models?

  2. Thanks for the helpful tip about input size.

  3. Wow. That really sheds a lot of light on how to handle connected objects. Things are far clearer now.

I have a few more questions:

A) What is the license of your amazing library?
B) If you were to recommend a model from your library, which would you say is best suited to my task? You have tried out a great many combinations.

Thanks!
