Use a Neural Network for hand recognition & tracking #2

Open
snavas opened this issue Jul 16, 2020 · 11 comments
Labels: enhancement (New feature or request)

Comments

snavas (Owner) commented Jul 16, 2020

Maybe try a lightweight neural network (like YOLO?) with the color image, depth image & fingertip features. Use the current approach to generate training data.

snavas added the enhancement label Jul 16, 2020
PaulaScharf (Collaborator) commented Feb 24, 2021

One easy out-of-the-box solution for feature detection of hands is this.
However, it doesn't detect hands in gloves.

An implementation of this can be found in the branch issue/neuralnet.

Important side note:
MediaPipe can detect hands with a wide variety of skin tones (tattoos are problematic, though).
[image: annotated_image8]
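For reference, a minimal sketch of MediaPipe's hand detection (assuming the mediapipe and opencv-python packages; the input file name is hypothetical):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

# Static-image mode is the simplest setup; for a video stream you would keep
# one Hands instance alive and feed it frame by frame instead.
with mp_hands.Hands(static_image_mode=True, max_num_hands=2,
                    min_detection_confidence=0.7) as hands:
    image = cv2.imread("frame.png")  # hypothetical input frame
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(image, landmarks, mp_hands.HAND_CONNECTIONS)
    cv2.imwrite("annotated.png", image)
```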

PaulaScharf (Collaborator) commented Mar 4, 2021

I managed to train the DeepLabV3 model from PyTorch to do semantic hand segmentation using a tutorial. But the result is much slower than hoped for :/

Here is one gif of the detection at full image resolution:
[image: test]

Here the resolution was reduced by 30%:

[image: test2]

Edit:
This was done on the GPU, not the CPU, so that's not the cause of the performance issue.

Edit:
I also tried cropping the image to the size of one hand, which increases performance a little (0.8 sec per frame instead of 1.2), but it does not scale well if several hands are detected (n * 0.8 sec per frame for n hands).
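For context, a minimal inference sketch with torchvision's DeepLabV3 (assuming a model fine-tuned for a binary background/hand task; the weights file name is hypothetical):

```python
import torch
import torchvision
from PIL import Image

# Two classes: background and hand. The weights file stands in for whatever
# the fine-tuning run produced.
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=2)
model.load_state_dict(torch.load("hand_deeplabv3.pth"))
model.eval().cuda()

preprocess = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
])

image = Image.open("frame.png").convert("RGB")
with torch.no_grad():
    batch = preprocess(image).unsqueeze(0).cuda()
    logits = model(batch)["out"]                      # 1 x 2 x H x W
    mask = logits.argmax(1).squeeze().cpu().numpy()   # 1 where a hand is predicted
```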

PaulaScharf (Collaborator) commented

I am currently trying this tutorial, which has a well-documented repository on GitHub. It provides several different models as backbones, including very lightweight models like MobileNet. So I am hoping for a good inference time.
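If the tutorial's repository is divamgupta's image-segmentation-keras (which the description matches), training one of those backbones might look roughly like this (the class count and dataset paths are placeholders):

```python
from keras_segmentation.models.fcn import fcn_32_mobilenet

# n_classes = 2 (background + hand); paths stand in for the actual dataset.
model = fcn_32_mobilenet(n_classes=2)
model.train(
    train_images="dataset/images/",
    train_annotations="dataset/annotations/",
    epochs=5,
)
# Writes a color-coded segmentation mask for a single frame.
out = model.predict_segmentation(inp="frame.png", out_fname="mask.png")
```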

PaulaScharf (Collaborator) commented Mar 29, 2021

Here is the result using FCN32 and MobileNet from the previously mentioned repository.

[image: test3]

It is much faster but also very inaccurate. The inaccuracy might be because I only used one third of the training data this time. I will try training with more data and maybe switch out the models.

Update:
Here is FCN8 and MobileNet with more training data (25 epochs):

[image: result]

And here SegNet and MobileNet (5 epochs):

[image: test5]

Update 2:
I think no major improvements in accuracy while maintaining the speed can be expected now, at least not with the time and knowledge I have available for this topic. So, in conclusion, I think semantic segmentation is not a viable option for this project.

PaulaScharf (Collaborator) commented Mar 31, 2021

Currently, depth and optical flow are not used for the segmentation. However, the depth values from the camera are very inaccurate, so there is not much that can be done with them.
The use of optical flow to segment hands, and maybe even recognise gestures, should be investigated.
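A minimal sketch of dense optical flow between two frames with OpenCV's Farneback implementation (frame file names and the motion threshold are hypothetical):

```python
import cv2

prev = cv2.cvtColor(cv2.imread("frame_0.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame_1.png"), cv2.COLOR_BGR2GRAY)

# Per-pixel (dx, dy) motion between the two frames.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
moving = magnitude > 2.0   # hypothetical threshold for "moving hand" pixels
```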

PaulaScharf (Collaborator) commented Apr 2, 2021

Imo, hand feature detection (e.g. with MediaPipe) is much more feasible in this project. It has the downside of having to animate the detection.

PaulaScharf (Collaborator) commented Apr 20, 2021

OpenCV provides a built-in class for background removal:
link

We should try it out.
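A minimal sketch of how the built-in subtractor could be tried out (assuming the MOG2 variant; the capture source is hypothetical):

```python
import cv2

# Gaussian-mixture background subtractor; default-ish parameters.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # 255 = foreground, 0 = background
    cv2.imshow("foreground mask", mask)
    if cv2.waitKey(1) == 27:         # Esc to quit
        break
cap.release()
```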

Edit:
Here is a quick implementation:
[image: background]
So this obviously doesn't work on its own. But I still think background removal should be investigated. If I try it out in PowerPoint, it looks like this:
[image: Screenshot 2021-04-20 225727]

Maybe this can be done with GrabCut.

PaulaScharf (Collaborator) commented

Here is an attempt at GrabCut:
[image: grabcut]

This is suboptimal in several ways, namely the speed, the occasional confusion between foreground and background, and the general inaccuracy.
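For reference, the general shape of a GrabCut call (the initial rectangle around the hand is a hypothetical placeholder; in practice it could come from a detector):

```python
import cv2
import numpy as np

image = cv2.imread("frame.png")
mask = np.zeros(image.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (50, 50, 300, 300)   # x, y, w, h of the assumed hand region

cv2.grabCut(image, mask, rect, bgd_model, fgd_model,
            iterCount=5, mode=cv2.GC_INIT_WITH_RECT)

# Pixels marked definite or probable foreground form the hand mask.
hand = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                255, 0).astype(np.uint8)
result = cv2.bitwise_and(image, image, mask=hand)
```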

PaulaScharf (Collaborator) commented

I tried to use the previously mentioned MediaPipe in combination with the watershed algorithm to get a simple visualization of the hands. The results are not good, but maybe a start.
[image: mediapipe]
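A rough sketch of how the landmark points could seed OpenCV's watershed (the seed radius and the border-as-background assumption are guesses):

```python
import cv2
import numpy as np

def watershed_hands(frame_bgr, landmark_points):
    """landmark_points: (x, y) pixel coordinates of MediaPipe hand landmarks."""
    markers = np.zeros(frame_bgr.shape[:2], np.int32)
    markers[0, :] = markers[-1, :] = 1   # assume the image border is background
    markers[:, 0] = markers[:, -1] = 1

    # Draw small discs around each landmark as "hand" seeds (label 2).
    seeds = np.zeros(frame_bgr.shape[:2], np.uint8)
    for x, y in landmark_points:
        cv2.circle(seeds, (int(x), int(y)), 5, 255, -1)
    markers[seeds > 0] = 2

    cv2.watershed(frame_bgr, markers)    # fills the unlabeled pixels in place
    return (markers == 2).astype(np.uint8) * 255
```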

PaulaScharf (Collaborator) commented Jun 3, 2021

Attempt number 2 at MediaPipe. I am using it for color calibration now. I think it has potential.
[image: mediapipe_color]
Currently I set the detection confidence very high (only a few, but accurate, detections), use all the detected hand feature points from MediaPipe to get the hand color (I average the color at every hand feature point), and then segment the entire image for this hand color. An alternative approach would be to lower the detection confidence (-> many, but at times inaccurate, detections) and only segment the detected hand areas with the hand color for that area.

Edit:
This is how the alternative approach looks.
[image: mediapipe_color2]
I actually think it looks really nice so far :)
Currently, the accepted hand color lies between mean - (2*std) and mean + (2*std). But there is probably a better way to remove outliers from the detections than using the standard deviation. I will have to look into that.
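A sketch of the calibration described above: sample the image color at every detected landmark, then keep pixels within mean ± 2*std of the samples (the frame source and confidence value are assumptions):

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def hand_color_mask(frame_bgr):
    h, w = frame_bgr.shape[:2]
    with mp_hands.Hands(static_image_mode=True,
                        min_detection_confidence=0.9) as hands:
        results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return np.zeros((h, w), np.uint8)

    # Sample the BGR color under every detected hand landmark.
    samples = []
    for landmarks in results.multi_hand_landmarks:
        for lm in landmarks.landmark:
            x, y = int(lm.x * w), int(lm.y * h)
            if 0 <= x < w and 0 <= y < h:
                samples.append(frame_bgr[y, x])
    samples = np.array(samples, np.float32)

    # Segment everything within mean +/- 2*std of the sampled hand color.
    mean, std = samples.mean(axis=0), samples.std(axis=0)
    lower = np.clip(mean - 2 * std, 0, 255).astype(np.uint8)
    upper = np.clip(mean + 2 * std, 0, 255).astype(np.uint8)
    return cv2.inRange(frame_bgr, lower, upper)
```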

Edit:
Here it is after a bit of calibration.
[image: mediapipe_color4]

PaulaScharf (Collaborator) commented Jun 18, 2021

Good news: MediaPipe also works well enough for darker skin tones :)
[image: dark7]
