BlazePose documentation of z coordinates #181

Open
shiffman opened this issue Oct 7, 2024 · 10 comments
Labels
documentation Improvements or additions to documentation

Comments

@shiffman
Member

shiffman commented Oct 7, 2024

Hi everyone! I'm working on a video tutorial about BodyPose in ml5.js 1.0. While making the video, I found some improvements I think we could make to the documentation of how the 3D coordinates work with BlazePose.

In the tfjs-models documentation the units for keypoints3D are explained as follows:

For the keypoints3D, x, y and z represent absolute distance in meters in a 2 x 2 x 2 meter cubic space. The range for each axis goes from -1 to 1 (therefore 2m total delta). The z is always perpendicular to the xy plane that passes the center of the hip, so the coordinate for the hip center is (0, 0, 0).
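
To make the units concrete, here is a minimal sketch of measuring the distance in meters between two keypoints3D entries. `distance3D` is a hypothetical helper, not part of ml5.js or tfjs-models, and the nose values are sample numbers rounded for illustration.

```javascript
// Hypothetical helper (not an ml5.js or tfjs-models API):
// Euclidean distance between two keypoints3D entries.
// Because keypoints3D are expressed in meters, the result is in meters too.
function distance3D(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// The hip center is the origin of the keypoints3D space.
const hipCenter = { x: 0, y: 0, z: 0 };

// Sample values for illustration only.
const nose = { x: 0.0124, y: -0.5875, z: -0.2592 };

console.log(`nose to hip center: ${distance3D(nose, hipCenter).toFixed(2)} m`);
```

Since every axis is limited to the range -1 to 1, each coordinate of a keypoint is at most 1 m from the hip center.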

We should probably include a simplified version of this in our documentation here:

Screenshot 2024-10-07 at 4 35 08 PM

I was also confused to find that the 2D keypoints array (as described in our docs) also includes a z value. I don't believe this is part of the original BlazePose data. It looks like this is the code where it is being added but the units appear to be different. @ziyuan-linn, do you know offhand what is happening here? Is there some code I'm missing which is trying to change the real world "meters" range to pixel units?

This is what I see in the console:

keypoints:
Screenshot 2024-10-07 at 4 38 26 PM

keypoints3D:
Screenshot 2024-10-07 at 4 38 33 PM

And now under the nose property:
Screenshot 2024-10-07 at 4 40 26 PM

(Interesting to note that the confidence score is different for keypoints3D!)

@ziyuan-linn
Member

I just ran BlazePose, and here is the raw, unprocessed output.

[
    {
        "score": 0.995323121547699,
        "keypoints": [
            {
                "x": 245.63711038340324,
                "y": 294.79695594356946,
                "z": -689535.245693554,
                "score": 0.9987707333798664,
                "name": "nose"
            },
            {
                "x": 270.6604500325918,
                "y": 252.8618703269841,
                "z": -639394.1690627289,
                "score": 0.9983254733819559,
                "name": "left_eye_inner"
            },
            // ...
        ],
        "keypoints3D": [
            {
                "x": 0.01237955486137293,
                "y": -0.587451014244643,
                "z": -0.2591571422454245,
                "score": 0.9982515152476616,
                "name": "nose"
            },
            {
                "x": 0.026530376499100554,
                "y": -0.6236724986839615,
                "z": -0.24141936558530744,
                "score": 0.9973494677709942,
                "name": "left_eye_inner"
            },
            // ...
        ]
    }
]

Looks like z values are present for the 2D keypoints. I have no idea what they represent, and they do not seem to be covered in the documentation either. If everyone agrees, I think we can just take that value out.

I think the named keypoints just copy the x, y, z, and confidence values from the keypoints array. Should we also add the 3d values to the named keypoints?

@shiffman
Member Author

shiffman commented Oct 7, 2024

Thank you for looking into this @ziyuan-linn! Let's do the following:

  • Remove the z value in the keypoints array.

I was about to say let's add only the z value from keypoints3D to the named values, but it might make sense for us to provide the full xyz. What about the following:

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

Or is this overdoing it and making it super complicated? @MOQN, I'd be curious to hear your thoughts.
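
For reference, a rough sketch of how named keypoints in this shape could be built from the raw output. This is not the actual ml5.js implementation: `addNamedKeypoints` is a hypothetical helper, and it assumes keypoints and keypoints3D list the same points in the same order, as the raw output above suggests.

```javascript
// Hypothetical helper (not the actual ml5.js code).
// Assumption: pose.keypoints and pose.keypoints3D are index-aligned,
// i.e. entry i in both arrays describes the same body point.
function addNamedKeypoints(pose) {
  const named = {};
  pose.keypoints.forEach((kp, i) => {
    const kp3D = pose.keypoints3D[i];
    named[kp.name] = {
      x: kp.x, // 2D position in pixels
      y: kp.y,
      confidence: kp.score, // the raw output calls this "score"
      keypoint3D: { x: kp3D.x, y: kp3D.y, z: kp3D.z }, // meters
    };
  });
  return named;
}

// Usage with a trimmed-down pose (values shortened for illustration).
const pose = {
  keypoints: [{ x: 245.6, y: 294.8, z: -689535.2, score: 0.9988, name: "nose" }],
  keypoints3D: [{ x: 0.0124, y: -0.5875, z: -0.2592, score: 0.9983, name: "nose" }],
};
const named = addNamedKeypoints(pose);
console.log(named.nose.keypoint3D.z);
```

Note that the top-level confidence here comes from the 2D keypoint; the 3D score is slightly different and is dropped in this sketch.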

On another note, I'm trying to figure out why the tfjs docs list 4 extra points for BlazePose that don't actually show up in the model. @ziyuan-linn have you run across this in your research at all?

34: forehead
35: leftThumb
36: leftHand
37: rightThumb
38: rightHand

@alanvww added the documentation label Oct 8, 2024
@ziyuan-linn
Member

@shiffman I also have no idea what those keypoints are. The tfjs documentation can sometimes be puzzling. The BlazePose model is trained with 33 keypoints as the Google MediaPipe Model Card suggests. I think it should be safe to ignore them.

I will reply with my thoughts about the API in the ml5-next-gen thread.

@ziyuan-linn
Member

Actually just looking at the model card I think the 2d z value is supposed to be:

Z coordinate is measured in "image pixels" like the X and Y screen coordinates and represents the distance relative to the plane of the subject's hips, which is the origin of the Z axis. Negative values are between the hips and the camera; positive values are behind the hips. Z coordinate scale is similar with X, Y scales but has different nature as obtained not via human annotation, by fitting synthetic data (GHUM model) to the 2D annotation. Note, that Z is not metric but up to scale.

However, a value like -437589 is nowhere near accurate. I think removing it for now might be the best choice.
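
A minimal sketch of what removing it could look like, assuming the keypoints are plain objects as in the raw output above. `stripZ` is a hypothetical helper, not the actual fix in ml5.js.

```javascript
// Hypothetical helper (not the actual ml5.js change).
// Returns a copy of the 2D keypoints array without the z property;
// the input array and its objects are left unmodified.
function stripZ(keypoints) {
  return keypoints.map(({ z, ...rest }) => rest);
}

// Usage with one keypoint from the raw output (values shortened).
const rawKeypoints = [
  { x: 245.637, y: 294.797, z: -689535.246, score: 0.9988, name: "nose" },
];
const cleaned = stripZ(rawKeypoints);
console.log(Object.keys(cleaned[0]).join(", "));
```

The keypoints3D array is left alone, since its z values are the documented ones in meters.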

@MOQN
Member

MOQN commented Oct 10, 2024

Thank you for all of these findings and thoughtful discussion. I completely agree with removing the Z value!

@MOQN
Member

MOQN commented Oct 10, 2024

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  keypoint3D: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

Or is this overdoing it and making it super complicated? @MOQN I'd be curious for your thoughts?

@shiffman, I believe it's a great suggestion. Very intuitive! (Edit: I wrote too quickly, suggesting 3d, and didn't realize the key begins with a number, haha.) Alternatively, we could use pos3D, position, position3D, coords, coords3D or depth instead of keypoint3D.

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  position: {
    x: 0.05988978072436527,
    y: -0.5489126977664187,
    z: -0.26418375968933105
  }
}

or

nose: {
  x: 332.6024622758805,
  y: 265.78330263473146,
  confidence: 0.9993924452777454,
  x3D: 0.05988978072436527,
  y3D: -0.5489126977664187,
  z3D: -0.26418375968933105
}

keypoints and keypoints3D can then be used only as the names of the arrays that hold the entire position data.

keypoints: [{ x, y, confidence, name }, ...],
keypoints3D: [{ x, y, z, confidence, name }, ...],

[
  {
    box: { width, height, xMax, xMin, yMax, yMin },
    id: 1,
    keypoints: [{ x, y, confidence, name }, ...],
    keypoints3D: [{ x, y, z, confidence, name }, ...],
    left_ankle: { x, y, z, confidence },
    ...
    confidence: 0.28,
  },
  ...
];

@shiffman
Member Author

These are great suggestions! I'd love to hear everyone's feedback during the meeting today!

@shiffman
Member Author

Hello web team! Just noting this has now been incorporated so we can update the documentation! See ml5js/ml5-next-gen#215

@alanvww
Member

alanvww commented Oct 12, 2024

@shiffman @MOQN @ziyuan-linn Hi team, @leey611 and I are currently working on the documentation to address this issue. While the named keypoints update is functioning well, we've discovered that the keypoints array still includes the "wrong" z values. We haven't addressed that in the [email protected] update, have we?

@shiffman
Member Author

Ah, I just checked and you are right! Let me make a quick fix for this and we can do a 1.1.1 release!


4 participants