
Reducing Latency for Hand-Tracking Solution in Python #5789

Open
MarcKr3 opened this issue Dec 20, 2024 · 2 comments
Labels
legacy:hands Hand tracking/gestures/etc os:windows MediaPipe issues on Windows platform:python MediaPipe Python issues type:support General questions

Comments


MarcKr3 commented Dec 20, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 10 (amd64)

MediaPipe Tasks SDK version

Mediapipe Version: 0.10.20

Task name (e.g. Image classification, Gesture recognition etc.)

Hand landmark detection

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Using the LITE model (model_complexity=0), I'm measuring a latency of 27-35 ms

Describe the expected behaviour

According to https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker, the FULL model (model_complexity=1) has a latency of 17 ms (CPU) / 12 ms (GPU)

Standalone code/steps you may have used to try to get what you need

The way I measure latency is as follows:

import time
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
cap = cv2.VideoCapture(0)

def log_latency(start_time, event):
    elapsed_time = time.time() - start_time
    print(f"[{event}] Elapsed Time: {elapsed_time:.4f} seconds")
    return elapsed_time

with mp_hands.Hands(model_complexity=0, min_detection_confidence=0.3, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # BGR 2 RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Flip on horizontal
        image = cv2.flip(image, 1)

        # Mark the frame read-only so process() can avoid a copy
        image.flags.writeable = False

        # Detections
        det_time = time.time()
        results = hands.process(image)
        log_latency(det_time, "Landmark Detection")

        # Set flag back to writeable for drawing
        image.flags.writeable = True

        # RGB 2 BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS,
                                          mp_drawing.DrawingSpec(color=(51, 102, 0), thickness=1, circle_radius=2),
                                          mp_drawing.DrawingSpec(color=(33, 165, 205), thickness=2, circle_radius=0))

            # Draw finger distances to image from point list (user-defined helper)
            draw_tip_distances(image, results, point_list)

            # Draw hand distances to image from tip list (user-defined helper)
            draw_hand_distances(num, image, results, tip_list)

        # Check for countdown trigger
        key = cv2.waitKey(10) & 0xFF
        if key == ord('c'):
            cal_distances(image, hands)

        # Show the image
        cv2.imshow('Hand Tracking', image)

        if key == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
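For reference, `time.time()` has coarse resolution on Windows, so single-frame readings can jitter. A sketch of a more robust timing approach using `time.perf_counter()` with a rolling average (the `LatencyTracker` name is my own, not part of the script above):

```python
import time
from collections import deque

class LatencyTracker:
    """Rolling-average latency tracker using a monotonic high-resolution clock."""
    def __init__(self, window=30):
        self.samples = deque(maxlen=window)  # keep only the last `window` samples

    def measure(self, func, *args, **kwargs):
        """Time a single call and record its duration in milliseconds."""
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.samples.append((time.perf_counter() - start) * 1000.0)
        return result

    @property
    def average_ms(self):
        """Average latency over the current window, in milliseconds."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0
```

In the loop above this could replace `log_latency`, e.g. `results = tracker.measure(hands.process, image)`, with `tracker.average_ms` printed once per second instead of per frame.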

Other info / Complete Logs

I'm trying to get the Hand Tracking as close to real-time as possible.
I apologize if I'm misinterpreting the "expected" latency values.

I'm considering:

* Hardware acceleration: Unfortunately, I don't have a CUDA GPU. As far as I know, GPU acceleration isn't supported in the Python solution anyway.
* Detection & tracking confidence: Playing with these yielded improvements of ~2 ms.
* Tracking only necessary landmarks: For my task I only need the wrist & fingertip landmarks, but the model tracks all 21 landmarks. Would creating a custom model like this be possible, and would it reduce latency?
* Switching to C++: I considered it, but had problems setting up the MediaPipe framework. The Hello World example ran successfully, but the hand_tracking_cpu example failed to build...

I'll gladly specify further if necessary! Thanks!

@kuaashish kuaashish added os:windows MediaPipe issues on Windows task:hand landmarker Issues related to hand landmarker: Identify and track hands and fingers platform:python MediaPipe Python issues legacy:hands Hand tracking/gestures/etc type:support General questions and removed task:hand landmarker Issues related to hand landmarker: Identify and track hands and fingers labels Dec 23, 2024
kuaashish (Collaborator) commented Dec 23, 2024

Hi @MarcKr3,

Thank you for reaching out. Below are some answers regarding the support you are looking for:

  • Hardware Acceleration:

Based on the code snippet and version you are using, it appears you are utilizing the legacy hand tracking solution. Please note that this solution has been upgraded and is now integrated into the Hand Landmarker Task API, as outlined in the documentation. We have discontinued support for legacy solutions and are no longer maintaining them.

The upgraded Task API offers improved performance and is easier to implement for your use case. It also supports GPU acceleration. For GPU configuration details, please refer to this #5426. We recommend transitioning to the new Hand Landmarker Task API, which you can explore through the guidelines here. A Python guide is available here, along with a Python notebook example. If you encounter similar behavior in the new Task API, please let us know.
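For illustration, a minimal sketch of the Task API setup with a GPU delegate might look like the following (a sketch only: the model path is a placeholder, delegate availability depends on platform, and this assumes the `mediapipe` package and a downloaded `hand_landmarker.task` model):

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Sketch: configure the Hand Landmarker task with a GPU delegate.
options = vision.HandLandmarkerOptions(
    base_options=python.BaseOptions(
        model_asset_path='hand_landmarker.task',  # placeholder path
        delegate=python.BaseOptions.Delegate.GPU,  # may be unavailable on some platforms
    ),
    running_mode=vision.RunningMode.VIDEO,
    num_hands=2,
)

with vision.HandLandmarker.create_from_options(options) as landmarker:
    # In VIDEO mode, call landmarker.detect_for_video(mp_image, timestamp_ms)
    # once per frame with a monotonically increasing timestamp.
    pass
```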

  • Tracking Specific Landmarks:

Currently, it is not possible to track only specific landmarks with the Hand Landmarker. MediaPipe's Model Maker can assist with this kind of customization in general, though it is not yet available for the Hand Landmarker.

  • Switching to C++:

If you wish to switch to C++, we recommend opening a new issue using this template and providing the necessary details and complete error logs. Our team will be happy to assist you in resolving this.

Please let us know if you need further assistance.

@kuaashish kuaashish added the stat:awaiting response Waiting for user response label Dec 23, 2024
MarcKr3 (Author) commented Dec 25, 2024

Thank you for your answer!

Sorry, I thought I was running the latest version because I was working with the most recent pip install.
(pip show mediapipe -> mediapipe Version: 0.10.20)

The Python notebook for landmark detection in images worked, but I was unable to implement the "live stream" task.
I tried my hardest following the guides you shared, but failed horribly...

import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import cv2
import numpy as np

BaseOptions = python.BaseOptions
HandLandmarker = vision.HandLandmarker
HandLandmarkerOptions = vision.HandLandmarkerOptions
HandLandmarkerResult = vision.HandLandmarkerResult
VisionRunningMode = vision.RunningMode

# Create a hand landmarker instance with the live stream mode:
def print_result(result: HandLandmarkerResult, output_image: mp.Image, timestamp_ms: int):  # type: ignore
    print('hand landmarker result: {}'.format(result))

options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=r'D:\Desktop\MIDIHands\hand_landmarker.task'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=print_result,
)

cap = cv2.VideoCapture(0)

MARGIN = 10  # pixels
FONT_SIZE = 1
FONT_THICKNESS = 1
HANDEDNESS_TEXT_COLOR = (88, 205, 54) # vibrant green

def draw_landmarks_on_image(rgb_image, results):
  hand_landmarks_list = results.hand_landmarks
  handedness_list = results.handedness
  annotated_image = np.copy(rgb_image)

  # Loop through the detected hands to visualize.
  for idx in range(len(hand_landmarks_list)):
    hand_landmarks = hand_landmarks_list[idx]
    handedness = handedness_list[idx]

    # Draw the hand landmarks.
    hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
    hand_landmarks_proto.landmark.extend([landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in hand_landmarks])
    
    solutions.drawing_utils.draw_landmarks(annotated_image, hand_landmarks_proto, solutions.hands.HAND_CONNECTIONS, solutions.drawing_styles.get_default_hand_landmarks_style(), solutions.drawing_styles.get_default_hand_connections_style())

    # Get the top left corner of the detected hand's bounding box.
    height, width, _ = annotated_image.shape
    x_coordinates = [landmark.x for landmark in hand_landmarks]
    y_coordinates = [landmark.y for landmark in hand_landmarks]
    text_x = int(min(x_coordinates) * width)
    text_y = int(min(y_coordinates) * height) - MARGIN

    # Draw handedness (left or right hand) on the image.
    cv2.putText(annotated_image, f"{handedness[0].category_name}", (text_x, text_y), cv2.FONT_HERSHEY_DUPLEX, FONT_SIZE, HANDEDNESS_TEXT_COLOR, FONT_THICKNESS, cv2.LINE_AA)

  return annotated_image

with HandLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ret, frame = cap.read()
        frame_timestamp_ms = cap.get(cv2.CAP_PROP_POS_MSEC)

        image = cv2.flip(frame, 1)

        image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)
        
        results = landmarker.detect_async(image,int(frame_timestamp_ms))
        
        annotated_image = draw_landmarks_on_image(image, results)

        key = cv2.waitKey(10) & 0xFF

        cv2.imshow("MIDIHands", annotated_image)

        if key == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

This yields the error:

hand landmarker result: HandLandmarkerResult(handedness=[], hand_landmarks=[], hand_world_landmarks=[])
Traceback (most recent call last):
  File "d:\Desktop\MIDIHands\newAPI.py", line 67, in <module>
    annotated_image = draw_landmarks_on_image(image, results)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Desktop\MIDIHands\newAPI.py", line 29, in draw_landmarks_on_image
    hand_landmarks_list = results.hand_landmarks
                          ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'hand_landmarks'
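If I understand the API right, `detect_async` returns `None` immediately and results only arrive through the `result_callback`, so drawing from its return value can't work. A sketch of a workaround I'm considering (the class and names are my own, not from MediaPipe) — a thread-safe holder the callback fills, which the main loop reads:

```python
import threading

class LatestResult:
    """Thread-safe holder for the most recent async detection result."""
    def __init__(self):
        self._lock = threading.Lock()
        self._result = None

    def callback(self, result, output_image=None, timestamp_ms=0):
        """Matches the (result, output_image, timestamp_ms) callback signature."""
        with self._lock:
            self._result = result

    def get(self):
        """Return the most recent result, or None if nothing has arrived yet."""
        with self._lock:
            return self._result
```

The idea would be to pass `result_callback=holder.callback` in `HandLandmarkerOptions`, then inside the loop call `landmarker.detect_async(image, int(frame_timestamp_ms))`, read `latest = holder.get()`, and only draw when `latest is not None` and `latest.hand_landmarks` is non-empty.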

Could you point me towards an implementation of live-stream hand landmark detection?

Regarding the hardware acceleration:
I'm running Windows, on which GPU acceleration isn't supported yet, right?

Thanks for the information about the Model Maker and C++.
I'll consider those if updating to the Task API doesn't work.
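One more thing I noticed while reading the docs: `detect_async` requires strictly increasing integer timestamps, and `cv2.CAP_PROP_POS_MSEC` can return 0 or repeated values for a live webcam. A sketch of a fallback timestamp source I might use instead (my own naming, not a MediaPipe API):

```python
import time

class MonotonicTimestamps:
    """Produce strictly increasing integer millisecond timestamps,
    as required by detect_async in LIVE_STREAM mode."""
    def __init__(self):
        self._last = -1

    def next_ms(self):
        ts = int(time.perf_counter() * 1000)
        if ts <= self._last:
            ts = self._last + 1  # guarantee strict monotonicity
        self._last = ts
        return ts
```

In the loop this would replace the `CAP_PROP_POS_MSEC` read: `landmarker.detect_async(image, timestamps.next_ms())`.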

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Waiting for user response label Dec 25, 2024
2 participants