ChatCoach is a fitness correction system based on pose estimation and large language models (LLMs). The primary goal is to provide fitness action guidance for individuals who are new to fitness.
The main functionality of the system is to provide relevant fitness action guidance based on video files input by users. Our system covers 8 common fitness actions, which are:
- Bicep curl
- Front raise
- Plank
- Barbell squat
- Dumbbell lateral raise
- Shoulder push
- Barbell bench press
- Bent-over row
When a user inputs a video of performing a complete action of these fitness exercises, our system analyzes the video and combines it with the large model to provide suggestions and feedback for action correction and related textual guidance.
The input is a video file for the visual model, where users can choose between top-down or bottom-up methods. If the top-down method is selected, two models are required; otherwise, one model suffices. The output is JSON-formatted skeletal joint information extracted from the video using pose estimation. For example, the OpenPose model extracts 18 key joints (e.g., nose, shoulders, elbows, etc.). We focus on these 18 joints for analysis and discard any additional keypoints.
In this phase, JSON files from Phase 1 are input into the prompt generator, producing a prompt text. We analyze joint angles related to the fitness action to extract meaningful posture information. For instance, in a barbell squat, we calculate the angle between the thigh and the ground to assess the correctness of the movement. The prompt generator uses predefined templates to create prompts based on extracted angle information, incorporating expert knowledge on correct angle ranges for improved analysis.
Here is an example of a prompt generated by the prompt generator for the bent-over row action:
"This is a bent-over row action. Below, I will provide some key skeletal angle information about this action from a side view. Please analyze whether this person's action is standard. If not, please provide suggestions. Here is some information: During the bent-over row, at the highest point, the elbow is above the torso (this is considered standard, indicating that the back muscles are adequately engaged). The angle between the thigh and the calf ranges from 121 degrees to 132 degrees (generally, 120 degrees to 160 degrees is considered standard; otherwise, too high or too low may hinder force exertion and potentially cause injury). The angle between the torso and the ground ranges from 115 degrees to 130 degrees."
The input is the prompt from Phase 2. We use the Kimi model API from Moonshot to obtain relevant suggestions, which are displayed verbatim on the front end to provide user feedback.
For the input of this system, the video should ideally include the full body of the individual performing the exercises. The table below outlines the requirements for the shooting angles of the videos for the eight fitness actions.
Action Type | Static/Dynamic | Shooting Angle |
---|---|---|
Bicep Curl | Dynamic | Side |
Front Raise | Dynamic | Side |
Plank | Static | Side |
Barbell Squat | Dynamic | Side |
Dumbbell Lateral Raise (Fly) | Dynamic | Front |
Seated Shoulder Press | Dynamic | Front (shot from back to front) |
Barbell Bench Press | Dynamic | Front (shot from head to feet) |
Bent-over Row | Dynamic | Side |
-
OS: Windows11
-
IDE: PyCharm
-
need: pyqt5 , numpy , math
All models used for pose estimation are included in this repository.
MainWindows.py
is the main code for running this system. Please configure the appropriate PyQt5 library in PyCharm to run the frontend interface.
Simply run MainWindows.py
to start the application.
We have placed some input videos in the input_data
folder, and some example output videos for reference in the result_videos_for_show
folder.