A quick attempt at using Stable Diffusion (SD) image generation models (v1.5 and XL) together with MediaPipe's Pose Landmarker vision models to generate 3D human poses from a text prompt.
- 2024-11-12: Added a Godot full-body IK pose solver.
- 2024-10-24: Initial implementation.
- Clone the repo: `git clone https://github.com/jerenchen/simple-diffusion-pose-gen.git`.
- Change into the `python` directory and (optionally) activate a virtual environment (e.g. conda).
- Install the Python dependencies: `pip install -r requirements.txt`.
- Download MediaPipe Pose Landmarker (Full) and save the file under `python/tasks`.
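Once the Pose Landmarker file is in place, a quick sanity check could look like the sketch below. It uses the MediaPipe Tasks Python API to read 3D world landmarks from a single image and is not taken from this repo's code. The model filename `pose_landmarker_full.task` and the helper names `landmarks_to_xyz` and `detect_pose` are assumptions made for illustration.

```python
from pathlib import Path

# Assumed location and filename; adjust to wherever the model file was saved.
MODEL_PATH = Path("tasks") / "pose_landmarker_full.task"


def landmarks_to_xyz(landmarks):
    """Convert landmark objects (with .x, .y, .z) into plain (x, y, z) tuples."""
    return [(float(lm.x), float(lm.y), float(lm.z)) for lm in landmarks]


def detect_pose(image_path, model_path=MODEL_PATH):
    """Return 3D world landmarks of the first detected person, or [] if none."""
    # Imported lazily so landmarks_to_xyz works without MediaPipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    if not Path(model_path).is_file():
        raise FileNotFoundError(f"Pose Landmarker model not found: {model_path}")

    options = vision.PoseLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=str(model_path)),
    )
    with vision.PoseLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp.Image.create_from_file(str(image_path)))

    poses = result.pose_world_landmarks  # one landmark list per detected person
    return landmarks_to_xyz(poses[0]) if poses else []
```

Calling `detect_pose("some_image.png")` from inside the `python` directory (the image path is a placeholder) should return a list of `(x, y, z)` tuples when a person is detected, and an empty list otherwise.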
- Inside the `python` directory, start the Python pose-gen service: `python posegen.py --base sd15 --steps 8`.
- Open and run `project.godot` inside the `godot` directory with Godot Engine.
- Enter a prompt to generate a pose.
Generating a pose using the prompt "A basketball player making a 3-pointer jump shot".
NOTE: The standalone demo requires PySide6 >= v6.7.
Inside the `python` directory, run `python simple-diffusion-pose-gen.py`, then enter a prompt to generate a pose.
- SD15 has fewer model parameters and therefore requires less memory, whereas SDXL can generate images with higher fidelity.
- Hyper-SD steps (Hyper-SD is an SD inference acceleration technique): 2, 4, or 8. This is a trade-off between speed (fewer, larger steps) and quality (more, smaller steps).
- PyTorch device for running SD inference (if available): CPU, CUDA, or MPS (Apple Silicon).
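The options above could be wired together roughly as in the sketch below. Only `--base sd15` and `--steps 8` appear in this README; the `sdxl` value, the `--device` flag, and the function names are hypothetical, and the CUDA, then MPS, then CPU preference order is an assumption rather than the repo's documented behaviour.

```python
import argparse


def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then MPS (Apple Silicon), and fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are available; use CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())


def build_parser():
    parser = argparse.ArgumentParser(description="Pose-gen options (illustrative)")
    # "sdxl" is an assumed value; the README only shows "sd15".
    parser.add_argument("--base", choices=["sd15", "sdxl"], default="sd15")
    # Hyper-SD step counts listed in the README.
    parser.add_argument("--steps", type=int, choices=[2, 4, 8], default=8)
    # Hypothetical flag: when omitted, the device is auto-detected.
    parser.add_argument("--device", choices=["cpu", "cuda", "mps"], default=None)
    return parser


def resolve_options(argv):
    """Parse argv and fill in the device when it was not given explicitly."""
    args = build_parser().parse_args(argv)
    if args.device is None:
        args.device = detect_device()
    return args
```

For example, `resolve_options(["--base", "sd15", "--steps", "4"])` keeps the base model and step count as given and fills in whichever device is available on the machine.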