Skip to content

taco-group/PANDORA

Repository files navigation

PANDORA

Diffusion Policy Learning for Dexterous Robotic Piano Playing


🧠 Abstract

PANDORA is a novel diffusion-based policy learning framework for dexterous robotic piano performance.
It leverages a conditional U-Net with FiLM conditioning to iteratively denoise noisy action sequences into smooth, high-dimensional trajectories.

To enhance expressiveness and musical fidelity, we introduce a composite reward function that integrates task-specific objectives with high-level feedback from a Large Language Model (LLM) oracle.
This oracle assesses performance style and semantic correctness, enabling dynamic, hand-specific reward adjustment.

Combined with residual inverse-kinematics refinement, PANDORA achieves state-of-the-art performance in the ROBOPIANIST environment, significantly outperforming baseline methods.


🎹 Pipeline Overview

Overview of PANDORA’s diffusion-based action generation pipeline and LLM-driven reward evaluation.


📖 Citation

If you find this work useful, please consider citing us:

@misc{huang2025pandoradiffusionpolicylearning,
      title={PANDORA: Diffusion Policy Learning for Dexterous Robotic Piano Playing}, 
      author={Yanjia Huang and Renjie Li and Zhengzhong Tu},
      year={2025},
      eprint={2503.14545},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.14545}, 
}