AIR-Embodied

Code is coming soon!


Recent advances in 3D reconstruction and neural rendering have improved the creation of high-quality digital assets, yet existing methods struggle to generalize across varying object shapes, textures, and occlusions. Next Best View (NBV) planning and learning-based approaches address part of this problem, but they are often limited by predefined criteria and fail to handle occlusions effectively. We present AIR-Embodied, a novel framework that integrates embodied AI agents with large-scale pretrained multi-modal language models (MLLMs) to improve active reconstruction. AIR-Embodied uses a three-stage process: understanding the current reconstruction state via multi-modal prompts, planning tasks with viewpoint selection and interactive actions, and employing closed-loop reasoning to ensure accurate execution. The agent dynamically refines its actions based on discrepancies between the planned and actual outcomes. Experimental evaluations in virtual and real-world environments demonstrate that AIR-Embodied significantly improves reconstruction efficiency and quality, providing a robust solution to challenges in active 3D reconstruction.
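Since the code is not yet released, the following is only a toy sketch of the three-stage loop described above (understand, plan, execute with closed-loop correction). It is not the authors' implementation. The MLLM is replaced by simple rule-based stand-ins, the camera pose is reduced to a single azimuth angle, and every name (`Plan`, `understand`, `plan_next`, `execute_closed_loop`, `reconstruct`, `move`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    bin_index: int   # unobserved view bin this plan tries to cover
    azimuth: float   # planned camera azimuth in degrees (toy 1-D pose)


def understand(coverage: List[bool]) -> List[int]:
    """Stage 1 stand-in: report which view bins are still unobserved."""
    return [i for i, seen in enumerate(coverage) if not seen]


def plan_next(missing: List[int], bin_width: float) -> Optional[Plan]:
    """Stage 2 stand-in: aim at the centre of the first unobserved bin."""
    if not missing:
        return None
    i = missing[0]
    return Plan(bin_index=i, azimuth=(i + 0.5) * bin_width)


def execute_closed_loop(plan: Plan, move: Callable[[float], float],
                        tol: float = 1.0, max_tries: int = 5) -> float:
    """Stage 3 stand-in: compare planned and actual pose, then correct.

    `move(command)` is an assumed actuator that returns the azimuth
    actually reached, which may differ from the commanded one.
    """
    command = plan.azimuth
    actual = move(command)
    for _ in range(max_tries):
        error = plan.azimuth - actual   # planned vs. actual discrepancy
        if abs(error) <= tol:
            break
        command += error                # refine the command and retry
        actual = move(command)
    return actual


def reconstruct(n_bins: int, move: Callable[[float], float],
                max_steps: int = 100) -> List[bool]:
    """Run the understand -> plan -> execute loop until all bins are seen."""
    coverage = [False] * n_bins
    bin_width = 360.0 / n_bins
    for _ in range(max_steps):          # step budget guarantees termination
        plan = plan_next(understand(coverage), bin_width)
        if plan is None:
            break
        actual = execute_closed_loop(plan, move)
        coverage[int((actual % 360.0) // bin_width)] = True
    return coverage


if __name__ == "__main__":
    # Assumed actuator with a constant 10-degree bias.
    biased_move = lambda command: command + 10.0
    print(reconstruct(8, biased_move))
```

In this sketch the correction step is what lets the agent reach the planned viewpoint despite the biased actuator. In the framework described above, that role is played by closed-loop reasoning over planned and actual outcomes.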
