Multi-Level Compositional Reasoning for Interactive Instruction Following
Suvaansh Bhambri*, Byeonghwi Kim*, Jonghyun Choi
AAAI 2023
MCR-Agent (Multi-Level Compositional Reasoning Agent) is a multi-level compositional approach that learns to navigate and to manipulate objects in a divide-and-conquer manner, to cope with the diverse nature of the task.
MCR-Agent addresses long-horizon instruction-following tasks on the ALFRED benchmark, using egocentric RGB observations and natural language instructions.
Download the ResNet-18 features and annotation files from the Hugging Face repo.
git clone https://huggingface.co/datasets/byeonghwikim/abp_dataset data/json_feat_2.1.0
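Hugging Face repositories store large files through Git LFS, so a clone made without `git-lfs` installed can leave small text pointer stubs in place of the real files. The sketch below is one way to check for that after cloning. The function names `is_lfs_pointer` and `count_lfs_pointers` are hypothetical helpers, not part of this repository, and which files in this dataset are actually tracked by LFS is an assumption.

```shell
# Hypothetical helpers (not part of MCR-Agent) for spotting Git LFS pointer stubs.

# Succeeds if the file starts with the Git LFS pointer header,
# i.e. it is a stub rather than the real content.
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Prints how many files under the given directory are LFS pointer stubs.
# The .git directory itself is skipped.
count_lfs_pointers() {
    count=0
    while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            count=$((count + 1))
        fi
    done <<EOF
$(find "$1" -type f ! -path '*/.git/*')
EOF
    printf '%s\n' "$count"
}
```

Running `count_lfs_pointers data/json_feat_2.1.0` after the clone should print 0 if every file was fetched in full. A non-zero count suggests installing Git LFS (`git lfs install`) and running `git lfs pull` inside the cloned directory.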
We provide zip files containing the raw RGB images (plus depth and segmentation masks) in the Hugging Face repository; they take about 250 GB in total. With these images, you can extract features yourself with this code.
To train MCR-Agent, run train.sh with the hyper-parameters below.
Note: As mentioned in the ALFRED repository, run with --preprocess only once to generate the preprocessed JSON files.
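One way to honor the run-once rule for `--preprocess` is to record a marker file after the first successful run. The sketch below is only an illustration: the helper names `preprocess_flag` and `mark_preprocessed` and the marker path are hypothetical, and whether train.sh forwards extra arguments to the training command is not stated here, so the flag may need to be toggled inside the script by hand.

```shell
# Hypothetical helpers (not part of MCR-Agent) for passing --preprocess once.

# Prints "--preprocess" if the marker file does not exist yet,
# and prints nothing once preprocessing has been recorded.
preprocess_flag() {
    if [ ! -f "$1" ]; then
        printf '%s\n' '--preprocess'
    fi
}

# Records that preprocessing has been done by creating the marker file.
mark_preprocessed() {
    mkdir -p "$(dirname "$1")"
    : > "$1"
}
```

A wrapper could call `preprocess_flag data/.preprocessed` before launching training and `mark_preprocessed data/.preprocessed` after the run succeeds, so that later runs skip the flag.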
First, evaluate the individual modules using test_unseen.sh in each module folder.
To evaluate MCR-Agent on the ALFRED validation set, enter the best model paths in test_unseen.sh for the unseen fold and in test_seen.sh for the seen fold.
Note: All hyperparameters used for the experiments in the paper are set as default.
This work is partly supported by the NRF grant (No.2022R1A2C4002300), IITP grants (No.2020-0-01361-003, AI Graduate School Program (Yonsei University) 5%, No.2021-0-02068, AI Innovation Hub 5%, 2022-0-00077, 15%, 2022-0-00113, 15%, 2022-0-00959, 15%, 2022-0-00871, 20%, 2022-0-00951, 20%) funded by the Korea government (MSIT).