Code for the paper "Visual Grounding for Object-Level Generalization in Reinforcement Learning", accepted at ECCV 2024 [PDF].
Overview of our proposed CLIP-guided Object-grounded Policy Learning (COPL). (left) Visual grounding: the instruction (e.g., "hunt a cow") is converted into a unified 2D confidence map of the target object (e.g., cow) via our modified MineCLIP. (right) Transferring VLM knowledge into RL: the agent takes the confidence map as the task representation and is trained with our proposed focal reward, derived from the confidence map, to guide it toward the target object.
- Create a conda environment with `python 3.9` and install the Python packages in `requirements.txt`.
- Install `jdk 1.8.0_171`, then install our modified MineDojo environment.
- Download the pre-trained models for Minecraft: run `bash downloads.sh` to download the MineCLIP model. A consolidated sketch of these steps is shown below.
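A rough shell sketch of the setup steps above; the environment name and the MineDojo install path are assumptions, so adapt them to your system:

```bash
# Create and activate a conda environment with Python 3.9 (name "copl" is an assumption)
conda create -n copl python=3.9
conda activate copl

# Install the Python dependencies
pip install -r requirements.txt

# Install JDK 1.8.0_171 separately (required by MineDojo), then install the
# modified MineDojo environment; the path below is an assumption about the
# repo layout:
# pip install -e ./MineDojo

# Download the pre-trained MineCLIP model
bash downloads.sh
```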
To train single-task RL for hunting a sheep with focal reward:
Run `./scripts/sheep_focal.sh 0`, where `0` is the random seed. The `--multi_task_config` flag in the script specifies the task and can be changed to other config files in `src/config/env/single_task` to train RL for other tasks, such as hunting a cow or hunting a pig (see the example below).
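For example (the cow config file name is an assumption; check `src/config/env/single_task` for the actual names):

```bash
# Hunt a sheep with focal reward, random seed 0
./scripts/sheep_focal.sh 0

# To train another task, point --multi_task_config inside the script at a
# different config, e.g. (file name is an assumption):
#   --multi_task_config src/config/env/single_task/cow.json
```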
To train COPL for the hunting domain:
Run `./scripts/hunt_copl.sh 0`, where `0` is the random seed. The `--multi_task_config` flag in the script specifies the task domain and can be changed to `src/config/env/multi_tasks/harvest.json` to train COPL for the harvest domain (see the example below).
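For example:

```bash
# Train COPL on the hunting domain, random seed 0
./scripts/hunt_copl.sh 0

# To train on the harvest domain instead, change --multi_task_config inside
# the script to:
#   --multi_task_config src/config/env/multi_tasks/harvest.json
```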
Here we present some videos of agents trained with COPL performing hunting tasks, along with the confidence maps for different objects produced by our modified MineCLIP. From left to right: raw video, and confidence maps for cow, sheep, and pig, respectively.
[Demo videos: hunt a cow | hunt a sheep | hunt a pig]
If you find our work useful in your research and would like to cite our project, please use the following citation:
@inproceedings{jiang2024visual,
title={Visual Grounding for Object-Level Generalization in Reinforcement Learning},
author={Jiang, Haobin and Lu, Zongqing},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024},
}