We construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, our image data are bound to knowledge entities. These entities can then be used to extract entity-related knowledge, which forms the basis of the editing data.
The dataset is available on Kaggle. You can download it from the site or use the Kaggle API:
kaggle datasets download -d hymanh/vlkeb-data
We also provide a Hugging Face dataset as an alternative.
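If you use the Hugging Face alternative, a minimal sketch with the huggingface_hub library is shown below. The dataset repository id written here (HymanH/VLKEB) is an assumption; please check the dataset page for the exact id.

# Sketch: download the dataset from Hugging Face (repo id is an assumption).
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="HymanH/VLKEB",   # assumed id, verify it on the dataset page
    repo_type="dataset",
    local_dir="VLKEB",        # target directory for the download
)
print("Dataset downloaded to:", local_path)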
The dataset is organized as follows:
├── VLKEB/
│ ├── VLKEB_images/ # image folder
│ │ ├── m.0104lr/ # image subfolder, entity ID
│ │ │ ├── google_15.jpg # image file
│ │ │ ├── ...
│ │ ├── ...
│ │
│ ├── train.json # Train file
│ ├── eval.json # Evaluation file, without portability test
│ ├── eval_multihop.json # Evaluation file, containing multi-hop portability
│ ├── eval_edit_onehop.json # Evaluation file, edit one-hop knowledge for portability
│ │
│ └── LICENSE.txt # License file
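As a quick sanity check after downloading, the sketch below loads one of the evaluation files and prints the number of cases and the fields of the first case. It assumes each JSON file is an array of edit cases; the field names themselves are printed rather than assumed.

# Sketch: inspect an annotation file (assumes a JSON array of cases).
import json

with open("VLKEB/eval.json", "r", encoding="utf-8") as f:
    cases = json.load(f)

print("number of cases:", len(cases))
print("fields of the first case:", sorted(cases[0].keys()))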
VLKEB includes a total of 8174 edits, divided into 5000 for training and 3174 for evaluation. There are 18434 images used in the Reliability, Generality, and Locality tests. The Portability test utilizes the same images as the Reliability test and comprises a total of 4819 cases. These cases are distributed among 1-hop, 2-hop, 3-hop, and 4-hop categories, with 1278, 1238, 1193, and 1110 cases, respectively.
| | All (train/eval) | | Rel. | Gen. | Loc. |
|---|---|---|---|---|---|
| #Edits | 8174 (5000/3174) | #Images | 8172 | 6627 | 3635 |
| | **All (eval only)** | **1-hop** | **2-hop** | **3-hop** | **4-hop** |
| #Port. | 4819 | 1278 | 1238 | 1193 | 1110 |
Conda environments: we export the conda environment files for running the code. Our experiments are conducted on top of the great works listed in Acknowledgments.
# To run code of EasyEdit, use the following environment
conda env create -f envs/vlkeb_easyedit.yml
# To run code of KE, use the following environment
conda env create -f envs/vlkeb_ke.yml
We provide the pre-trained models for SERAC, MEND, and KE used in the paper.
The weights can be downloaded from Hugging Face or with the following commands:
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/HymanH/VLKEB-models
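If you prefer not to clone with git-lfs, the same repository can be fetched with the huggingface_hub library; a sketch using the repository from the clone URL above:

# Sketch: fetch the checkpoints without git-lfs.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="HymanH/VLKEB-models",   # repository from the clone URL above
    local_dir="VLKEB-models",        # local folder for the weights
)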
To run the code, you also need to download the pre-trained PyTorch models of the LVLMs and other components, and put them in the proper directories.
Here we place them under the 'hugging_cache' and 'openai' folders:
# models in hugging_cache folder
hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── distilbert-base-cased/
├── Llama-2-7b-hf/
├── llava-v1.5-7b/
├── mPLUG-Owl2/
├── opt-2.7b/
├── opt-125m/
├── Qwen-7B/
├── Qwen-VL/
├── vicuna-7b/
├── vicuna-7b-v1.5/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth
# clip-vit model in openai folder
openai/
└── clip-vit-large-patch14-336/
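Before launching experiments, a quick check such as the following sketch (covering part of the layout above; extend the list as needed) can catch missing downloads early:

# Sketch: verify that expected model files/folders from the layout above exist.
from pathlib import Path

expected = [
    "hugging_cache/all-MiniLM-L6-v2",
    "hugging_cache/bert-base-uncased",
    "hugging_cache/opt-2.7b",
    "hugging_cache/blip2_pretrained_opt2.7b.pth",
    "hugging_cache/eva_vit_g.pth",
    "openai/clip-vit-large-patch14-336",
]

missing = [p for p in expected if not Path(p).exists()]
if missing:
    print("Missing paths:", missing)
else:
    print("All listed paths are present.")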
Currently, the code for different experiments is kept in different branches.
BLIP2-OPT, MiniGPT-4 and LLaVA
For the single editing experiment, refer to the main branch. For the multi-hop and sequential editing experiments, refer to the multihop_and_sequential branch. For editing one-hop knowledge, refer to the edit_onehop branch.
For experiments with the KE method, refer to the main branch and go into the 'KE' subfolder.
All hyperparameters are in the hparams folder, and detailed settings can be found in EasyEdit. Paths to models and data should be set properly in the config files.
To run the code, check the Python files under the root folder and run them as follows:
# at main branch
python multimodal_edit.py [FUNC_NAME] [HOP_NUM] # see .py file for function names
# at main branch, KE, can use bash scripts
./train_ke.sh [GPU_ID] [MODEL_NAME] # MODEL_NAME=[blip2, minigpt4, llava]
./test_ke.sh [GPU_ID] [MODEL_NAME] [CHECKPOINT_PATH] # test without portability
./test_multihop.sh [GPU_ID] [MODEL_NAME] [HOP_NUM] # HOP_NUM=[1, 2, 3, 4]
# at multihop_and_sequential branch
python test_base_portability.py [FUNC_NAME] [HOP_NUM] # test portability on unedited models
python test_multihop_portability.py [FUNC_NAME] [HOP_NUM]
python test_sequential_editing.py [FUNC_NAME] # hop num is 1
# at edit_onehop branch
python test_edit_onehop.py [FUNC_NAME]
If you find our project or dataset helpful to your research, please consider citing:
@misc{huang2024vlkeb,
title={VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark},
author={Han Huang and Haitian Zhong and Tao Yu and Qiang Liu and Shu Wu and Liang Wang and Tieniu Tan},
year={2024},
eprint={2403.07350},
archivePrefix={arXiv}
}
GitHub (seen by all contributors) - New Issue
Han Huang - [email protected]
Haitian Zhong - [email protected]
We would like to thank the following projects and their great works for making this project possible: MMKG, EasyEdit, KnowledgeEditor, LAVIS (BLIP2), MiniGPT-4, LLaVA, Qwen-VL, mPLUG-Owl2.
We would also like to extend our gratitude to all the other projects and contributors in the open-source community whose work may not be directly listed here but has nonetheless been invaluable. Your innovations, tools, and libraries have greatly contributed to our project. We are immensely grateful for your work!