
Commit f5de9c1: First commit
1 parent 21d3ba1 commit f5de9c1

40 files changed: +3605 −0 lines

.editorconfig

+18
@@ -0,0 +1,18 @@
root = true

# Unix-style newlines with a newline ending every file
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
charset = utf-8

# 4 space indentation
[*.{py,json}]
indent_style = space
indent_size = 4

# 2 space indentation
[*.{md,sh,yaml,yml}]
indent_style = space
indent_size = 2

.gitattributes

+29
@@ -0,0 +1,29 @@
# https://git-scm.com/docs/gitattributes

# Set the default behavior, in case people don't have core.autocrlf set.
# https://git-scm.com/docs/gitattributes#_end_of_line_conversion
* text=auto

# common python attributes, taken from https://github.com/alexkaratarakis/gitattributes/blob/710900479a2bedeec7003d381719521ffbb18bf8/Python.gitattributes
# Source files
# ============
*.pxd text diff=python
*.py text diff=python
*.py3 text diff=python
*.pyw text diff=python
*.pyx text diff=python
*.pyz text diff=python
*.pyi text diff=python

# Binary files
# ============
*.db binary
*.p binary
*.pkl binary
*.pickle binary
*.pyc binary export-ignore
*.pyo binary export-ignore
*.pyd binary

# Jupyter notebook
*.ipynb text eol=lf

.gitignore

+51
@@ -0,0 +1,51 @@
# rtp deploy
/git_hash.txt
/rtp

# local debug on dsw
*.args
*.tmpargs

# xdl config
xdl_config_*.json

# Qwen dir
!parrot/model/qwen_chat/**

# Python
__pycache__
*.pyc
*.egg-info
dist

# Log
*.log
*.log.*
parrot/VLMEvalKit/logs/*.log

# Data
!**/alpaca-data-conversation.json

# Editor
.idea
*.swp

# Other
.DS_Store
wandb
output

checkpoints
ckpts*

.ipynb_checkpoints

# DevContainer
!.devcontainer/*

# Demo
serve_images/

#
/playground/data/eval
/playground/logs/eval

README.md

+197
@@ -0,0 +1,197 @@
# 🦜 Parrot: Multilingual Visual Instruction Tuning

<p align="center">
    <a href="#-introduction">🎉Introduction</a> •
    <a href="#-whats-new">📰What's New</a> •
    <a href="#%EF%B8%8F-install">☄️Install</a> •
    <a href="#-model">🦜Model</a> •
    <a href="#-train">🔥Train</a> •
    <a href="#-datasets">🌟Datasets</a> •
    <a href="#-mmmb">🎄MMMB</a> <br />
    <a href="#-evaluation">🔑Evaluation</a> •
    <a href="#-quick-start">📍Quick Start</a> •
    <a href="#-acknowledgement">👨‍🏫Acknowledgement</a> •
    <a href="#-contact">🤗Contact</a>
</p>

---

<p align="center">
    <a href=""><img src="https://img.shields.io/badge/Parrot-v1.0-darkcyan"></a>
    <a href='https://sun-hailong.github.io/projects/Parrot'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
    <a href='https://arxiv.org/abs/2406.02539'><img src='https://img.shields.io/badge/Arxiv-2406.02539-b31b1b.svg?logo=arXiv'></a>
    <a href=""><img src="https://img.shields.io/github/stars/AIDC-AI/Parrot?color=4fb5ee"></a>
    <a href=""><img src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FAIDC-AI%2FParrot&count_bg=%23FFA500&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=visitors&edge_flat=false"></a>
</p>

> Thanks to [Hai-Long Sun](https://github.com/sun-hailong) for his contributions to Parrot!

## 🎉 Introduction
Welcome to Parrot [[paper](https://arxiv.org/abs/2406.02539)], a novel method that uses textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens. Moreover, given the field's current lack of benchmarks for evaluating multilingual capabilities, we collect and release MMMB, a Massive Multilingual Multimodal Benchmark covering 6 languages, 15 categories, and 12,000 questions.
**If you find Parrot useful for your research and applications, please cite using this BibTeX:**
```bibtex
@article{sun2024parrot,
  title={Parrot: Multilingual Visual Instruction Tuning},
  author={Sun, Hai-Long and Zhou, Da-Wei and Li, Yang and Lu, Shiyin and Yi, Chao and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Zhan, De-Chuan and others},
  journal={arXiv preprint arXiv:2406.02539},
  year={2024}
}
```

## 📰 What's New
- [08/02] 🔥 We release the [code](https://github.com/AIDC-AI/Parrot), our in-house multilingual [dataset](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v), the benchmark [MMMB](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/mmmb), and the [model](https://huggingface.co/AIDC-AI/Parrot-7B). Welcome to give it a try!
- [06/05] 🔥 Parrot is coming! We release the [paper](https://arxiv.org/abs/2406.02539)!
## ☄️ Install

Please follow the instructions below to install the required packages.

1. Clone this repository and navigate to the Parrot folder
```bash
git clone https://github.com/AIDC-AI/Parrot.git
cd Parrot
```

2. Install the package
```Shell
conda create -n parrot python=3.10 -y
conda activate parrot
pip install --upgrade pip
pip install -e .
```

### Upgrade to the latest code base

```Shell
git pull
pip install -e . --no-deps
```
## 🦜 Model
Parrot is a multilingual multimodal large language model. We provide our fully finetuned models below:

| Model | Base LLM | Vision Encoder | Stage | Download |
| --- | --- | :---: | :---: | :---: |
| Parrot-7B | Qwen-1.5-7B-Chat | CLIP-ViT-Large-patch14-336 | Pretrain | [ckpt](https://huggingface.co/AIDC-AI/Parrot_S1_7B-Qwen15Clip) |
| Parrot-7B | Qwen-1.5-7B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot_S2_7B-Qwen15Clip) |
| Parrot-14B | Qwen-1.5-14B-Chat | CLIP-ViT-Large-patch14-336 | Pretrain | [ckpt](https://huggingface.co/AIDC-AI/Parrot_S1_14B-Qwen15Clip) |
| Parrot-14B | Qwen-1.5-14B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot_S2_14B-Qwen15Clip) |

<div align="center">
  <img src="./images/teaser.png" width="600px" />
</div>
## 🔥 Train

Parrot is trained in two stages: modality alignment, followed by instruction tuning for multilingual alignment. The training script for each stage is provided in the `scripts` folder. Before starting training, make sure the `ROOT` variable in each training script is set correctly (see the sketch below the commands). The commands for the two stages are:

```shell
bash scripts/train/pretrain.sh
bash scripts/train/finetune.sh
```
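The scripts locate the code and data through the `ROOT` variable; as an illustrative sketch only (the exact variables are defined inside each script, so check `scripts/train/pretrain.sh` before launching):

```shell
# Illustrative only: set inside scripts/train/pretrain.sh and finetune.sh
ROOT=/path/to/Parrot   # repository root; adjust to your environment
```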
#### Hyperparameters
We finetune with a set of hyperparameters similar to Vicuna's. The hyperparameters used in pretraining and finetuning are provided below.

1. Pretraining

| Model | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Parrot-7B | 256 | 1e-3 | 1 | 2048 | 0 |

2. Finetuning

| Model | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Parrot-7B | 128 | 2e-5 | 1 | 2048 | 0 |

#### Download Qwen1.5-7B-Chat checkpoints

Our base model, Qwen1.5-7B-Chat, is an instruction-tuned chatbot and can be downloaded from [here](https://huggingface.co/Qwen/Qwen1.5-7B-Chat).
## 🔎 Datasets

All training datasets are summarized in `parrot/train/utils/utils.py`. Each dataset is a collection of samples, where each sample consists of text and, optionally, an image. The text is embedded directly in the JSON file, while the image is represented by its filename, which refers to an image file under the dataset's `image_dir`.
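For illustration, here is a minimal sketch of inspecting one sample, assuming the LLaVA-style record layout this codebase builds on (the field names and file path are assumptions; verify the exact schema against the released JSON files and `parrot/train/utils/utils.py`):

```python
import json

# Sketch: inspect one training sample (LLaVA-style fields assumed).
with open("mllm_datasets/meta_files/llava-pretrain-558k.json") as f:
    samples = json.load(f)

sample = samples[0]
print(sample.get("image"))          # image filename, resolved relative to image_dir
print(sample.get("conversations"))  # text turns embedded directly in the JSON
```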
We provide the JSON file for each training dataset on [Huggingface](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v). The images can be downloaded from the sources listed below.

| dataset name | image dir | image source |
|:---|:---|:---|
| llava-pretrain-558k | llava_pretrain | https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain |
| laion-12k | parrot_laion | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset |
| cc12m-645k | parrot_cc12m | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset |
| llava-finetune-665k | llava_finetune | https://github.com/haotian-liu/LLaVA |
| sharegpt4v-sft-zh | multilingual_sft | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v |
| sharegpt4v-sft-pt | multilingual_sft | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v |
| sharegpt4v-sft-ar | multilingual_sft | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v |
| sharegpt4v-sft-tr | multilingual_sft | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v |
| sharegpt4v-sft-ru | multilingual_sft | https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v |
Below is an example of the folder structure. You can alter the folder structure as needed; just modify the function `name2data` in `parrot/train/utils/utils.py` accordingly (see the sketch below).
```
|-- mllm_datasets
    |-- meta_files
        |-- llava-pretrain-558k.json
        |-- laion-12k.json
        |-- llava-finetune-665k.json
        ...
    |-- images
        |-- llava_pretrain
        |-- sharegpt4v
        |-- laion
        ...
```
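A hypothetical sketch of such a name-to-paths mapping follows; the real `name2data` in `parrot/train/utils/utils.py` is authoritative, and the names and layout here are illustrative only:

```python
import os

MLLM_ROOT = "/path/to/mllm_datasets"  # adjust to your layout

# Hypothetical name -> (meta JSON, image dir) registry mirroring the table above.
REGISTRY = {
    "llava-pretrain-558k": ("llava-pretrain-558k.json", "llava_pretrain"),
    "laion-12k":           ("laion-12k.json",           "parrot_laion"),
    "llava-finetune-665k": ("llava-finetune-665k.json", "llava_finetune"),
}

def name2data(name: str) -> dict:
    """Resolve a dataset name to its meta-file path and image directory."""
    meta_file, image_dir = REGISTRY[name]
    return {
        "meta_path": os.path.join(MLLM_ROOT, "meta_files", meta_file),
        "image_dir": os.path.join(MLLM_ROOT, "images", image_dir),
    }
```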
## 🎄 MMMB

We provide the MMMB benchmark on [Huggingface](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/mmmb). It contains 6 languages, 15 categories, and 12,000 questions. You can download the dataset and use it in your own experiments. The dataset is stored as TSV files, which makes it easy to evaluate with `VLMEvalKit`.
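To take a quick look at one split (a sketch assuming pandas; the file name and column layout are assumptions, so check the downloaded files):

```python
import pandas as pd

# Sketch: peek at one MMMB language split stored as TSV (file name illustrative).
df = pd.read_csv("mmmb/mmmb_english.tsv", sep="\t")
print(len(df), "questions")
print(df.columns.tolist())
```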
<div align="center">
  <img src="./images/mmmb.png" width="600px" />
</div>
## 🔑 Evaluation

> We use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) to evaluate MLLMs.

To evaluate Parrot's multilingual capabilities, we comprehensively compare it with state-of-the-art approaches on multilingual benchmarks, and we also compare it with leading models across a range of multimodal tasks. To ensure reproducibility, we evaluate all models with VLMEvalKit. The evaluation script is `VLMEvalKit/run.sh`; a sketch of a typical launch follows. **Before running the script, please replace the model and dataset paths in the script.**
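As a sketch, a launch through VLMEvalKit's entry point might look like the following; the model and benchmark identifiers are assumptions, and `VLMEvalKit/run.sh` in this repository remains the authoritative command:

```shell
# Illustrative invocation; identifiers must match those registered in VLMEvalKit
python VLMEvalKit/run.py --model parrot-7b --data MMMB_en
```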
<div align="center">
  <img src="./images/performance_table.png" width="600px" />
</div>

<div align="center">
  <img src="./images/performance.png" width="300px" />
</div>
## 📍 Quick Start

We provide a quick-start demo in `parrot/deploy/runner.py`, which can be used as a template for running Parrot inference.

1. Download the [Parrot checkpoint](https://huggingface.co/AIDC-AI/Parrot-7B) and the [CLIP checkpoint](https://huggingface.co/openai/clip-vit-large-patch14-336).
2. Replace the paths in `runner.py`.
3. Run the Python file, as shown below.
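With the checkpoints downloaded and the paths filled in, launch the demo from the repository root:

```shell
python parrot/deploy/runner.py
```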
<div align="center">
  <img src="./images/example1.png" width="600px" />
</div>

<div align="center">
  <img src="./images/example2.png" width="600px" />
</div>

## 👨‍🏫 Acknowledgement

- [LLaVA](https://github.com/haotian-liu/LLaVA): the codebase we built upon.
- [Qwen1.5-Chat](https://github.com/QwenLM/Qwen1.5): the LLM backbone we used.
- [VLMEvalKit](https://github.com/open-compass/VLMEvalKit): the evaluation toolkit we used.

## 🤗 Contact

If you have any questions, feel free to propose new features by opening an issue, or contact the author: **Hai-Long Sun** ([[email protected]](mailto:[email protected])). Enjoy the code!

## 🚀 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=AIDC-AI/Parrot&type=Date)](https://star-history.com/#AIDC-AI/Parrot&Date)

cog.yaml

+36
@@ -0,0 +1,36 @@
# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  gpu: true

  python_version: "3.11"

  python_packages:
    - "torch==2.0.1"
    - "accelerate==0.21.0"
    - "bitsandbytes==0.41.0"
    - "deepspeed==0.9.5"
    - "einops-exts==0.0.4"
    - "einops==0.6.1"
    - "gradio==3.35.2"
    - "gradio_client==0.2.9"
    - "httpx==0.24.0"
    - "markdown2==2.4.10"
    - "numpy==1.26.0"
    - "peft==0.4.0"
    - "scikit-learn==1.2.2"
    - "sentencepiece==0.1.99"
    - "shortuuid==1.0.11"
    - "timm==0.6.13"
    - "tokenizers==0.13.3"
    - "torchvision==0.15.2"
    - "transformers==4.39.0"
    - "wandb==0.15.12"
    - "wavedrom==2.0.3.post3"
    - "Pygments==2.16.1"
  run:
    - curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.0.3/pget" && chmod +x /usr/local/bin/pget

# predict.py defines how predictions are run on your model
predict: "predict.py:Predictor"

images/example1.png (3.46 MB)

images/example2.png (6.95 MB)

images/mmmb.png (4.48 MB)

images/performance.png (7.84 MB)

images/performance_table.png (1.5 MB)

images/teaser.png (1.42 MB)

parrot/__init__.py

+3
@@ -0,0 +1,3 @@
import os
# Disable tokenizers parallelism to silence HuggingFace fork-related warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"
