DataFlow-Eval-Process is a data evaluation and processing system that assesses data quality along multiple dimensions and filters out high-quality data. It mainly supports state-of-the-art (SOTA) algorithms from academic papers with strong theoretical grounding.
It currently supports text, image, video, and multimodal data.
- DataFlow-Eval
| Module\Modality | Text | Image | Video | Image-Text Pair | Video-Text Pair |
|---|---|---|---|---|---|
| Data Evaluation | ✅ | ✅ | ✅ | ✅ | ✅ |
- [2024-12-26] 🎉 Our first data evaluation and processing system is now open source.
- [2024-10-14] 🎉 We summarize data evaluation papers and code in 👋 Awesome Data Evaluation
- [2024-10-14] 🎉 Our first data-centric evaluation system is now open source.
For environment setup, please use the following commands👇
conda create -n dataflow python=3.9
conda activate dataflow
pip install -e .
To evaluate a specific data modality, install the corresponding dependencies with the following commands:
Text data eval
pip install -e .[text]
pip install flash-attn==2.6.3
python -m spacy download en_core_web_sm

Image data eval
pip install -e .[image]
pip install pyiqa==0.1.12
pip install transformers==4.44.2

Video data eval
pip install -e .[video]

When evaluating video-caption data, please run the following command to install the modified CLIP for EMScore:
pip install git+https://github.com/MOLYHECI/CLIP.git

All dependencies
pip install -e .[all]
pip install flash-attn==2.6.3
pip install pyiqa==0.1.12
pip install transformers==4.44.2

cd path/to/DataFlow
python eval.py --config configs/eval/text_scorer_example1.yaml
python eval.py --config configs/eval/image_eval_example.yaml
python eval.py --config configs/eval/video_scorer.yaml
cd path/to/DataFlow
python process.py --config configs/process/text_process_example.yaml
python process.py --config configs/process/image_filter.yaml
python process.py --config configs/process/video_process.yaml
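
The evaluation and processing commands above can also be driven from a small script. The following is a minimal sketch, not part of DataFlow itself: the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is run from the repository root, where `eval.py`, `process.py`, and the listed config files live.

```python
import subprocess
import sys

# Config paths copied from the commands above; adjust them to your checkout.
EVAL_CONFIGS = [
    "configs/eval/text_scorer_example1.yaml",
    "configs/eval/image_eval_example.yaml",
    "configs/eval/video_scorer.yaml",
]
PROCESS_CONFIGS = [
    "configs/process/text_process_example.yaml",
    "configs/process/image_filter.yaml",
    "configs/process/video_process.yaml",
]


def build_commands(script, configs, python=sys.executable):
    """Return one argv list per config: `<python> <script> --config <path>`."""
    return [[python, script, "--config", cfg] for cfg in configs]


def run_all(commands, cwd="."):
    """Run each command in order from `cwd`, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=cwd, check=True)


# Example usage (hypothetical; uncomment to actually launch the jobs):
# run_all(build_commands("eval.py", EVAL_CONFIGS))
# run_all(build_commands("process.py", PROCESS_CONFIGS))
```

Building the argument lists separately from running them makes it easy to print or inspect the commands before launching long evaluation jobs.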
A data evaluation example using CLIPScore is shown as follows:
For details on evaluation and processing usage, please refer to the following documents👇
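
To make the metric concrete, here is a minimal sketch of the CLIPScore formula as published by Hessel et al. (2021): `w * max(cos(image, text), 0)` with `w = 2.5`. It is not DataFlow's implementation. It assumes the image and caption embeddings have already been produced by a CLIP model, and the function names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore: rescaled cosine similarity, clipped at zero."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


# Toy embeddings (assumed values, for illustration only).
image_emb = [0.2, 0.7, 0.1]
text_emb = [0.1, 0.8, 0.0]
score = clip_score(image_emb, text_emb)
```

Higher scores indicate that the caption embedding is better aligned with the image embedding; pairs whose embeddings point in opposite directions score zero.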
- Text Data Evaluation User Documentation (English)
- Text Data Evaluation User Documentation (Chinese)
- Text Data Process User Documentation (English)
- Text Data Process User Documentation (Chinese)
- Image Data Evaluation User Documentation (English)
- Image Data Evaluation User Documentation (Chinese)
- Image Data Process User Documentation (English)
- Image Data Process User Documentation (Chinese)
- Video Data Evaluation User Documentation (English)
- Video Data Evaluation User Documentation (Chinese)
- Video Data Process User Documentation (English)
- Video Data Process User Documentation (Chinese)
We summarize the SOTA algorithms from academic papers for data evaluation.
- Text Evaluation Algorithm Document (English)
- Text Algorithm Introduction Document (Chinese)
- Image Evaluation Algorithm Document (English)
- Image Data Evaluation User Documentation (Chinese)

