
mirrorboat/Open-DataFlow-Eval

 
 


Chinese Homepage

DataFlow-Eval-Process

License: apache-2-0 GitHub Stars Open Issues

DataFlow-Eval-Process is a data evaluation and processing system that scores data quality along multiple dimensions and filters out high-quality data. It mainly supports state-of-the-art (SOTA) algorithms from academic papers with strong theoretical grounding.

Text, image, video, and multimodal data are currently supported.


Module and Modality Support

Module\Modality Text Image Video Image-Text Pair Video-Text Pair
Data Evaluation

News

  • [2024-12-26] 🎉 Our first data evaluation and processing system is now open source.
  • [2024-10-14] 🎉 We summarize data evaluation papers and codes in 👋 Awesome Data Evaluation
  • [2024-10-14] 🎉 Our first data-centric evaluation system is now open source.

Installation

To set up the environment, use the following commands👇

conda create -n dataflow python=3.9
conda activate dataflow
pip install -e .

To evaluate a specific data modality, install its extra dependencies with the matching commands below:

text data eval

pip install -e .[text]
pip install flash-attn==2.6.3
python -m spacy download en_core_web_sm

image data eval

pip install -e .[image]
pip install pyiqa==0.1.12
pip install transformers==4.44.2

video data eval

pip install -e .[video]

To evaluate video-caption data, also run the following command to install the modified CLIP required by EMScore:

pip install git+https://github.com/MOLYHECI/CLIP.git

All dependencies

pip install -e .[all]
pip install flash-attn==2.6.3
pip install pyiqa==0.1.12
pip install transformers==4.44.2
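
Because several packages above are pinned to exact versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; the `PINS` table and the `check_pins` helper are illustrative names, not part of DataFlow, and the distribution names are assumed to match the names used in the `pip install` commands.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
# Assumption: distribution names match the names passed to pip.
PINS = {
    "flash-attn": "2.6.3",
    "pyiqa": "0.1.12",
    "transformers": "4.44.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None, ok)} for each pin."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            installed = None
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated `dataflow` environment; a package that was skipped (for example `flash-attn` on a text-free install) is reported as a mismatch with `found None`.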

Quick Start

Quick Evaluation:

cd path/to/DataFlow
python eval.py --config configs/eval/text_scorer_example1.yaml
python eval.py --config configs/eval/image_eval_example.yaml
python eval.py --config configs/eval/video_scorer.yaml

Quick Process:

cd path/to/DataFlow
python process.py --config configs/process/text_process_example.yaml
python process.py --config configs/process/image_filter.yaml
python process.py --config configs/process/video_process.yaml
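
The Quick Start commands can also be driven from one script, which is convenient when running every example in sequence. This is a hedged sketch rather than a project-provided tool: `build_commands` and `run_all` are hypothetical helpers, the config paths are copied from the commands above, and the script is assumed to run from the repository root (the `path/to/DataFlow` directory).

```python
import subprocess
import sys

# Config paths taken from the Quick Start commands above.
EVAL_CONFIGS = [
    "configs/eval/text_scorer_example1.yaml",
    "configs/eval/image_eval_example.yaml",
    "configs/eval/video_scorer.yaml",
]
PROCESS_CONFIGS = [
    "configs/process/text_process_example.yaml",
    "configs/process/image_filter.yaml",
    "configs/process/video_process.yaml",
]


def build_commands(script, configs):
    """Build one `python <script> --config <cfg>` command per config file."""
    return [[sys.executable, script, "--config", cfg] for cfg in configs]


def run_all(commands, cwd="."):
    """Run commands in order, stop at the first failure, return how many succeeded."""
    completed = 0
    for cmd in commands:
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            print(f"failed (exit {result.returncode}): {' '.join(cmd)}")
            break
        completed += 1
    return completed


if __name__ == "__main__":
    commands = build_commands("eval.py", EVAL_CONFIGS)
    commands += build_commands("process.py", PROCESS_CONFIGS)
    done = run_all(commands)
    print(f"{done}/{len(commands)} runs completed")
```

Using `sys.executable` keeps every run inside the same interpreter, so the commands pick up the packages installed in the active conda environment.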

Jupyter Notebook Demo

Text

Image

Video

[Figure: data evaluation example using CLIPScore]
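
For readers unfamiliar with the metric, CLIPScore as defined in its original paper is a rescaled, clipped cosine similarity between a CLIP image embedding and a CLIP text embedding: `w * max(cos(image, text), 0)` with `w = 2.5`. The sketch below computes that formula on precomputed embedding vectors. It is not DataFlow's scorer: the function names are illustrative, the embeddings are assumed to come from a CLIP model run elsewhere, and whether the project applies the same rescaling is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore formula: w * max(cos(image_emb, text_emb), 0)."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real CLIP embeddings.
    image_emb = [0.2, 0.9, 0.1]
    text_emb = [0.1, 0.8, 0.3]
    print(f"CLIPScore: {clip_score(image_emb, text_emb):.3f}")
```

Clipping at zero means a caption that points away from the image in embedding space scores 0 rather than a negative value, so scores stay in the range 0 to `w`.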

Data Evaluation & Process Documentation

For details on using the evaluation and processing modules, refer to the following documents👇

Text Documentation

Image Documentation

Video Documentation

Data Evaluation & Process Algorithms

We summarize the SOTA algorithms from academic papers for data evaluation.

Text Algorithms

Image Algorithms

Video Algorithms

Awesome Data Evaluation
