Hemm is a library for comprehensively benchmarking text-to-image diffusion models on image quality and prompt comprehension, integrated with [Weights & Biases](https://wandb.ai/site) and [Weave](https://wandb.github.io/weave/).

Hemm is highly inspired by the following projects:

- [T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation](https://karine-h.github.io/T2I-CompBench-new/)
- [GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment](https://arxiv.org/abs/2310.11513)

> [!WARNING]
> Hemm is still in early development; the API is subject to change, so expect things to break. If you are interested in contributing, please feel free to open an issue and/or raise a pull request.

||
|:--:|
| The evaluation pipeline will take each example, pass it through your application, and score the output on multiple custom scoring functions using [Weave Evaluation](https://wandb.github.io/weave/guides/core-types/evaluations). By doing this, you'll have a view of the performance of your model, and a rich UI to drill into individual outputs and scores. |

## Leaderboards
| Leaderboard | Weave Evals |
|---|---|
|[Rendering prompts with Complex Actions](https://wandb.ai/hemm-eval/mllm-eval-action/reports/Leaderboard-Rendering-prompts-with-Complex-Actions--Vmlldzo5Mjg2Nzky)|[Weave Evals](https://wandb.ai/hemm-eval/mllm-eval-action/weave/evaluations)|

## Installation
First, we recommend you install PyTorch by following the instructions for your platform at [pytorch.org/get-started/locally](https://pytorch.org/get-started/locally/).
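As an example, a default build can often be installed with pip; treat the command below as a placeholder and prefer the exact command generated by the selector on the PyTorch site for your OS and CUDA version:

```shell
# Placeholder command; use the platform-specific command from pytorch.org instead
pip install torch torchvision
```

Then, clone the repository and install Hemm from source: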
```shell
git clone https://github.com/wandb/Hemm
cd Hemm
pip install -e ".[core]"
```
## Quickstart
First, you need to publish your evaluation dataset to Weave. Check out [this tutorial](https://weave-docs.wandb.ai/guides/core-types/datasets) that shows you how to publish a dataset to your project.

||
|:--:|
|[Weave Datasets](https://wandb.github.io/weave/guides/core-types/datasets/) enable you to collect examples for evaluation and automatically track versions for accurate comparisons. Easily update datasets with the UI and download the latest version locally with a simple API. |
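
As a minimal sketch of this step, a small prompt dataset could be published with the Weave Python API along the following lines; the project name, dataset name, and the `prompt` column are placeholder assumptions rather than a schema required by Hemm:

```python
import weave

# Connect to (or create) a Weave project; the project name is a placeholder.
weave.init("hemm-quickstart")

# Assemble a tiny evaluation dataset; the `prompt` field name is an assumption.
dataset = weave.Dataset(
    name="t2i-eval-prompts",
    rows=[
        {"prompt": "a red bicycle leaning against a green fence"},
        {"prompt": "two cats sitting on a wooden bench at sunset"},
    ],
)

# Publish the dataset so it is versioned and available for evaluations.
weave.publish(dataset)
```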
Once you have a dataset in your Weave project, you can evaluate a text-to-image generation model on Hemm's metrics, as shown in the following code snippet:
```python
import wandb
import weave
from hemm.eval_pipelines import BaseDiffusionModel, EvaluationPipeline
from hemm.metrics.prompt_alignment import CLIPImageQualityScoreMetric, CLIPScoreMetric
```
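The snippet above only shows the imports; the remainder of the quickstart is truncated here. The sketch below illustrates one plausible way the imported classes could be wired together. The constructor arguments, the `add_metric` method, and the dataset reference passed to the pipeline are assumptions inferred from the imports rather than a verified API, so refer to the Hemm repository for the exact calls:

```python
import wandb
import weave

from hemm.eval_pipelines import BaseDiffusionModel, EvaluationPipeline
from hemm.metrics.prompt_alignment import CLIPImageQualityScoreMetric, CLIPScoreMetric

# NOTE: hypothetical sketch -- argument and method names below are assumptions,
# not Hemm's verified API.
wandb.init(project="hemm-quickstart", job_type="evaluation")  # log results to W&B
weave.init(project_name="hemm-quickstart")  # track evaluation calls in Weave

# Wrap the diffusion model to be evaluated; the model path is an example.
model = BaseDiffusionModel(diffusion_model_name_or_path="CompVis/stable-diffusion-v1-4")
evaluation_pipeline = EvaluationPipeline(model=model)

# Attach the prompt-alignment metrics imported above (constructor args assumed).
evaluation_pipeline.add_metric(CLIPScoreMetric())
evaluation_pipeline.add_metric(CLIPImageQualityScoreMetric())

# Run the evaluation against the Weave dataset published earlier (reference assumed).
evaluation_pipeline(dataset="t2i-eval-prompts:v0")
```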