Tasks: Add image-text-to-text pipeline and inference API to task page (#1039)

..and remove the long inference

---------

Co-authored-by: Pedro Cuenca
Co-authored-by: vb
Co-authored-by: Merve Noyan
4 people authored Dec 12, 2024
1 parent d01296c commit 8c62f4a
Showing 1 changed file with 36 additions and 24 deletions.
60 changes: 36 additions & 24 deletions packages/tasks/src/tasks/image-text-to-text/about.md
@@ -32,39 +32,51 @@ Vision language models can recognize images through descriptions. When given det

## Inference

You can use the Transformers library to interact with vision-language models. You can load the model like below.
You can use the Transformers library to interact with [vision-language models](https://huggingface.co/models?pipeline_tag=image-text-to-text&transformers). Specifically, `pipeline` makes it easy to run inference with these models.

Initialize the pipeline first.

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
```

The model's built-in chat template will be used to format the conversational input. We can pass the image as a URL in the `content` part of the user message:

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
"llava-hf/llava-v1.6-mistral-7b-hf",
torch_dtype=torch.float16
)
model.to(device)
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
},
{"type": "text", "text": "Describe this image."},
],
}
]

```
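The message structure above can also be built programmatically. The following is a minimal sketch, not part of the commit: `build_user_message` is a hypothetical helper name, and it only assumes the chat format shown above (one `image` entry and one `text` entry in the user's `content` list).

```python
from typing import Any


def build_user_message(image_url: str, prompt: str) -> list[dict[str, Any]]:
    """Return a one-turn conversation holding one image URL and one text prompt.

    Hypothetical helper: it mirrors the chat format shown in the task page.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Rebuild the same conversation used in the task page example.
messages = build_user_message(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
    "Describe this image.",
)
```

A helper like this keeps the nesting in one place when the same prompt is applied to many images.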

We can infer by passing image and text dialogues.
We can now pass the messages directly to the pipeline to run inference. The `return_full_text` flag controls whether the response includes the full prompt, including the user input. Here we pass `False` to return only the generated text.

```python
from PIL import Image
import requests
outputs = pipe(text=messages, max_new_tokens=60, return_full_text=False)

# image of a radar chart
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
outputs[0]["generated_text"]
# The image captures a moment of tranquility in nature. At the center of the frame, a pink flower with a yellow center is in full bloom. The flower is surrounded by a cluster of red flowers, their vibrant color contrasting with the pink of the flower. \n\nA black and yellow bee is per
```

inputs = processor(prompt, image, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=100)
You can also use the Inference API to test image-text-to-text models. You need to use a [Hugging Face token](https://huggingface.co/settings/tokens) for authentication.

print(processor.decode(output[0], skip_special_tokens=True))
# The image appears to be a radar chart, which is a type of multivariate chart that displays values for multiple variables represented on axes
# starting from the same point. This particular radar chart is showing the performance of different models or systems across various metrics.
# The axes represent different metrics or benchmarks, such as MM-Vet, MM-Vet, MM-Vet, MM-Vet, MM-Vet, MM-V
```bash
curl https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct \
-X POST \
-d '{"messages": [{"role": "user","content": [{"type": "image"}, {"type": "text", "text": "Can you describe the image?"}]}]}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer hf_***"
```
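For readers who prefer Python over `curl`, the same request can be assembled with the standard library. This is a sketch under stated assumptions: `build_request` is a hypothetical helper name, the payload is copied from the `curl` example above as-is, and `hf_***` is a placeholder that must be replaced with a real token before anything is sent.

```python
import json
import urllib.request

# Endpoint taken verbatim from the curl example above.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct"


def build_request(token: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper: the payload shape is copied from the curl command.
    """
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": question},
                ],
            }
        ]
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# "hf_***" is a placeholder token, as in the curl example.
request = build_request("hf_***", "Can you describe the image?")
```

Sending the request would be a call to `urllib.request.urlopen(request)` with a valid token; it is left out here so the sketch runs without network access.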

## Useful Resources