[RFC]: Fixing the ViT Backend, especially on ROCm

### Motivation.

Right now the ViT attention backend selection is very messy, and the AMD CI is broken.

The ViT changes also do not take the other model files into account. They only update `qwen2_5_vl.py`, which potentially breaks all of the other model files:

- `vllm/model_executor/models/dots_ocr.py`
- `vllm/model_executor/models/ernie45_vl.py`
- `vllm/model_executor/models/glm4_1v.py`
- `vllm/model_executor/models/qwen2_vl.py`
- `vllm/model_executor/models/siglip2navit.py`

vLLM recently went through a refactor that introduces `--mm-encoder-attn-backend` to select the attention backend.
The PR is https://github.com/vllm-project/vllm/pull/27061 , with a follow-up bugfix PR https://github.com/vllm-project/vllm/pull/27124 .


Since torch.compile was introduced into the ViT (so far only for the Qwen VL model, in PR https://github.com/vllm-project/vllm/pull/23207 ), the AMD ViT code paths have been broken. Multiple bugfix PRs have been attempted, but none of them has resolved the problem:

1. https://github.com/vllm-project/vllm/pull/27190 fix torch.sdpa accuracy issue.
2. https://github.com/vllm-project/vllm/pull/27744 fix torch.sdpa accuracy issue.

First, we should shrink `_Backend` (see https://github.com/vllm-project/vllm/pull/27061/files#r2443909604 ) by introducing a separate `_MHA_Backend` registry.

Make sure that ViT attention is platform specific. The backend should be determined in the `platform` interface, and any override should also be performed there. We should avoid doing that in the `model.py` files.

In the `platform` interface, we should only return the `_MHA_Backend`; we should not return the functions. The functions should only be returned through `maybe_get_vit_flash_attn_backend`.

Honor `--mm-encoder-attn-backend` so that we can write unit tests that cover all of the different backends. AMD Instinct GPUs can test all backends. Radeon GPUs can only use the TORCH_SDPA code path.




### Proposed Change.

Changes
1. Shrink `_Backend` (see https://github.com/vllm-project/vllm/pull/27061/files#r2443909604 ) by introducing a separate `_MHA_Backend` registry.

2. Make sure that ViT attention is platform specific. The backend should be determined in the `platform` interface, and any override should also be performed there. We should avoid doing that in the `model.py` files.

3. `get_vit_attn_backend` in the `platform` interface has to be able to access `--mm-encoder-attn-backend`.

4. Deprecate this line: https://github.com/vllm-project/vllm/blob/33a0ea5f3264b5b2f571b8a53357e10efcc94670/vllm/model_executor/models/vision.py#L96 . It uses `VLLM_ATTENTION_BACKEND`, which is meant for the text backbone. The ViT should not use this environment variable.

5. In the `platform` interface, only return the `_MHA_Backend`; do not return the functions. The functions should only be returned through `maybe_get_vit_flash_attn_backend`.

6. Add a `logger.info_once` call so that users know which `_MHA_Backend` is selected in the end.

7. Clean up the CUDA code path. Since `vllm.vllm_flash_attn` is just a wrapper for the `flash_attn` library, on CUDA we always use `vllm.vllm_flash_attn` instead of `flash_attn`. See https://github.com/vllm-project/vllm/blob/ba33e8830dceb32e9b03508bbff435e3082759b8/vllm/attention/layer.py#L120-L125 .

8. Write unit tests that cover all of the different backends. Since the models are large, the tests check the VRAM size and run only if it is large enough. We provide these unit tests so that developers can run them locally.
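
The VRAM gating for the unit tests could look something like the sketch below. The helper names `available_vram_bytes` and `backends_to_test`, the use of device 0, and the string backend names are assumptions for illustration; the real tests would parametrize over `_MHA_Backend` members and a per-model memory requirement.

```python
from typing import List, Optional, Sequence

GIB = 1024 ** 3


def available_vram_bytes() -> Optional[int]:
    """Return the total memory of device 0, or None if no GPU is usable."""
    try:
        import torch  # imported lazily so the helper also works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


def backends_to_test(
    supported: Sequence[str],
    required_gib: float,
    vram_bytes: Optional[int],
) -> List[str]:
    """Return the backends to run, or an empty list if VRAM is too small."""
    if vram_bytes is None or vram_bytes < required_gib * GIB:
        return []
    return list(supported)
```

A test module could call `backends_to_test(...)` at collection time and skip the test when the list comes back empty, so that the same file is usable both in CI and on a developer's local machine.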


### Feedback Period.

_No response_

### CC List.

_No response_

### Any Other Things.

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
