Conversation

@AgainstEntropy

Description

This PR is a follow-up to PR #17383 and adds support for the GGML_OP_GET_REL_POS operator in the Vulkan backend for both F16 and F32 data types.

To verify the implementation while keeping it tidy, I patched the changes from PR #17383 onto a separate branch and ran test-backend-ops -o GET_REL_POS. Below is a screenshot of the test run for reference:

[screenshot: test-backend-ops -o GET_REL_POS results]

Notes

  1. ⚠️ This PR depends on PR #17383 (ggml : enhance rel-pos and window ops with CUDA support) for the test_get_rel_pos changes. Once that PR is merged, I will rebase this branch.
  2. Non-contiguous view inputs are not yet supported.
  3. In the Vulkan shader, a small epsilon is added to pos to avoid floating-point precision issues.
    Without the epsilon, some test cases fail on my Windows AMD GPU but pass on a Linux Nvidia GPU: [screenshot: failing test cases without the epsilon]

* Introduced new Vulkan pipeline for get_rel_pos operation for both float and half-precision types.
* Implemented the get_rel_pos compute shader to calculate relative positions based on input tensor dimensions.
* Updated shader generation to include new get_rel_pos variants.
* Added a small epsilon to the position calculation in the get_rel_pos compute shader to mitigate floating point precision issues. This change ensures more accurate results when computing relative positions based on input tensor dimensions.
@github-actions bot added labels: Vulkan (Issues specific to the Vulkan backend), ggml (changes relating to the ggml tensor library for machine learning) on Nov 20, 2025
vk_op_unary_push_constants pc = vk_op_unary_push_constants_init(src0, dst, ggml_nelements(dst));
init_pushconst_fastdiv(pc);

std::array<uint32_t, 3> elements;
Collaborator

Any reason not to use ggml_vk_op_f32?

Author


It's because non-contiguous input is not supported: the GGML_ASSERT check in ggml_vk_op_f32 would fail.
I think the behavior should be similar to GGML_OP_ARGSORT (though I'm not sure why GGML_OP_ARGSORT appears in ggml_vk_op_f32 but comes with a GGML_ASSERT(0)).

Collaborator


ARGSORT used to use ggml_vk_op_f32, it just switched away from it because it needs custom logic and shader invocations for the handling of large input tensors.

I think you should be able to use ggml_vk_op_f32 for this op. Can you be more specific which assertion failed?

Author


Hi @0cc4m, thanks for the clarification.

Previously I encountered an assertion failure for the case GET_REL_POS(type=f32, C=1, qh=1, kh=1, v=1) in ggml_vk_op_f32, specifically on this line:

ggml/src/ggml-vulkan/ggml-vulkan.cpp:8764: GGML_ASSERT(ggml_vk_op_supports_incontiguous(op) || ggml_vk_dim01_contiguous(src0)) failed

I reviewed the whole process and realized that ggml_is_contiguous returned true here (src0 ne={1, 1, 1, 1}, viewed from ne={2, 1, 1, 1}). Therefore, ggml_backend_vk_device_supports_op(...) returned true, and then the check ggml_vk_dim01_contiguous(src0) later in ggml_vk_op_f32 failed.

I have updated ggml_vk_get_rel_pos to use ggml_vk_op_f32, and added a ggml_vk_dim01_contiguous check to ggml_backend_vk_device_supports_op, so GET_REL_POS(type=f32, C=1, qh=1, kh=1, v=1) is no longer reported as supported.

To work with non-contiguous inputs (v=1), we can simply apply ggml_cont first.

