vulkan : refactor buffer handling in vk_op_f32 #16840
base: master
Conversation
ggml/src/ggml-vulkan/ggml-vulkan.cpp (Outdated)
```cpp
size_t size;
if (support_incontiguous) {
    size = ggml_nbytes(tensor) + misalign_bytes;
    if (offset + size >= buffer->size) {
```
This replicates the branch here https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-vulkan/ggml-vulkan.cpp#L8620-L8631
Why is this needed? The case offset+size > buffer->size sounds like it should never happen. In the case where they're equal, the branch computes the same value that's already there.
It might clamp the size to maxStorageBufferRange but that seems like it would likely produce wrong results and should rather assert/abort?
I think some of this logic is only here for historical reasons at this point. Early on, some shaders didn't have thorough bounds checking logic and relied on robustBufferAccess and these tight bounds to accomplish bounds checking. But robustBufferAccess isn't as reliable as it sounds, and we've improved the bounds checking logic over time.
To my knowledge, the only remaining shader that intentionally relies on robustBufferAccess bounds checking is the scalar/coopmat1 mul_mm shaders (not sure about mmq, but probably that too). These need to do the A/B loads in a way that allows the compiler to batch them, so manual bounds checking would probably interfere and cause a slowdown.
So, I think all of the current users of ggml_vk_op_f32 (and everything else except these mul_mm shaders) could just use ggml_vk_get_max_buffer_range.
Yes, I created a lot of that code for a very different version of the backend, and also when I knew a lot less about Vulkan than I do now. It's very possible there are checks and branches that are no longer needed.
Thanks for the feedback. I simplified the size to just ggml_nbytes(tensor) + misalign_bytes - this should be the actual bound, and it is what the other branches ended up computing anyway.
Using ggml_vk_get_max_buffer_range also works, but feels less straightforward, and it's still unclear to me whether there's a reason to prefer it.
I have now adapted all ops except for the mul_mat variants, which use more complex size computations. I also want to avoid conflicts with #16868.
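The simplified computation described above can be sketched roughly like this. Note this is a hedged illustration, not the actual patch: the struct and function names (`vk_subbuffer_info`, `get_subbuffer`) and the explicit alignment handling are invented for clarity.

```cpp
#include <cassert>
#include <cstddef>

// Hedged sketch of the simplified sub-buffer computation: the bound
// range is just the tensor's byte size plus the misalignment slack,
// with no branching on the parent buffer's size.
struct vk_subbuffer_info {
    size_t offset; // aligned-down start of the bound range
    size_t size;   // tensor bytes plus the misalignment slack
};

static vk_subbuffer_info get_subbuffer(size_t tensor_offset, size_t tensor_nbytes,
                                       size_t alignment) {
    // Align the offset down to the device's required alignment and
    // grow the size by the bytes that were skipped over, so the
    // descriptor range still covers the whole tensor.
    size_t misalign_bytes = tensor_offset % alignment;
    return { tensor_offset - misalign_bytes, tensor_nbytes + misalign_bytes };
}
```

For example, a tensor at byte offset 100 with a 32-byte alignment requirement binds at offset 96 with 4 extra bytes added to the range.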
ggml_vk_get_max_buffer_range is necessary for some operations when the size of the tensor is greater than 4GB (or greater than the maxBufferRange) and would lead to an invalid descriptor range being set. Please try enabling these ifdefed out tests:
```cpp
#if 0
    // >4GB im2col destination. Too slow to run by default.
    // Test cases taken from Wan2.1 T2V 1.3B.
...
#if 0
    // > 4GB A matrix. Too slow to be enabled by default.
...
```
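The clamping behavior being pointed out can be thought of roughly as follows. This is a sketch of the idea only; `clamp_buffer_range` is an invented name, and the real `ggml_vk_get_max_buffer_range` helper presumably also accounts for the buffer's own size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Why >4GB tensors need a clamp: VkDescriptorBufferInfo::range must
// not exceed the device's maxStorageBufferRange, which is a 32-bit
// quantity on many implementations, so a raw ggml_nbytes() of a >4GB
// tensor would produce an invalid descriptor range.
static size_t clamp_buffer_range(size_t wanted, size_t max_storage_buffer_range) {
    return std::min(wanted, max_storage_buffer_range);
}
```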
I thought that's what this line was for:
```cpp
    } else if (op == GGML_OP_IM2COL || op == GGML_OP_IM2COL_3D) {
        if (ctx->device->shader_int64 && ctx->device->buffer_device_address) {
            // buffer device address path doesn't use dst buffer
            d_sz = 1;
        }
```
(and mul_mat, which I haven't changed so far)
IMO it would be even better to not have the binding in the shader at all (put it in a #else). Why is it there if it cannot be bound properly and isn't used? Instead of setting d_sz = 1, the code could do the dispatch right there without the dst buffer.
Good hint about having to enable those tests, though - I hadn't done that. Unfortunately they just OOM on my 12GB GPU; I will find some way to check them.
Force-pushed 09475f8 to edacb52
This moves the buffer/uma/offset/size computations into a function, which reduces the amount of repetitive code in ggml_vk_op_f32 a lot. There are more places outside of vk_op_f32 where it can be reused; I didn't adapt those yet.
I replicated the existing logic, but don't really understand why the branching is needed for the sub-buffer size computation. From what I can see and test, it doesn't change the result - maybe this can be simplified? Or is there some corner case I can't think of? (see questions in the comments)