bugfix: Align KV chunk size binary search with actual KV chunk splitting. (#728)

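Closes #726. Aligns the KV chunk size binary search with the real chunk-splitting strategy, so that the resulting `kv_chunk_size` yields the correct `new_batch_size`. In particular, the search now counts a request with an empty KV sequence (`kv_len` of 0) as one KV chunk instead of zero.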
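To see why an empty KV sequence matters, here is a minimal standalone C++ sketch (not FlashInfer code) of the tile count computed inside the search. `CountTiles` and its `clamp_empty` flag are hypothetical names; `ceil_div` is defined locally to mirror the helper used in the diff. It assumes, as the commit title implies, that the real splitting gives an empty KV sequence one chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer ceiling division; assumes a >= 0 and b > 0.
inline int64_t ceil_div(int64_t a, int64_t b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Hypothetical helper: number of (qo chunk, kv chunk) work tiles for a given
// kv_chunk_size. clamp_empty = false mirrors the removed line of the diff;
// clamp_empty = true mirrors the added line (an empty KV counts as length 1).
int64_t CountTiles(const std::vector<int64_t>& packed_qo_len_arr,
                   const std::vector<int64_t>& kv_len_arr,
                   int64_t qo_chunk_size, int64_t kv_chunk_size,
                   bool clamp_empty) {
  assert(packed_qo_len_arr.size() == kv_len_arr.size());
  constexpr int64_t min_kv_len = 1;
  int64_t tiles = 0;
  for (std::size_t i = 0; i < kv_len_arr.size(); ++i) {
    const int64_t kv_len =
        clamp_empty ? std::max(kv_len_arr[i], min_kv_len) : kv_len_arr[i];
    tiles += ceil_div(packed_qo_len_arr[i], qo_chunk_size) *
             ceil_div(kv_len, kv_chunk_size);
  }
  return tiles;
}
```

For two requests with packed query lengths {16, 16}, KV lengths {0, 100}, `qo_chunk_size` = 16 and `kv_chunk_size` = 64, the unclamped count is 2 but the clamped count is 3: the empty request still occupies one tile, which the old search did not account for.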

Co-authored-by: Zhengyuan <[email protected]>
timzsu and Zhengyuan authored Jan 9, 2025
1 parent 13de896 commit 00ba0ae
Showing 1 changed file with 2 additions and 1 deletion.
include/flashinfer/attention/scheduler.cuh (2 additions, 1 deletion)
```diff
@@ -96,12 +96,13 @@ inline auto PrefillBinarySearchKVChunkSize(const bool enable_cuda_graph,
 
   int64_t low = min_kv_chunk_size;
   int64_t high = max_kv_len;
+  constexpr int64_t min_kv_len = 1;
   while (low < high) {
     const int64_t mid = (low + high) / 2;
     int64_t new_batch_size = 0;
     for (uint32_t i = 0; i < batch_size; ++i) {
       new_batch_size +=
-          ceil_div(packed_qo_len_arr[i], qo_chunk_size) * ceil_div(kv_len_arr[i], mid);
+          ceil_div(packed_qo_len_arr[i], qo_chunk_size) * ceil_div(std::max(kv_len_arr[i], min_kv_len), mid);
     }
     if (new_batch_size > max_batch_size_if_split) {
       low = mid + 1;
```
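The patched loop can be read as a standard "smallest chunk size that fits" binary search. Below is a standalone C++ sketch under stated assumptions, not FlashInfer's `PrefillBinarySearchKVChunkSize`: the function name `BinarySearchKVChunkSize` and its signature are hypothetical, `max_kv_len` is assumed to be the largest clamped KV length, and the `else { high = mid; }` branch is assumed because the diff hunk ends before it. The search is valid because the tile count never increases as the chunk size grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer ceiling division; assumes a >= 0 and b > 0.
inline int64_t ceil_div(int64_t a, int64_t b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Hypothetical sketch of the fixed search. Assuming min_kv_chunk_size <= max_kv_len,
// it returns the smallest kv_chunk_size in [min_kv_chunk_size, max_kv_len] whose
// tile count fits in max_batch_size_if_split; if none fits, it ends at max_kv_len.
int64_t BinarySearchKVChunkSize(const std::vector<int64_t>& packed_qo_len_arr,
                                const std::vector<int64_t>& kv_len_arr,
                                int64_t qo_chunk_size, int64_t min_kv_chunk_size,
                                int64_t max_batch_size_if_split) {
  assert(packed_qo_len_arr.size() == kv_len_arr.size());
  constexpr int64_t min_kv_len = 1;
  // Assumption: upper bound is the longest KV sequence (at least 1).
  int64_t max_kv_len = min_kv_len;
  for (int64_t len : kv_len_arr) max_kv_len = std::max(max_kv_len, len);

  int64_t low = min_kv_chunk_size;
  int64_t high = max_kv_len;
  while (low < high) {
    const int64_t mid = (low + high) / 2;
    int64_t new_batch_size = 0;
    for (std::size_t i = 0; i < kv_len_arr.size(); ++i) {
      // Clamp as in the fix: an empty KV sequence still counts as one chunk.
      new_batch_size += ceil_div(packed_qo_len_arr[i], qo_chunk_size) *
                        ceil_div(std::max(kv_len_arr[i], min_kv_len), mid);
    }
    if (new_batch_size > max_batch_size_if_split) {
      low = mid + 1;  // Too many tiles: chunks must be larger.
    } else {
      high = mid;     // Fits (assumed branch): try a smaller chunk size.
    }
  }
  return low;
}
```

With packed query lengths {16, 16}, KV lengths {0, 100}, `qo_chunk_size` = 16, `min_kv_chunk_size` = 1 and `max_batch_size_if_split` = 2, this search returns 100. Without the clamp the same inputs would give 50, yet at a chunk size of 50 the clamped count is 1 + 2 = 3 tiles, exceeding the limit of 2. That is the kind of mismatch between the search and the actual splitting that the commit removes.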
