Conversation

@Jiawei-Shao
Contributor

@Jiawei-Shao Jiawei-Shao commented Dec 16, 2025

Description

This patch extends Split-K to larger dim_inner values (up to 4096) so that more models can benefit from it.
This patch also enables Split-K on gen-12lp.
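
For readers unfamiliar with the term, here is a minimal CPU sketch of the general Split-K idea: the inner dimension (`dim_inner`, often called K) is cut into chunks, each chunk produces a partial product, and the partials are summed into the output. This is only an illustration and is not the WebGPU shader from this patch; `matmul_split_k` and `num_splits` are hypothetical names chosen for the example.

```python
def matmul_split_k(a, b, num_splits):
    """Multiply a (M x K) by b (K x N), splitting K into num_splits chunks.

    Illustrative only: on a GPU each split would typically run in its own
    set of workgroups, and the final accumulation would be done with atomic
    adds or a separate reduction pass.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + num_splits - 1) // num_splits  # ceil(k / num_splits)
    out = [[0.0] * n for _ in range(m)]
    for s in range(num_splits):
        k_begin = s * chunk
        k_end = min(k_begin + chunk, k)  # the last chunks may be short or empty
        for i in range(m):
            for j in range(n):
                partial = 0.0
                for kk in range(k_begin, k_end):
                    partial += a[i][kk] * b[kk][j]
                out[i][j] += partial  # stands in for the cross-split reduction
    return out
```

Splitting along K tends to pay off when the output is small but the inner dimension is large, since a plain tiling over the output alone would leave much of the GPU idle.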

Motivation and Context

With this PR we see about a 30% improvement on jina-clip-v1-text-fp16 and about a 20% improvement on jina-embeddings-v2-base-code-fp16 on Lunar Lake iGPUs.

This patch adds the support of more `dim_inner` (up to 3072) for
Split-K to optimize more models.
@Jiawei-Shao Jiawei-Shao marked this pull request as draft December 16, 2025 05:44
@Jiawei-Shao Jiawei-Shao marked this pull request as ready for review December 16, 2025 06:36
@Jiawei-Shao
Contributor Author

@qjia7 @jchen10 PTAL, thanks!

@Jiawei-Shao Jiawei-Shao changed the title from "[webgpu] Support dim_inner <= 3072 for Split-K" to "[webgpu] Support Split-K in more situations" Dec 16, 2025
@guschmue
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Dec 18, 2025
@qjia7 qjia7 requested review from fs-eire and guschmue December 23, 2025 07:02
@Jiawei-Shao Jiawei-Shao marked this pull request as draft December 29, 2025 02:10
@Jiawei-Shao Jiawei-Shao marked this pull request as ready for review January 4, 2026 08:30
@Jiawei-Shao Jiawei-Shao requested a review from qjia7 January 5, 2026 05:45
qjia7 previously approved these changes Jan 5, 2026
guschmue previously approved these changes Jan 6, 2026
@guschmue guschmue enabled auto-merge (squash) January 6, 2026 16:08
@guschmue
Contributor

guschmue commented Jan 6, 2026

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue
Contributor

guschmue commented Jan 6, 2026

CI is nagging on linux and macos:
image

auto-merge was automatically disabled January 7, 2026 01:09

Head branch was pushed to by a user without write access

@Jiawei-Shao Jiawei-Shao dismissed stale reviews from guschmue and qjia7 via 8da3a33 January 7, 2026 01:09
@Jiawei-Shao Jiawei-Shao requested a review from guschmue January 7, 2026 03:01
@Jiawei-Shao
Contributor Author

> CI is nagging on linux and macos: image

Hi @guschmue, sorry about that! I've fixed this failure, PTAL, thanks!

@guschmue
Contributor

guschmue commented Jan 7, 2026

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue guschmue merged commit 4a858a8 into microsoft:main Jan 7, 2026
105 of 178 checks passed
Labels

ep:WebGPU ort-web webgpu provider

3 participants