Removed redundant templates and related compile-time/runtime code #91
Open
Enigmatisms wants to merge 8 commits into PaddlePaddle:main
…tion. Removed split.divmod for PPT to save some regs
Revert conditional pipeline ops (potential bugs, not covered by CI)
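This is the reopened version of #87, with the conflicts with #89 resolved. This PR also includes the code from #86, so #86 has been closed; everything was tested together in this PR (which also includes #81). For the optimizations related to #86, see the PR description of #86.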
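Initial simplification of the FMv3 template parameters:
- Split-related logic (including simplifying the redundant fast_divmod modules in PPT/DualPPTX)
- Is_flashmask bool template arg
- IntraWGOverlap bool template arg, which is always True by default

Benchmark: apart from an improvement at seqlen = 128 (due to the switch to static scheduling), performance is unchanged for all other configurations. Correctness has passed testing (bitwise-identical results).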
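As a rough illustration of why dropping an always-true bool template argument is safe, the sketch below compares a hypothetical "before" and "after" form. The names `MainloopBefore`, `MainloopAfter`, and `steps`, and the arithmetic inside them, are placeholders invented for this example and are not taken from the FMv3 sources; only the parameter name `IntraWGOverlap` comes from this PR.

```cpp
#include <cassert>

// Hypothetical "before": a bool template parameter selects a code path,
// although every caller is assumed to pass true.
template <bool IntraWGOverlap>
struct MainloopBefore {
    static int steps(int n_blocks) {
        // Placeholder arithmetic: the false branch exists only for the
        // <false> instantiation, which is assumed to be unused.
        return IntraWGOverlap ? n_blocks + 1 : n_blocks;
    }
};

// Hypothetical "after": the parameter is removed and the true branch is
// hard-coded, so there is one type to compile instead of two.
struct MainloopAfter {
    static int steps(int n_blocks) { return n_blocks + 1; }
};

// The simplified form must behave exactly like the old <true> instantiation.
inline bool same_behavior(int n_blocks) {
    return MainloopBefore<true>::steps(n_blocks) == MainloopAfter::steps(n_blocks);
}
```

Checking `same_behavior` over the inputs of interest mirrors, in miniature, the bitwise-alignment testing mentioned above: the refactor should change compile-time structure without changing results.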
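To avoid conflicts with earlier PRs that have not been merged yet, this PR should be merged after #81 and #86. Merging #81 requires resolving conflicts manually, and a rebase is needed after #86 is merged.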
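Substantially simplified tile_scheduler.h: unnecessary implementations were removed, and the shared parts are now managed through a base class. PPT gained a stride setting; using Stride is beneficial for some mask types.
TODO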
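The tile_scheduler.h refactor (shared parts in a base class, plus a stride setting) could look roughly like the host-side sketch below. All names here (`TileSchedulerBase`, `StridedScheduler`, `tiles_for`) and the meaning given to the stride (how many consecutive tiles a worker takes per visit) are assumptions made for illustration; the actual scheduler in this PR may define them differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical base class: state and helpers shared by all scheduler variants.
struct TileSchedulerBase {
    int num_tiles;    // total number of work tiles
    int num_workers;  // number of persistent workers (for example, CTAs)

    TileSchedulerBase(int tiles, int workers)
        : num_tiles(tiles), num_workers(workers) {}

    bool is_valid(int tile) const { return tile >= 0 && tile < num_tiles; }
};

// Hypothetical derived scheduler: each worker takes `stride` consecutive
// tiles, then jumps ahead by num_workers * stride.
struct StridedScheduler : TileSchedulerBase {
    int stride;

    StridedScheduler(int tiles, int workers, int s)
        : TileSchedulerBase(tiles, workers), stride(s) {}

    // Lists, in visiting order, the tiles assigned to one worker.
    std::vector<int> tiles_for(int worker) const {
        std::vector<int> out;
        for (int base = worker * stride; base < num_tiles;
             base += num_workers * stride) {
            for (int i = 0; i < stride && is_valid(base + i); ++i) {
                out.push_back(base + i);
            }
        }
        return out;
    }
};
```

With 10 tiles, 2 workers, and a stride of 2, worker 0 visits tiles 0, 1, 4, 5, 8, 9 and worker 1 visits tiles 2, 3, 6, 7, so every tile is covered exactly once. Keeping neighbouring tiles on the same worker is one plausible reason a stride could help for certain mask types, though that motivation is a guess here.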