…limination; add tests; modify cmakelists
feat: add operators at backend, including header, kernel impl, kernel/graph tests
fix: convert reg; update tests feat: const fold for clip, unary; identity elimination; idempotence elimination; add tests; modify cmakelists
format files
Operator additions
- Adapt kernels and operators, including clip, conv, rms_norm, and others
- Register the operators with PyTorch in unified_converter.py
- Screenshot of passing tests
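
For readers unfamiliar with the operators named above, here is a minimal CPU reference sketch of two of them, clip and rms_norm. It is not the kernel code from this PR: the function names `clip_ref` and `rms_norm_ref`, the `std::vector<float>` interface, and the default `eps` are assumptions made for illustration. A reference like this is the kind of thing a kernel test could compare a backend result against.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for clip: clamp every element into [lo, hi].
std::vector<float> clip_ref(const std::vector<float>& x, float lo, float hi) {
    assert(lo <= hi);
    std::vector<float> y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [lo, hi](float v) { return std::min(std::max(v, lo), hi); });
    return y;
}

// Hypothetical reference for RMS norm over a single row:
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]
// The default eps is an assumed value, not one taken from the PR.
std::vector<float> rms_norm_ref(const std::vector<float>& x,
                                const std::vector<float>& w,
                                float eps = 1e-6f) {
    assert(!x.empty() && x.size() == w.size());
    double sum_sq = 0.0;
    for (float v : x) sum_sq += static_cast<double>(v) * v;
    const double inv_rms = 1.0 / std::sqrt(sum_sq / x.size() + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = static_cast<float>(x[i] * inv_rms) * w[i];
    return y;
}
```

Accumulating the sum of squares in `double` keeps the reference slightly more accurate than a typical single-precision kernel, which is usually what a comparison baseline should be.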
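
Graph optimization support
- The GraphPass class allows further graph optimization strategies to be implemented and managed in one place
- graph_optimizer.cc: topological sort, shape inference, constant folding, Identity elimination, idempotence elimination, dead code elimination, and others
- test_graph_optimizer

CUDA Graph support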
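- Enabled explicitly via setCudaGraphEnabled(); the Graph must be on a CUDA device
- The infinirt framework supports capturing a Graph, compiling it into a CUDA Graph (the result is kept in a thread-local cache), and running the CUDA Graph
- invalidateCudaGraph() invalidates the cached CUDA Graph
- Simple tests added in test_graph.cc: CudaGraphLifecycleBookkeeping, CudaGraphInvalidatesOnOutputMutation, CudaGraphCompileLaunchAndRecompileAfterMutation

Limitations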
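
The enable, compile, cache, and invalidate lifecycle described for CUDA Graph support can be sketched in plain C++ without any CUDA calls. This is only a bookkeeping mock under stated assumptions: `MiniGraph`, `CompiledGraph`, `mutate()`, `run()`, and `compileCount()` are hypothetical names, and only `setCudaGraphEnabled()` and `invalidateCudaGraph()` are taken from the description. The actual capture and launch in infinirt may work differently.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in for a compiled CUDA Graph executable (hypothetical; no CUDA here).
struct CompiledGraph {
    std::uint64_t version = 0;  // graph version this executable was built from
};

class MiniGraph {
public:
    void setCudaGraphEnabled(bool on) { enabled_ = on; }

    // Any structural change (for example to the outputs) makes the cached
    // executable stale, so bump the version and drop this thread's entry.
    void mutate() {
        ++version_;
        invalidateCudaGraph();
    }

    // Drops the entry cached by the calling thread only. Other threads detect
    // staleness through the version check in run().
    void invalidateCudaGraph() { cache().erase(this); }

    // Returns true when this call had to (re)compile, false otherwise.
    bool run() {
        if (!enabled_) return false;  // eager path, nothing cached
        auto& c = cache();
        auto it = c.find(this);
        if (it != c.end() && it->second.version == version_) {
            return false;  // cache hit: a real backend would launch here
        }
        c[this] = CompiledGraph{version_};  // capture + compile would go here
        ++compile_count_;
        return true;
    }

    int compileCount() const { return compile_count_; }

private:
    // One cache per thread, mirroring the thread-local cache in the description.
    static std::unordered_map<const MiniGraph*, CompiledGraph>& cache() {
        thread_local std::unordered_map<const MiniGraph*, CompiledGraph> c;
        return c;
    }

    bool enabled_ = false;
    std::uint64_t version_ = 0;
    int compile_count_ = 0;
};
```

In this sketch a graph compiles on its first enabled run, reuses the cached entry on later runs, and recompiles once after a mutation, which is roughly the behavior the test names above suggest. Keying the cache by object address is a simplification; the mock does not clean up entries held by other threads when a graph is destroyed.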