Promin3 commented Mar 16, 2026
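- Project 1, CPU inference optimization: added OpenMP multithreading, AVX2/FMA SIMD vectorization, and OpenBLAS integration to all operators. The linear operator matches PyTorch on large F32 matrices, and operators such as rope, swiglu, and rms_norm are sped up by 2-33x.
- Project 2, CUDA integration and GPU inference acceleration: implemented a complete CUDA Runtime API and 10 CUDA operators (including cuBLAS Tensor Core). GPU inference output matches PyTorch exactly.
- Project 3, AI chatbot: implemented Temperature/Top-K/Top-P random sampling operators (CPU + CUDA), a FastAPI chat server (OpenAI-compatible API with streaming SSE support), and a modern web chat interface.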
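For readers unfamiliar with the sampling step named in Project 3, the following is a minimal pure-Python sketch of how Temperature, Top-K, and Top-P filtering are commonly combined. It is not the PR's CPU or CUDA operator: the function name `sample_token`, its parameters, and the order in which the filters are applied are assumptions made for illustration.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Pick a token index from raw logits (hypothetical helper, not the PR's code)."""
    rng = rng or random.Random()
    if temperature <= 0.0:
        # Assumption: treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Candidate indices sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-K: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # random.choices normalises the weights, so the survivors are
    # effectively renormalised before one index is drawn.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

For example, `sample_token([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=3, top_p=0.9)` can only return one of the three highest-scoring indices. A production operator would typically work on tensors and avoid the full sort, but the filtering logic is the same in spirit.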
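The commit history from the assignment stage is preserved, and the code is updated to the complete implementation of projects InfiniTensor#1/InfiniTensor#2/InfiniTensor#3.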
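- Added 项目1.md: CPU operator performance profile report (OpenMP + AVX2 + OpenBLAS vs PyTorch)
- Added 项目2.md: CUDA operator correctness and performance report (10 CUDA operators + GPU inference verification)
- Added 项目3.md: AI chatbot verification report (FastAPI server + SSE streaming + Web UI)
- Fixed a bug in test/ops/self_attention.py where temp_mask was created without a device, causing the CUDA test to fail
- Renamed REPORT.md to 报告.md and corrected a reference to the nonexistent test_ops.py

Made-with: Cursor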