feat: complete LLAISYS assignments 1 2 3 and projects 1 2 3 6 by Saberlilya · Pull Request #41 · InfiniTensor/llaisys

Saberlilya · 2026-03-15T12:19:33Z

This PR completes the following LLAISYS course work and fills in the Chinese submission documents:

Assignment #1: Tensor
Assignment #2: Operators
Assignment #3: Large Language Model Inference
Project #1: CPU optimization
Project #2: Second platform, MetaX/MACA
Project #3: Chat service
Project #6: New model support

Main changes

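Implemented the basic Tensor capabilities: load, isContiguous, view, permute, and slice
Implemented the key CPU operators: argmax, embedding, linear, rms_norm, rope, self_attention, and swiglu
Implemented the Qwen2 inference pipeline, weight loading, and token-level comparison verification
Optimized the hot CPU operators with OpenMP
Added a standalone METAX device type and the --metax-gpu=y build switch
Integrated the MetaX/MACA runtime and the key operator paths; linear calls mcblasGemmEx
Implemented the chat service and a streaming response interface
Added C++/Python wrappers for the Llama/TinyLlama path and automatic model-type dispatch based on config.json
Completed the submission overview, implementation report, and reproduction guide
This PR contains only implementation code and the formal submission documents; local study materials and external PDFs are not included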
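The Tensor operations listed above (isContiguous, view, permute, slice) can all be described as shape/stride/offset bookkeeping without touching the data. The following is a minimal Python sketch of that idea, not the PR's C++ implementation: `TensorMeta` and `row_major_strides` are hypothetical names, strides are counted in elements, and `view` is restricted to contiguous tensors for simplicity.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


def row_major_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical metadata record: shape, strides, and element offset."""
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0

    def is_contiguous(self) -> bool:
        # Strict check: the strides must match the dense row-major layout.
        return self.strides == row_major_strides(self.shape)

    def permute(self, order) -> "TensorMeta":
        # Reorder dimensions; shape and strides move together, data does not.
        if sorted(order) != list(range(len(self.shape))):
            raise ValueError("order must be a permutation of the dimensions")
        return TensorMeta(
            tuple(self.shape[i] for i in order),
            tuple(self.strides[i] for i in order),
            self.offset,
        )

    def slice(self, dim: int, start: int, end: int) -> "TensorMeta":
        # Keep strides, shrink one dimension, and advance the offset.
        if not 0 <= start <= end <= self.shape[dim]:
            raise ValueError("slice bounds out of range")
        shape = list(self.shape)
        shape[dim] = end - start
        return TensorMeta(
            tuple(shape), self.strides, self.offset + start * self.strides[dim]
        )

    def view(self, new_shape: Tuple[int, ...]) -> "TensorMeta":
        # Simplification: only contiguous tensors can be reshaped here.
        if not self.is_contiguous():
            raise ValueError("view requires a contiguous tensor in this sketch")
        if prod(new_shape) != prod(self.shape):
            raise ValueError("element count must not change")
        return TensorMeta(
            tuple(new_shape), row_major_strides(tuple(new_shape)), self.offset
        )
```

In this sketch, slicing an inner dimension keeps the original strides, so the result is generally no longer contiguous, while slicing the outermost dimension only moves the offset.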

Verified commands

Local CPU path:

xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1

Minimal chat service check:

PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'
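The POST request above can also be issued from Python. The sketch below mirrors the curl call with the standard library only; the function names `build_chat_request` and `send` are hypothetical, and it assumes the server replies with a JSON body, which the PR text does not show.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, max_tokens: int = 8):
    """Build the same non-streaming request as the curl example."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout: float = 60.0):
    """Send the request while ignoring proxy settings, like curl --noproxy '*'."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        # Assumption: the response body is JSON.
        return json.loads(response.read().decode("utf-8"))
```

With the server from the command above running, `send(build_chat_request("http://127.0.0.1:8011", "你好"))` would return the parsed reply.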

Validation entry point for new models:

python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1
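The PR description mentions automatic model-type dispatch based on config.json. A minimal sketch of how such a dispatch could look is given below; it relies on the `model_type` field that Hugging Face style config.json files carry (`"qwen2"` for Qwen2, `"llama"` for Llama and TinyLlama). The registry `MODEL_WRAPPERS` and both function names are hypothetical and are not taken from the PR's code.

```python
import json
from pathlib import Path

# Hypothetical registry: maps config.json "model_type" to a wrapper name.
MODEL_WRAPPERS = {
    "qwen2": "Qwen2",
    "llama": "Llama",  # TinyLlama checkpoints also report "llama"
}


def wrapper_for_config(config: dict) -> str:
    """Pick a wrapper name from an already parsed config.json."""
    model_type = config.get("model_type")
    if model_type not in MODEL_WRAPPERS:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return MODEL_WRAPPERS[model_type]


def wrapper_for_model_dir(model_dir: str) -> str:
    """Read <model_dir>/config.json and pick a wrapper name."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return wrapper_for_config(json.load(f))
```

Failing loudly on an unknown `model_type` keeps an unsupported checkpoint from being loaded through the wrong wrapper.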

MetaX path:

XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
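
Notes

Assignment #1/#2/#3 and Project #1/#3/#6 were verified mainly in a local CPU environment
Project #2 was verified on a real MetaX C500 machine
At the C/C++ SDK level, MetaX is not a drop-in CUDA-compatible platform, so the backend uses a separate adaptation
Inference verification currently focuses on Qwen2; Project #6 provides Llama/TinyLlama model integration and a validation entry point for local model directories
The current machine has no NVIDIA hardware, so this PR adds no new on-device regression data for --device nvidia
External PDFs in the repository root remain untracked and are not committed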

Submission documents

Submission overview: submission_zh.md
Implementation report: report_zh.md
Reproduction guide: reproduce_zh.md

Saberlilya and others added 30 commits February 2, 2026 11:42

finish hw1 2 hw3 waiting

a9a366e

Assignment InfiniTensor#3: Qwen2 inference with KV cache

a2cf465

trigger github actions

7d5b027

Assignment InfiniTensor#3: add qwen2 model implementation

67f8f53

fix: add libllaisys qwen2 ctypes bindings

f658111

fix: track python libllaisys and add qwen2 ctypes bindings

44cf45b

fix: windows build (no exceptions across C boundary)

42514a8

fix: windows build warnings in tensor loops

6f77d90

docs: update comments for better readability

5e7dd8d

docs: update comments for better readability

6b9d013

chore: rerun ci

915f394

chore: trigger official CI

64a26b0

chore: rerun ci on my repo

84ac6cd

chore: trigger ci (non-md change)

7ec901b

feat: complete llaisys projects 1 2 3 6

dae9421

feat: add MetaX/MACA backend and submission docs

f1522d6

fix: release hf gpu cache before llaisys infer

0f385a0

docs: trim submission materials

d75c940

Update pr_zh.md

3b12931

Update reproduce_zh.md

9cd02ac

Update README_ZN.md

b6e3b17

Update report_zh.md

9507553

fix: track decoder model files and fix ci builds

46c8a02

fix: align qwen2 python wrapper with decoder base

702742f

docs: clean submission materials before final pr

97450c2

docs: harden reproduce environment checks

d441885

docs: sync metax submission docs

586d107

docs: finalize submission wording

d51708d

docs: merge submission docs into full course report

be32fb3

docs: add project 6 to submission scope

910dfb2

Saberlilya wants to merge 30 commits into InfiniTensor:main from Saberlilya:checkpoint/nvidia-done
