…s and tests update
- src/ops/linear/op.cpp: change the OpenMP for-loop index to a signed integral type, fixing C3016
- Project InfiniTensor#4: continuous batching, KV pool, monitoring, batch correctness tests, and more (see docs)
Made-with: Cursor
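For context on the C3016 fix: MSVC's OpenMP support rejects an `omp for` loop whose index variable is unsigned (error C3016), so a `size_t` counter has to become a signed integer. The sketch below illustrates that pattern only; `scale_rows` and its parameters are hypothetical and are not the PR's linear kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): multiplies every element of a
// row-major rows x cols matrix by `factor`, one row per loop iteration.
void scale_rows(std::vector<float>& data, std::size_t rows, std::size_t cols, float factor) {
    // The OpenMP loop index is a signed 64-bit integer; the unsigned bound
    // is converted once, outside the loop, so the comparison stays signed.
    const std::int64_t n_rows = static_cast<std::int64_t>(rows);
#pragma omp parallel for
    for (std::int64_t r = 0; r < n_rows; ++r) {
        float* row = data.data() + static_cast<std::size_t>(r) * cols;
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] *= factor;
        }
    }
}
```

Only the parallelized loop needs the signed index; the inner loop is not an OpenMP loop and can keep `size_t`.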
Pull request overview
This PR extends the build, runtime, and Python bindings to support NVIDIA CUDA execution (including device runtime + CUDA ops), adds optional NCCL-based tensor-parallel support, and introduces several diagnostic/benchmark scripts and tests while renaming the Python package to llaisys_py.
Changes:
- Add CUDA/NVIDIA build targets, device runtime implementation, and GPU dispatch for multiple ops (plus CPU fallbacks where needed).
- Add optional NCCL communication layer and Python bindings/tests for tensor-parallel inference (Project #5).
- Rename/standardize Python package import path to llaisys_py, update tests, and add server/diagnostic tooling.
Reviewed changes
Copilot reviewed 88 out of 132 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| xmake/nvidia.lua | Adds CUDA static targets (device + ops) with devlink and gencodes. |
| xmake/cpu.lua | Adjusts CPU targets’ warnings/flags handling to rely on global -fPIC. |
| xmake.lua | Global build policy/flags, optional NCCL target, OpenMP/AVX flags, updated install copy logic. |
| test/test_tensor.py | Updates tests to import llaisys_py. |
| test/test_tensor_parallel.py | Adds tensor-parallel (NCCL) multi-process test harness. |
| test/test_runtime.py | Updates tests to import llaisys_py. |
| test/test_multi_user_chat.py | Adds concurrent HTTP chat request test script. |
| test/test_kv_cache.py | Adds KV cache export/import + suffix prefill test script. |
| test/test_batch_correctness.py | Adds sequential vs Engine batched-output correctness script. |
| test/ops/swiglu.py | Updates op test to import llaisys_py. |
| test/ops/self_attention.py | Updates op test to import llaisys_py. |
| test/ops/sample.py | Adds sampling op test script (CPU-oriented). |
| test/ops/rope.py | Updates op test to import llaisys_py. |
| test/ops/rms_norm.py | Updates op test to import llaisys_py. |
| test/ops/linear.py | Updates op test to import llaisys_py. |
| test/ops/linear_bench.py | Adds reproducible linear benchmark runner. |
| test/ops/linear_bench_report.py | Adds JSON benchmark comparison/report script. |
| test/ops/embedding.py | Updates op test to import llaisys_py. |
| test/ops/argmax.py | Updates op test to import llaisys_py. |
| test/ops/add.py | Updates op test to import llaisys_py. |
| test/minimal_engine_test.py | Adds minimal tokenizer+engine reproduction script. |
| test/diagnose_gpu_layer.py | Adds layer-by-layer GPU correctness diagnostic script. |
| src/utils/types.cpp | Formatting-only change (no behavioral change). |
| src/utils/check.hpp | Formatting-only change (macros unchanged). |
| src/utils.hpp | Formatting-only change. |
| src/tensor/tensor.hpp | Formatting-only change. |
| src/ops/swiglu/op.hpp | Formatting-only change. |
| src/ops/swiglu/op.cpp | Implements SwiGLU (CPU + NVIDIA dispatch). |
| src/ops/self_attention/op.hpp | Formatting-only change. |
| src/ops/sample/op.hpp | Adds sampling op API declaration + doc. |
| src/ops/rope/op.hpp | Formatting-only change. |
| src/ops/rope/op.cpp | Implements RoPE (CPU + NVIDIA dispatch). |
| src/ops/rms_norm/op.hpp | Formatting-only change. |
| src/ops/rms_norm/op.cpp | Implements RMSNorm (CPU + NVIDIA dispatch). |
| src/ops/rearrange/op.hpp | Formatting-only change. |
| src/ops/rearrange/op.cpp | Formatting-only change (still TODO). |
| src/ops/linear/op.hpp | Formatting-only change. |
| src/ops/embedding/op.hpp | Formatting-only change. |
| src/ops/embedding/op.cpp | Implements embedding (CPU + NVIDIA dispatch). |
| src/ops/argmax/op.hpp | Formatting-only change. |
| src/ops/argmax/op.cpp | Implements argmax (CPU + NVIDIA dispatch). |
| src/ops/add/op.hpp | Adds docs to Add op interface. |
| src/ops/add/op.cpp | Adds NVIDIA dispatch and context switching for Add. |
| src/ops/add/cpu/add_cpu.hpp | Adds docs to CPU Add implementation interface. |
| src/ops/add/cpu/add_cpu.cpp | Reworks CPU Add implementation and adds extensive inline commentary. |
| src/llaisys/tensor.cc | Switches to LLAISYS_EXTERN_C wrapper macro. |
| src/llaisys/runtime.cc | Switches to LLAISYS_EXTERN_C wrapper macro. |
| src/llaisys/ops.cc | Adds sample op export; allows null bias for linear; switches extern wrapper macro. |
| src/llaisys/nccl_comm.cu | Adds NCCL implementation (guarded by ENABLE_NCCL + ENABLE_NVIDIA_API). |
| src/llaisys/nccl_comm_stub.cc | Adds NCCL stub symbols when NCCL isn’t enabled. |
| src/llaisys/llaisys_tensor.hpp | Switches to LLAISYS_EXTERN_C wrapper macro. |
| src/device/runtime_api.hpp | Formatting-only change. |
| src/device/runtime_api.cpp | Formatting-only change. |
| src/device/nvidia/nvidia_runtime_api.cu | Implements CUDA runtime API backend (device/mem/stream/memcpy). |
| src/device/nvidia/nvidia_resource.cuh | Formatting-only change. |
| src/device/nvidia/nvidia_resource.cu | Formatting-only change. |
| src/device/device_resource.hpp | Formatting-only change. |
| src/device/cpu/cpu_runtime_api.cpp | Formatting-only change. |
| src/device/cpu/cpu_resource.hpp | Formatting-only change. |
| src/device/cpu/cpu_resource.cpp | Formatting-only change. |
| src/core/storage/storage.hpp | Formatting-only change. |
| src/core/storage/storage.cpp | Formatting-only change. |
| src/core/runtime/runtime.hpp | Adds shutdown-deactivation flag/API. |
| src/core/runtime/runtime.cpp | Implements shutdown-deactivation logic in Runtime lifecycle. |
| src/core/llaisys_core.hpp | Formatting-only change. |
| src/core/core.hpp | Formatting-only change. |
| src/core/context/context.hpp | Formatting-only change. |
| src/core/context/context.cpp | Introduces global runtime pool to share CUDA context across threads. |
| src/core/allocator/naive_allocator.hpp | Formatting-only change. |
| src/core/allocator/naive_allocator.cpp | Formatting-only change. |
| src/core/allocator/allocator.hpp | Formatting-only change. |
| scripts/run_server.sh | Adds helper script to run server with PYTHONPATH. |
| scripts/list_safetensors_keys.py | Adds safetensors metadata inspection script. |
| scripts/download_model.py | Adds Hugging Face model download helper. |
| python/setup.cfg | Renames package to llaisys-py, updates package data section. |
| python/pyproject.toml | Formatting-only change. |
| python/llaisys/models/qwen2.py | Removes old (stub) llaisys package model code. |
| python/llaisys_py/tensor.py | Formatting-only change. |
| python/llaisys_py/server/README.md | Adds server usage docs. |
| python/llaisys_py/server/chat_cli.py | Adds CLI client for the server. |
| python/llaisys_py/server/main.py | Adds server entrypoint with model loading + uvicorn run. |
| python/llaisys_py/server/__init__.py | Exposes create_app. |
| python/llaisys_py/runtime.py | Formatting-only change. |
| python/llaisys_py/ops.py | Adds sampling binding; linear bias optional. |
| python/llaisys_py/models/__init__.py | Formatting-only change. |
| python/llaisys_py/libllaisys/tensor.py | Formatting-only change. |
| python/llaisys_py/libllaisys/runtime.py | Formatting-only change. |
| python/llaisys_py/libllaisys/qwen2.py | Adds ctypes bindings for expanded Qwen2 C API. |
| python/llaisys_py/libllaisys/ops.py | Adds ctypes signature for llaisysSample. |
| python/llaisys_py/libllaisys/nccl_comm.py | Adds ctypes bindings for NCCL comm API. |
| python/llaisys_py/libllaisys/llaisys_types.py | Formatting-only change. |
| python/llaisys_py/libllaisys/__init__.py | Preloads OpenMP runtime on Linux; loads qwen2 + NCCL bindings. |
| python/llaisys_py/__init__.py | Normalizes CUDA_VISIBLE_DEVICES at import time; re-exports API. |
| LICENSE | Formatting-only change. |
| include/llaisys/tensor.h | Switches extern wrapper macro usage. |
| include/llaisys/runtime.h | Switches extern wrapper macro usage. |
| include/llaisys/ops.h | Adds llaisysSample C API; switches extern wrapper macro usage. |
| include/llaisys/ops_nvidia.h | Adds CUDA op declaration header for NVIDIA dispatch. |
| include/llaisys/nccl_comm.h | Adds NCCL comm C API header. |
| include/llaisys/models/qwen2.h | Expands Qwen2 C API for batching/TP/KV cache; switches extern wrapper macro usage. |
| include/llaisys.h | Renames __C macro to LLAISYS_EXTERN_C. |
| docs/install-xmake.md | Adds Xmake install instructions for Linux servers. |
| =42 | Adds a pip install log file (likely accidental). |
| .gitignore | Adds model dir ignore; now ignores entire docs/. |
| .github/workflows/build.yaml | Formatting-only change. |
| .clang-format | Formatting-only change. |
Comment on lines +1 to +11
| Looking in indexes: https://mirrors.aliyun.com/pypi/simple/ | ||
| Collecting setuptools | ||
| Downloading https://mirrors.aliyun.com/pypi/packages/e1/c6/76dc613121b793286a3f91621d7b75a2b493e0390ddca50f11993eadf192/setuptools-82.0.0-py3-none-any.whl (1.0 MB) | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 24.9 MB/s eta 0:00:00 | ||
| Collecting wheel | ||
| Downloading https://mirrors.aliyun.com/pypi/packages/87/22/b76d483683216dde3d67cba61fb2444be8d5be289bf628c13fc0fd90e5f9/wheel-0.46.3-py3-none-any.whl (30 kB) | ||
| Collecting packaging>=24.0 (from wheel) | ||
| Downloading https://mirrors.aliyun.com/pypi/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl (74 kB) | ||
| ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.4/74.4 kB 23.6 MB/s eta 0:00:00 | ||
| Installing collected packages: setuptools, packaging, wheel | ||
| Successfully installed packaging-26.0 setuptools-82.0.0 wheel-0.46.3 |
Comment on lines +13 to +19
| #define EXCEPTION_UNSUPPORTED_DATATYPE(DT__) \ | ||
| do { \ | ||
| std::cerr << "[ERROR] Unsupported data type: " \ | ||
| << llaisys::utils::dtype_to_str(DT__) \ | ||
| << EXCEPTION_LOCATION_MSG << std::endl; \ | ||
| throw std::runtime_error("Unsupported device"); \ | ||
| } while (0) |
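The quoted macro logs "Unsupported data type" but throws `std::runtime_error("Unsupported device")`, so a caller inspecting `what()` sees a different cause than the log shows. One way to keep the two in sync is to build the message once and use it for the exception. This is a minimal sketch with a hypothetical helper name (`throw_unsupported_dtype`); it takes the dtype name as a string rather than calling the project's `dtype_to_str`.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the PR): formats one message that names the
// unsupported dtype and the call site, then throws it as the exception text.
[[noreturn]] inline void throw_unsupported_dtype(const std::string& dtype_name,
                                                 const char* file, int line) {
    std::ostringstream msg;
    msg << "Unsupported data type: " << dtype_name << " (" << file << ":" << line << ")";
    throw std::runtime_error(msg.str());
}
```

A macro wrapper could still supply `__FILE__` and `__LINE__` at the call site while delegating the formatting to a function like this.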
Comment on lines 14 to 17
| Runtime::~Runtime() { | ||
| if (!_is_active) { | ||
| if (!_is_active && !_deactivated_for_shutdown) { | ||
| std::cerr << "Mallicious destruction of inactive runtime." << std::endl; | ||
| } |
Comment on lines +15 to +22
| static ncclDataType_t to_nccl_dtype(llaisysDataType_t dtype) { | ||
| switch (dtype) { | ||
| case LLAISYS_DTYPE_F32: return ncclFloat32; | ||
| case LLAISYS_DTYPE_F16: | ||
| case LLAISYS_DTYPE_BF16: return ncclFloat16; | ||
| case LLAISYS_DTYPE_I64: return ncclInt64; | ||
| default: return ncclFloat32; | ||
| } |
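In the quoted mapping, BF16 shares the `ncclFloat16` case and unknown dtypes fall through to `ncclFloat32`. For a reduction such as all-reduce sum, treating bfloat16 bits as IEEE half would produce wrong arithmetic, and a silent float32 default would misread the buffer size. Recent NCCL releases expose a dedicated `ncclBfloat16` value. The sketch below shows the shape of a stricter mapping using stand-in enums (`DType`, `WireType` are hypothetical) so that it compiles without the NCCL or llaisys headers.

```cpp
#include <cassert>
#include <optional>

// Stand-in enums (assumptions for illustration): DType plays the role of
// llaisysDataType_t and WireType the role of ncclDataType_t.
enum class DType { F32, F16, BF16, I64, I8 };
enum class WireType { Float32, Float16, BFloat16, Int64 };

// Maps each supported dtype to its own wire type and reports unsupported
// dtypes as std::nullopt instead of silently falling back to Float32.
inline std::optional<WireType> to_wire_type(DType dtype) {
    switch (dtype) {
    case DType::F32:  return WireType::Float32;
    case DType::F16:  return WireType::Float16;
    case DType::BF16: return WireType::BFloat16;
    case DType::I64:  return WireType::Int64;
    default:          return std::nullopt;
    }
}
```

The caller can then turn an empty result into an explicit error before launching any collective.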
Comment on lines 12 to 20
| #ifdef __cplusplus | ||
| #define __C extern "C" | ||
| #define LLAISYS_EXTERN_C extern "C" | ||
| #include <cstddef> | ||
| #include <cstdint> | ||
| #else | ||
| #define __C | ||
| #define LLAISYS_EXTERN_C | ||
| #include <stddef.h> | ||
| #include <stdint.h> | ||
| #endif |
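Renaming `__C` avoids an identifier that is reserved for the implementation (it contains a double underscore). A separate point applies if these headers are meant to be included from C as well: with the macro defined as empty, a usage such as `LLAISYS_EXTERN_C { ... }` expands to a bare `{ ... }` at file scope, which a C compiler rejects. A common alternative is a begin/end macro pair. The names `DEMO_EXTERN_C_BEGIN`, `DEMO_EXTERN_C_END`, and `demo_add` below are hypothetical and do not appear in the PR.

```cpp
#include <cassert>

// Begin/end pair: expands to an extern "C" block in C++ and to nothing in C,
// so no stray braces are left behind for a C compiler.
#ifdef __cplusplus
#define DEMO_EXTERN_C_BEGIN extern "C" {
#define DEMO_EXTERN_C_END }
#else
#define DEMO_EXTERN_C_BEGIN
#define DEMO_EXTERN_C_END
#endif

DEMO_EXTERN_C_BEGIN
// Declared with C linkage when compiled as C++.
int demo_add(int a, int b);
DEMO_EXTERN_C_END

// The definition picks up C linkage from the earlier declaration.
int demo_add(int a, int b) { return a + b; }
```

The same translation unit compiles as C or C++, which the single-macro-plus-braces form does not.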
Comment on lines +17 to +19
| for i in range(ndev): | ||
| print("Testing device {i}...") | ||
| api.set_device(i) |
Comment on lines 4 to 7
| #include "../llaisys.h" | ||
|
|
||
| __C { | ||
| LLAISYS_EXTERN_C { | ||
| // Runtime API Functions |
Comment on lines 4 to 7
| #include "../llaisys.h" | ||
|
|
||
| __C { | ||
| LLAISYS_EXTERN_C { | ||
| typedef struct LlaisysTensor *llaisysTensor_t; |
Comment on lines 4 to 7
| #include "tensor.h" | ||
|
|
||
| __C { | ||
| LLAISYS_EXTERN_C { | ||
| __export void llaisysAdd(llaisysTensor_t c, llaisysTensor_t a, llaisysTensor_t b); |
Comment on lines 4 to 7
| #include "../tensor.h" | ||
|
|
||
| __C { | ||
| LLAISYS_EXTERN_C { | ||
| struct LlaisysQwen2Meta { |