-
Notifications
You must be signed in to change notification settings - Fork 2
Docs/contribution guidelines #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6445a4e
6f03231
686d1b6
0de132d
65bd0e5
2b1a405
aa25df1
b2b07ef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| name: Bug Report | ||
| about: Report a bug to help us improve | ||
| title: "[Bug] " | ||
| labels: bug | ||
| assignees: '' | ||
| --- | ||
|
|
||
| ## Description | ||
|
|
||
| A clear and concise description of the bug. | ||
|
|
||
| ## Steps to Reproduce | ||
|
|
||
| Minimal code to reproduce the issue: | ||
|
|
||
| ```python | ||
| # Your reproduction code here | ||
| ``` | ||
|
|
||
| ## Expected Behavior | ||
|
|
||
| What you expected to happen. | ||
|
|
||
| ## Actual Behavior | ||
|
|
||
| What actually happened. | ||
|
|
||
| ## Environment | ||
|
|
||
| - Python version: | ||
| - JAX version: | ||
| - Hardware: CPU / GPU (model) / TPU (version) | ||
| - OS: | ||
|
|
||
| ## Logs / Error Messages | ||
|
|
||
| ``` | ||
| Paste relevant logs or error messages here. | ||
| ``` | ||
|
|
||
| ## Additional Context | ||
|
|
||
| Any other context about the problem. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| --- | ||
| name: Feature Request | ||
| about: Suggest a new feature or enhancement | ||
| title: "[Feature] " | ||
| labels: enhancement | ||
| assignees: '' | ||
| --- | ||
|
|
||
| ## Motivation | ||
|
|
||
| Why is this feature needed? What problem does it solve? | ||
|
|
||
| ## Expected Behavior | ||
|
|
||
| Describe the desired functionality. | ||
|
|
||
| ## Proposed Approach | ||
|
|
||
| (Optional) How might this be implemented? | ||
|
|
||
| ## Willingness to Contribute | ||
|
|
||
| - [ ] I am willing to submit a PR for this feature. | ||
|
|
||
| ## Additional Context | ||
|
|
||
| Any other context, references, or related issues. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| ## Description | ||
|
|
||
| What does this PR do and why? | ||
|
|
||
| ## Related Issue | ||
|
|
||
| Closes # | ||
|
|
||
| ## Change Type | ||
|
|
||
| - [ ] `feat` — New feature | ||
| - [ ] `fix` — Bug fix | ||
| - [ ] `refactor` — Code refactoring | ||
| - [ ] `docs` — Documentation | ||
| - [ ] `ci` — CI/CD changes | ||
| - [ ] `test` — Tests | ||
| - [ ] `perf` — Performance improvement | ||
|
|
||
| ## Checklist | ||
|
|
||
| - [ ] Code passes `uv run ruff check src/ tests/` and `uv run ruff format src/ tests/` | ||
| - [ ] New/modified public APIs have complete docstrings (tensor shapes, dimension semantics, business logic) | ||
| - [ ] Public functions have input assertions (`assert` or `assert_shape_or_none`) | ||
| - [ ] Tests added at the appropriate layer (`tests/ops/`, `tests/modules/`, `tests/layers/`, or `tests/ref/`) | ||
| - [ ] If `tops/cpu/` is modified, core developers have been notified and PR is labeled `cpu-ref` | ||
|
|
||
| ## Test Results | ||
|
|
||
| ``` | ||
| Paste relevant test output here. | ||
| ``` | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,278 @@ | ||||||
| # Contributing to tops | ||||||
|
|
||||||
| Thank you for your interest in contributing to tops! This project implements high-performance JAX/Pallas TPU and GPU kernels for modern neural network architectures. | ||||||
|
|
||||||
| For a detailed understanding of the system architecture, layer design, and coding standards, please read [ARCHITECTURE.md](ARCHITECTURE.md). | ||||||
|
|
||||||
| ## Getting Started | ||||||
|
|
||||||
| ### Prerequisites | ||||||
|
|
||||||
| - Python >= 3.12 | ||||||
| - [uv](https://docs.astral.sh/uv/) package manager | ||||||
|
|
||||||
| ### Setup | ||||||
|
|
||||||
| 1. Clone the repository: | ||||||
|
|
||||||
| ```bash | ||||||
| git clone https://github.com/primatrix/pallas-kernel.git | ||||||
| cd pallas-kernel | ||||||
| ``` | ||||||
|
|
||||||
| 2. Install dependencies: | ||||||
|
|
||||||
| ```bash | ||||||
| # Base install (CPU only) | ||||||
| uv sync | ||||||
|
|
||||||
| # With GPU support (CUDA 12) | ||||||
| uv sync --extra gpu | ||||||
|
|
||||||
| # With TPU support | ||||||
| uv sync --extra tpu | ||||||
| ``` | ||||||
|
|
||||||
| 3. Install pre-commit hooks: | ||||||
|
|
||||||
| ```bash | ||||||
| pre-commit install | ||||||
| ``` | ||||||
|
|
||||||
| 4. Run tests to verify your setup: | ||||||
|
|
||||||
| ```bash | ||||||
| uv run pytest tests/ -v | ||||||
| ``` | ||||||
|
|
||||||
| ## Development Workflow | ||||||
|
|
||||||
| ### Branching | ||||||
|
|
||||||
| 1. Create your own development branch in this repository (e.g., `dev/<your-name>/my-feature`, `fix/issue-123`). | ||||||
| 2. Make your changes and push to the branch. | ||||||
| 3. CI will automatically run checks to ensure code passes lint, formatting, and tests. | ||||||
| 4. Once CI passes, open a Pull Request against the `main` branch for code review. | ||||||
|
|
||||||
| ### Commit Convention | ||||||
|
|
||||||
| We follow the [Conventional Commits](https://www.conventionalcommits.org/) format: | ||||||
|
|
||||||
| ``` | ||||||
| <type>[optional scope]: <description> | ||||||
| ``` | ||||||
|
|
||||||
| **Types:** | ||||||
|
|
||||||
| | Type | Usage | | ||||||
| |------|-------| | ||||||
| | `feat` | New feature | | ||||||
| | `fix` | Bug fix | | ||||||
| | `refactor` | Code refactoring (no behavior change) | | ||||||
| | `docs` | Documentation only | | ||||||
| | `ci` | CI/CD changes | | ||||||
| | `test` | Adding or updating tests | | ||||||
| | `perf` | Performance improvement | | ||||||
|
|
||||||
| **Scope** is optional and indicates the affected module: `feat(gla):`, `fix(pallas):`, etc. | ||||||
|
|
||||||
| **Examples from project history:** | ||||||
|
|
||||||
| - `ci: fix TPU job premature failure due to log disconnect` | ||||||
| - `fix(gla): relax g_gamma non-positive assertion` | ||||||
| - `feat: fused chunk simple gla vjp` | ||||||
|
|
||||||
| ### Pre-Submit Checks | ||||||
|
|
||||||
| Before submitting a PR, ensure the following pass locally: | ||||||
|
|
||||||
| ```bash | ||||||
| # Lint | ||||||
| uv run ruff check src/ tests/ | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
|
|
||||||
| # Format | ||||||
| uv run ruff format src/ tests/ | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
|
|
||||||
| # Tests | ||||||
| uv run pytest tests/ -v | ||||||
| ``` | ||||||
|
|
||||||
| Pre-commit hooks will run `ruff check --fix`, `ruff format`, `trailing-whitespace`, and `check-added-large-files` automatically on each commit. | ||||||
|
|
||||||
| ## Code Review Process | ||||||
|
|
||||||
| ### General Changes | ||||||
|
|
||||||
| Features, bug fixes, documentation, CI, and tests require **1 core maintainer approval** to merge. | ||||||
|
|
||||||
| ### Reference Implementation Changes | ||||||
|
|
||||||
| Changes to code under `tops/cpu/` (the correctness baseline for all GPU/TPU kernel validation) require review by **all core developers**. This review must be conducted via: | ||||||
|
|
||||||
| - Video conference discussion, OR | ||||||
| - Mailing list review | ||||||
|
|
||||||
| PRs modifying `tops/cpu/` must be labeled with `cpu-ref` to notify all core developers. | ||||||
|
|
||||||
| **Rationale:** `tops/cpu/` serves as the Golden reference for all kernel correctness validation. Changes here impact the entire test and validation system. | ||||||
|
|
||||||
| ### Review Process | ||||||
|
|
||||||
| A complete PR review follows these steps: | ||||||
|
|
||||||
| 1. **Author self-check**: Ensure all CI checks pass (lint, format, tests). PR description clearly explains what changed and why. | ||||||
| 2. **Reviewer examination**: | ||||||
| - **Correctness**: Is the logic correct? Are edge cases covered? | ||||||
| - **Coding standards**: Do docstrings, assertions, and naming follow project standards? | ||||||
| - **Test coverage**: Are tests added at the appropriate layer with sufficient coverage? | ||||||
| - **Performance impact**: Does this introduce unnecessary performance regressions? (For kernel PRs, check roofline analysis results.) | ||||||
| 3. **Iterate**: After reviewer feedback, the author addresses comments and pushes fixes until all comments are resolved. | ||||||
| 4. **Merge**: Once the required number of approvals is obtained, the reviewer or author merges into `main`. | ||||||
|
|
||||||
| ## Coding Standards | ||||||
|
|
||||||
| Key rules (see [ARCHITECTURE.md Section 2](ARCHITECTURE.md#2-agent-readability) for full details): | ||||||
|
|
||||||
| - **Docstrings**: All public functions must have comprehensive docstrings that explicitly describe tensor shapes, dimension semantics, and business logic. | ||||||
| - **Input assertions**: All public functions must enforce strict shape/type constraints using `assert` or `assert_shape_or_none` from `tops.utils`. | ||||||
| - **No-crash guarantee**: Kernels must never crash after passing entry-point assertions. | ||||||
|
|
||||||
| ## Testing Guidelines | ||||||
|
|
||||||
| The project uses a layered testing structure (see [ARCHITECTURE.md Section 3](ARCHITECTURE.md#3-testing--validation-boundaries) for full details): | ||||||
|
|
||||||
| | Directory | Purpose | | ||||||
| |-----------|---------| | ||||||
| | `tests/ops/` | Kernel correctness tests (GPU/TPU vs CPU reference) | | ||||||
| | `tests/modules/` | Module unit tests | | ||||||
| | `tests/layers/` | Layer integration tests | | ||||||
| | `tests/ref/` | Reference implementation comparison tests | | ||||||
|
|
||||||
| Key requirements: | ||||||
|
|
||||||
| - New kernels must have CPU reference comparison tests. | ||||||
| - Error bound constraint: TPU error must not exceed GPU error ($\epsilon_{\text{TPU}} \leq \epsilon_{\text{GPU}}$). | ||||||
| - Modifications at a specific layer must be verified by tests at that same layer. | ||||||
|
|
||||||
| ## Kernel Design Requirements | ||||||
|
|
||||||
| Every new or refactored Kernel **must** have a design document in `docs/design/` covering the following: | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a discrepancy in the design document path.
Suggested change
|
||||||
|
|
||||||
| ### 1. Algorithm and Code Logic | ||||||
|
|
||||||
| - Core mathematical formulas (e.g., recurrence relations, attention computations). | ||||||
| - Implementation strategy: why this approach was chosen and trade-offs with alternatives. | ||||||
|
|
||||||
| ### 2. Data Flow and Tiling Logic | ||||||
|
|
||||||
| - Tiling / Blocking strategy: rationale for block sizes, how they map to hardware (TPU VMEM / GPU SMEM). | ||||||
| - Data movement between HBM and VMEM/SMEM (DMA pipeline, double buffering, etc.). | ||||||
| - Grid dimension partitioning: which dimensions are parallel vs. arbitrary, and why. | ||||||
|
|
||||||
| ### 3. Performance Target and Roofline Analysis | ||||||
|
|
||||||
| The performance target for all Kernels is **80% of hardware theoretical peak**: | ||||||
|
|
||||||
| - If the Kernel is **memory bandwidth-bound**: target is 80% of hardware memory bandwidth peak. | ||||||
| - If the Kernel is **compute-bound**: target is 80% of hardware compute peak (FLOPS). | ||||||
|
|
||||||
| The design document must include: | ||||||
|
|
||||||
| - **Roofline analysis**: arithmetic intensity of the Kernel and whether it is bandwidth-bound or compute-bound. | ||||||
| - **Local benchmark results**: measured throughput on target hardware (e.g., TFLOPS, GB/s). | ||||||
| - **Trace viewer analysis**: include screenshots or links from a trace viewer (e.g., Perfetto / TensorBoard Profiler) showing the Kernel's actual execution timeline — compute, memory transfer, and synchronization phases with their time distribution — to verify pipeline effectiveness and identify bubbles or unnecessary stalls. | ||||||
| - **Whether the 80% target is met**. If not, the reason must be clearly stated: | ||||||
| - **Hardware limitation**: unsupported instructions, memory alignment constraints, known hardware bottlenecks, etc. | ||||||
| - **Code logic limitation**: tiling strategy constraints, pipeline bubbles, synchronization overhead, etc., along with future optimization directions. | ||||||
|
|
||||||
| ### 4. Precision and Error Analysis | ||||||
|
|
||||||
| - **Error metrics**: error between Kernel output and `tops/cpu/` high-precision reference (atol, rtol, max ULP). | ||||||
| - **Gradient error** (if applicable): backward pass error metrics. | ||||||
| - **Test methodology**: input distributions used, shape configurations, hardware tested on. | ||||||
| - **Error bound constraint**: TPU error must not exceed GPU error ($\epsilon_{\text{TPU}} \leq \epsilon_{\text{GPU}}$). | ||||||
|
|
||||||
| ## Adding New Kernels | ||||||
|
|
||||||
| 1. Write a design document in `docs/design/` first. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
| 2. Implement CPU reference in `tops/cpu/ops/<domain>/`. | ||||||
| 3. Submit CPU reference for **all-core-developer review** (see [Code Review Process](#reference-implementation-changes)). | ||||||
| 4. Implement GPU/TPU kernel in `tops/ops/<domain>/`. | ||||||
| 5. Add comparison tests in `tests/ops/<domain>/`. | ||||||
| 6. Export public API via `tops/ops/__init__.py`. | ||||||
| 7. Ensure docstrings and assertions meet coding standards. | ||||||
| 8. Submit non-`tops/cpu/` code (steps 4-7), which requires **1 core maintainer approval** to merge. | ||||||
|
|
||||||
| > **Note:** Due to the current small number of core developers, the tiered review policy above is enforced by convention only — it is not yet backed by GitHub branch protection rules or CODEOWNERS. Automated enforcement will be added as the team grows. | ||||||
|
|
||||||
| ## Release Process | ||||||
|
|
||||||
| The project uses trunk-based development with release branches (see [ARCHITECTURE.md Section 4](ARCHITECTURE.md#4-release--versioning-policy) for full details): | ||||||
|
|
||||||
| 1. RC tag created on `main` when ready for delivery (`vX.Y.Z-rc.N`). | ||||||
| 2. Release branch `release/vX.Y` pulled from RC tag. | ||||||
| 3. Integration testing; fixes cherry-picked back to `main`. | ||||||
| 4. Final tag (`vX.Y.Z`) when stable. | ||||||
|
|
||||||
| ## Profiling Guide | ||||||
|
|
||||||
| ### Installation | ||||||
|
|
||||||
| Install profiling dependencies: | ||||||
|
|
||||||
| ```bash | ||||||
| uv sync --extra profile | ||||||
| ``` | ||||||
|
|
||||||
| This installs [xprof](https://github.com/google/xprof) (TensorBoard Profiler) for collecting and analyzing kernel execution traces on TPU/GPU. | ||||||
|
|
||||||
| ### Collecting Traces | ||||||
|
|
||||||
| Use `jax.profiler` to capture traces for a specific code region: | ||||||
|
|
||||||
| ```python | ||||||
| import jax | ||||||
|
|
||||||
| # Start profiler server | ||||||
| jax.profiler.start_server(port=9012) | ||||||
|
|
||||||
| # Or manually mark a trace region | ||||||
| jax.profiler.start_trace("/tmp/my_trace") | ||||||
| # ... run your kernel ... | ||||||
| jax.profiler.stop_trace() | ||||||
| ``` | ||||||
|
|
||||||
| You can also connect to the profiler server remotely via TensorBoard for interactive trace capture. | ||||||
|
|
||||||
| ### Analyzing Traces | ||||||
|
|
||||||
| ```bash | ||||||
| # Launch TensorBoard to view traces | ||||||
| tensorboard --logdir /tmp/my_trace | ||||||
| ``` | ||||||
|
|
||||||
| In the TensorBoard **Profile** tab, focus on: | ||||||
|
|
||||||
| - **Trace Viewer**: kernel execution timeline — check time distribution across compute, memory transfer, and synchronization phases. | ||||||
| - **Pipeline effectiveness**: look for bubbles, unnecessary waits, or synchronization overhead. | ||||||
| - **Op-level timing**: identify hotspot operations. | ||||||
|
|
||||||
| Trace analysis results must be included in the [Kernel Design document](#3-performance-target-and-roofline-analysis). | ||||||
|
|
||||||
| ## Hardware Access | ||||||
|
|
||||||
| The project provides [SkyPilot](https://skypilot.readthedocs.io/)-based cluster launch scripts: | ||||||
|
|
||||||
| ```bash | ||||||
| # Launch GPU cluster | ||||||
| ./scripts/launch_gpu.sh L4 my-cluster | ||||||
|
|
||||||
| # Launch TPU cluster | ||||||
| ./scripts/launch_tpu.sh tpu-v6e-1 my-cluster | ||||||
| ``` | ||||||
|
|
||||||
| See the resource configuration files in `scripts/` (`gpu_resource.sky.yaml`, `tpu_resource.sky.yaml`) for details. You need a configured cloud account and SkyPilot environment before use. | ||||||
|
|
||||||
| ## Questions? | ||||||
|
|
||||||
| If you have questions, open an issue on GitHub or reach out to the maintainers. | ||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -55,6 +55,10 @@ from tops.layers.gla import GatedLinearAttention | |||||
| from tops.modules.layernorm import RMSNorm | ||||||
| ``` | ||||||
|
|
||||||
| ## Contributing | ||||||
|
|
||||||
| We welcome contributions! Please read our [Contributing Guide](CONTRIBUTING.md) ([中文版](CONTRIBUTING.zh.md)) to get started. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The link to
Suggested change
|
||||||
|
|
||||||
| ## License | ||||||
|
|
||||||
| Apache License 2.0 | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
ruffcommands in the checklist should be updated to use thetops/directory instead ofsrc/, as the latter does not exist in the project root according to the architecture guide.