Add Apple Metal compute documentation suite #180

Open
qhy991 wants to merge 4 commits into andrewyng:main from qhy991:codex/apple-metal-docs-2026-03-21

qhy991 commented Mar 21, 2026

Summary

Add a substantial Apple Metal compute documentation suite for Context Hub, with a cpp-first focus for macOS kernel and host-wrapper development.

Changes

  • Add 48 new Apple Metal documentation pages under content/apple/docs/...
  • Cover core compute topics:
    • kernel basics
    • compute launch patterns
    • memory and threadgroup usage
    • simdgroup patterns
    • buffer layout and alignment
    • resource binding and host wrapper patterns
  • Cover optimization and pipeline topics:
    • tiled matmul
    • reduction
    • image and 2D kernels
    • command buffer reuse and batching
    • library and pipeline compilation
    • argument buffers and residency
    • heaps, fences, and events
    • producer-consumer staging
    • double-buffered pipelines
    • kernel fusion tradeoffs
  • Cover irregular and advanced workload topics:
    • prefix scan
    • transpose and layout reorder
    • histogram and binning
    • gather/scatter and conflict resolution
    • segmented reduction
    • ragged and masked kernels
    • streaming and online kernels
    • multistage tensor pipelines
  • Cover debugging and numerical topics:
    • validation and profiling workflow
    • numerical drift debugging
    • silent NaN/Inf debugging
    • softmax/logsumexp stability
    • host-device synchronization
    • memory pressure checklist
    • prefetch/reuse heuristics
    • transpose-free layout choices
  • Add one Python boundary doc for PyTorch MPS vs custom Metal:
    • content/apple/docs/pytorch-mps-vs-custom-metal/python/DOC.md
  • Expand search regression coverage beyond the existing CUDA/PTX-focused cases to 64 total cases, including Apple Metal queries
  • Refresh scripts/search_regression_baseline.json

What

This adds a practical, agent-friendly Apple Metal documentation track so coding agents can retrieve authoritative guidance for macOS compute kernel development without immediately falling back to generic web search.

The new docs are written to support:

  • Metal kernel authoring
  • Objective-C++ / C++ host-side launch patterns
  • pipeline and staging design
  • synchronization and resource lifetime reasoning
  • numerical debugging and profiling workflows

Why

Context Hub already had strong CUDA/PTX coverage, but there was no comparable local documentation path for Metal.

That gap matters because Metal compute development has its own semantics and failure modes:

  • threadgroup and simdgroup behavior differ from CUDA warp/block assumptions
  • host-device coordination and resource residency are frequent sources of bugs
  • many failures are wrapper- or pipeline-level rather than kernel-math-level
  • agents can easily confuse Metal, MPS, and PyTorch mps unless local docs make those boundaries explicit

This PR improves local retrieval quality for Apple Metal topics and gives agents a cpp-first reference path for macOS GPU programming.

Testing

Content quality:

  • English-only content under content/apple/docs
  • Proper YAML frontmatter on all new docs
  • Apple-official source links included in each page
  • Structured for retrieval and practical implementation guidance rather than raw API dumps
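
The first two checks above can be approximated locally with a short script. The following is a minimal sketch and not part of this PR: it assumes frontmatter is a block delimited by `---` lines at the very top of each `DOC.md`, and it approximates the `\p{Han}` class with the CJK Unified Ideographs range only. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: frontmatter is a non-empty block between two '---' lines
# at the very start of the file.
FRONTMATTER_RE = re.compile(r"\A---\r?\n.*?\r?\n---\r?\n", re.DOTALL)

# Approximation: Python's re module has no \p{Han}, so this covers only
# the CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_RE = re.compile(r"[\u4e00-\u9fff]")


def check_doc_text(text: str) -> list[str]:
    """Return a list of problems found in one document's text."""
    problems = []
    if not FRONTMATTER_RE.match(text):
        problems.append("missing frontmatter")
    if HAN_RE.search(text):
        problems.append("contains Han characters")
    return problems


def check_tree(root: str) -> dict[str, list[str]]:
    """Return {path: problems} for every DOC.md under root that has problems."""
    report = {}
    for path in Path(root).rglob("DOC.md"):
        problems = check_doc_text(path.read_text(encoding="utf-8"))
        if problems:
            report[str(path)] = problems
    return report
```

Calling `check_tree("content/apple/docs")` from the repository root would return an empty dict when every page passes both checks. It does not validate the YAML itself; that remains the job of the build's validation step.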

Build and regression:

  • ./cli/bin/chub build content --validate-only — passed (1670 docs, 6 skills, 0 warnings)
  • ./cli/bin/chub build content — passed
  • python3 scripts/search_regression.py --mode check — passed (64/64)
  • python3 scripts/search_regression.py --mode snapshot --snapshot-out scripts/search_regression_baseline.json — passed
  • rg -n "[\p{Han}]" content/apple/docs | wc -l returned 0

Sources

Apple official documentation used for extraction:

PyTorch official documentation used for the boundary doc:

Notes

Primary sources used for extraction were Apple Metal official documentation and tools pages, with PyTorch docs used only for the MPS-vs-custom-Metal boundary doc.
