39 changes: 39 additions & 0 deletions .cursor/rules/pr-clean-ci.mdc
@@ -0,0 +1,39 @@
---
description: PR readiness requires attached, green GitHub checks
globs: "**/*"
alwaysApply: true
---

# Clean PR Rule

Do not create, mark ready, or recommend merging an Osaurus PR unless GitHub
checks are attached and green.

Before opening or updating a PR:

- Work from the clean checkout at `/Users/mmeding/Documents/Claude/Projects/osaurus-exec`.
- Keep `/Users/mmeding/Documents/Claude/Projects/osaurus` read-only.
- Run the smallest useful local verification for the files touched.
- Push only after the working tree is clean.

After opening or updating a PR:

- Confirm GitHub Actions attached checks to the PR.
- Wait for the required checks to finish: `test-core`, `test-cli`,
`swiftlint`, `shellcheck`, and `pr-clean-gate`.
- Run `scripts/ci/check-pr-clean.sh osaurus-ai/osaurus <PR number>`.
- Keep the PR draft or blocked if checks are missing, pending, cancelled, or
failing.

If GitHub shows zero checks:

- Do not treat local tests as sufficient for merge.
- Rebase, push, or close/reopen the PR to trigger Actions if you own the branch.
- If the branch belongs to an external fork and this account cannot update it,
leave a PR comment and require the author or a maintainer with branch access
to trigger CI.

If a shared CI failure blocks many PRs:

- Fix the shared CI problem first in a dedicated PR.
- Do not debug unrelated feature code until the shared blocker is green.
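
The rule above treats a check as blocking when it is missing, pending, cancelled, or failing. A minimal sketch of that classification follows. It is not the contents of `scripts/ci/check-pr-clean.sh`, which this diff does not show. The function name `classify_checks` and the `name<TAB>state` input format are assumptions made for illustration, and how those lines are produced (for example from the GitHub CLI) is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; NOT the real scripts/ci/check-pr-clean.sh.
# Assumed input on stdin: one "name<TAB>state" line per attached check,
# where the state "pass" means the check finished green.

REQUIRED_CHECKS="test-core test-cli swiftlint shellcheck pr-clean-gate"

classify_checks() {
  local input name state problems=0
  input="$(cat)"
  for name in $REQUIRED_CHECKS; do
    # Look up the state recorded for this check; empty means it never attached.
    state="$(printf '%s\n' "$input" | awk -F'\t' -v n="$name" '$1 == n { print $2; exit }')"
    case "$state" in
      pass) ;;
      "")   echo "missing: $name"; problems=1 ;;
      *)    echo "$state: $name"; problems=1 ;;
    esac
  done
  if [ "$problems" -eq 0 ]; then
    echo "clean"
  fi
  return "$problems"
}
```

With empty input (zero attached checks) every required name is reported as missing and the function returns non-zero, which matches the rule that local tests alone are not sufficient for merge.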
22 changes: 18 additions & 4 deletions .cursor/rules/swift-build.mdc
@@ -6,12 +6,26 @@ alwaysApply: false

# Building OsaurusCore

The xcode workspace has pre-existing build failures in external dependencies (`mlx-swift-lm`, `IkigaJSON`). Never use `xcodebuild` to verify changes — it will always fail on those deps and waste tokens.
Use focused package tests while iterating, and use CI-parity `xcodebuild` only when you need to reproduce the GitHub Actions `test-core` job.

Instead, compile only the OsaurusCore package sources (no linking) to verify your changes:
Fast local checks from the repository root:

```bash
cd /Users/tpae/dev/osaurus/Packages/OsaurusCore && swift build 2>&1 | grep -E "error:" | grep -v "IkigaJSON"
swift test --package-path Packages/OsaurusCore
swift test --package-path Packages/OsaurusCLI --parallel
swift-format lint --strict --recursive Packages App
```

If the filtered output is empty, your code compiles cleanly.
CI-parity check from the repository root:

```bash
make ci-test
```

If you only need a compile smoke test for core sources, this is acceptable:

```bash
swift build --package-path Packages/OsaurusCore
```

Do not hardcode local absolute paths in docs or scripts. Use repo-root-relative commands unless a tool explicitly requires an absolute path.
12 changes: 10 additions & 2 deletions .github/pull_request_template.md
@@ -14,13 +14,21 @@ Explain the motivation and the changes. Link issues (e.g., Closes #123).

Steps to verify locally (commands, screenshots, recordings). Include model used.

Required before marking ready:

- [ ] Local targeted verification passed for the files touched
- [ ] GitHub checks are attached to this PR
- [ ] `test-core`, `test-cli`, `swiftlint`, `shellcheck`, and `pr-clean-gate` are green
- [ ] I ran `scripts/ci/check-pr-clean.sh osaurus-ai/osaurus <PR number>`

## Screenshots

If UI updated, add before/after.

## Checklist

- [ ] I have read `CONTRIBUTING.md`
- [ ] I have read `docs/CONTRIBUTING.md`
- [ ] I added/updated tests where reasonable
- [ ] I updated docs/README as needed
- [ ] I verified build on macOS with Xcode 16.4+
- [ ] I verified build on macOS with a Swift 6.2-capable Xcode toolchain
- [ ] This PR is draft/blocked if any GitHub check is missing, pending, cancelled, or failing
43 changes: 42 additions & 1 deletion .github/workflows/ci.yml
@@ -26,7 +26,7 @@ permissions:
env:
  # Bump to invalidate every cache entry without source surgery (e.g., after a
  # known-bad cache or an Xcode toolchain upgrade we want to flush manually).
  CACHE_SALT: v2-vmlx-5b84387
  CACHE_SALT: v3-pr-cold-deriveddata
  # Pin Xcode so cache keys are stable across runner image bumps. When you
  # need to upgrade, change here AND in setup-xcode below.
  XCODE_VERSION: "26.4.1"
@@ -381,3 +381,44 @@ jobs:

      - name: Lint shell scripts
        run: find scripts -name '*.sh' -print0 | xargs -0 shellcheck --severity=warning

  pr-clean-gate:
    name: pr-clean-gate
    runs-on: ubuntu-latest
    timeout-minutes: 5
    needs:
      - test-core
      - test-cli
      - swiftlint
      - shellcheck
    if: ${{ always() }}
    steps:
      - name: Require all CI jobs to pass
        env:
          TEST_CORE: ${{ needs.test-core.result }}
          TEST_CLI: ${{ needs.test-cli.result }}
          SWIFTLINT: ${{ needs.swiftlint.result }}
          SHELLCHECK: ${{ needs.shellcheck.result }}
        run: |
          {
            echo "## PR clean gate"
            echo
            echo "| Job | Result |"
            echo "| --- | --- |"
            echo "| test-core | ${TEST_CORE} |"
            echo "| test-cli | ${TEST_CLI} |"
            echo "| swiftlint | ${SWIFTLINT} |"
            echo "| shellcheck | ${SHELLCHECK} |"
          } >> "$GITHUB_STEP_SUMMARY"

          failed=0
          for result in "$TEST_CORE" "$TEST_CLI" "$SWIFTLINT" "$SHELLCHECK"; do
            if [ "$result" != "success" ]; then
              failed=1
            fi
          done

          if [ "$failed" -ne 0 ]; then
            echo "::error title=CI is not clean::Every required CI job must finish with result=success before a PR is ready to merge."
            exit 1
          fi
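
The gate job above runs under `if: ${{ always() }}` and compares each upstream result to the literal string `success`, so `failure`, `cancelled`, and `skipped` all block the PR. A minimal sketch of that rule as a standalone shell function, useful for reasoning about it outside Actions; the function name `gate_is_clean` is an assumption and is not part of the workflow:

```bash
# Sketch of the gate's pass/fail rule (illustrative name, not in ci.yml).
# Each argument is one job result, as exposed by needs.<job>.result.
gate_is_clean() {
  local result
  for result in "$@"; do
    # Anything other than the literal "success" blocks the gate,
    # including "failure", "cancelled", and "skipped".
    if [ "$result" != "success" ]; then
      return 1
    fi
  done
  return 0
}
```

For example, `gate_is_clean success skipped success success` returns non-zero, which is why a skipped upstream job cannot slip through as green.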
4 changes: 2 additions & 2 deletions README.md
@@ -222,7 +222,7 @@ cd osaurus
open osaurus.xcworkspace
```

Build and run the `osaurus` target. Requires Xcode 16+ and macOS 15.5+.
Build and run the `osaurus` target. Requires macOS 15.5+ and a Swift 6.2-capable Xcode toolchain. CI currently pins Xcode 26.4.1.

### Git Hooks (lefthook)

@@ -267,7 +267,7 @@ See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for the architecture guide and layer

Osaurus is actively developed and we welcome contributions: bug fixes, new plugins, documentation, UI/UX improvements, and testing.

Check out [Good First Issues](https://github.com/osaurus-ai/osaurus/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), read the [Contributing Guide](CONTRIBUTING.md), or join [Discord](https://discord.gg/osaurus). See [docs/FEATURES.md](docs/FEATURES.md) for the full feature inventory.
Check out [Good First Issues](https://github.com/osaurus-ai/osaurus/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), read the [Contributing Guide](docs/CONTRIBUTING.md), or join [Discord](https://discord.gg/osaurus). See [docs/FEATURES.md](docs/FEATURES.md) for the full feature inventory and [docs/DEVELOPMENT_PLAN.md](docs/DEVELOPMENT_PLAN.md) for the forward roadmap.

## Community

58 changes: 54 additions & 4 deletions docs/CONTRIBUTING.md
@@ -18,16 +18,16 @@ Requirements:

- macOS 15.5+
- Apple Silicon (M1 or newer)
- Xcode 16.4+
- A Swift 6.2-capable Xcode toolchain. CI currently pins Xcode 26.4.1.

Build and run:

1. Open `osaurus.xcworkspace` in Xcode 16.4+
1. Open `osaurus.xcworkspace` in Xcode with the same Swift toolchain family used by CI
2. Select the `osaurus` target and press Run
3. In the app UI, choose a port (default `1337`), then Start
4. Download a model from the Model Manager to generate text locally

Project layout and API overview are in `README.md`. For a complete feature inventory, see [FEATURES.md](FEATURES.md).
Project layout and API overview are in `README.md`. For a complete feature inventory, see [FEATURES.md](FEATURES.md). For prioritized roadmap work, see [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md).

## Architecture guide

@@ -112,6 +112,28 @@ The core library (`Packages/OsaurusCore/`) follows a layered architecture. Each
- Write clear, focused commits; prefer Conventional Commits where practical
- Open a pull request early for feedback if helpful
- Keep PRs small and focused; describe user-facing changes and test steps
- Use [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) to choose priority when a change spans multiple workstreams

### Clean PR rule

A PR is not ready to merge until GitHub Actions checks are attached and green.
Local verification is required, but it is not a replacement for repository CI.

Before asking for review or merge:

1. Run the smallest useful local verification for the files touched.
2. Push the branch and confirm GitHub Actions attached checks to the PR.
3. Wait for `test-core`, `test-cli`, `swiftlint`, `shellcheck`, and
`pr-clean-gate` to finish.
4. Run:

```bash
scripts/ci/check-pr-clean.sh osaurus-ai/osaurus <PR number>
```

Keep the PR as draft or explicitly blocked if any check is missing, pending,
cancelled, or failing. A PR with zero attached checks is unverified; rebase,
push, or close/reopen it so Actions run before review continues.

### Code style

@@ -139,7 +161,34 @@ gitignored and not used by CI.
### Testing

- Add or update tests in `Packages/OsaurusCore/Tests/` where reasonable
- Ensure the project builds and tests pass in Xcode before submitting
- Run focused tests for the package you changed before submitting
- Use `make ci-test` when you need local parity with the CI `test-core` job
- Keep model, sandbox, network, and other external-infrastructure tests opt-in through environment variables

Recommended local checks:

| Change type | Command |
| ----------- | ------- |
| Formatting | `swift-format lint --strict --recursive Packages App` |
| Core logic | `swift test --package-path Packages/OsaurusCore` |
| CI parity for core tests | `make ci-test` |
| CLI changes | `swift test --package-path Packages/OsaurusCLI --parallel` |
| Plugin repository changes | `swift test --package-path Packages/OsaurusRepository` |
| Behavior/eval tuning | `make evals` or `make evals-report` |
| Shell scripts | `find scripts -name '*.sh' -print0 \| xargs -0 shellcheck --severity=warning` |

`Packages/OsaurusEvals` is intentionally off the normal CI path because it can burn model tokens and depend on local setup. Add eval cases for behavior that depends on model or provider output, but do not make them unconditional CI gates without an explicit maintainer decision.

### Definition of done

A contribution is ready for review when:

- The change follows the layer rules above
- Tests or evals cover the behavior, or the PR explains why coverage is not reasonable
- Docs, fixtures, and examples are updated for public API, tool, storage, plugin, or file format changes
- Security-sensitive changes address redaction, permissions, and user-visible failure modes
- UI changes include screenshots or recordings when visual behavior changes
- The PR test plan lists the exact local commands or manual checks performed

### Commit and PR guidelines

@@ -160,6 +209,7 @@ Good documentation is just as important as good code. Here's how to contribute t
| -------------------------------------------------------------- | ----------------------------------------------------------------- |
| [README.md](../README.md) | Project overview, quick start, feature highlights |
| [FEATURES.md](FEATURES.md) | **Source of truth** — feature inventory and architecture |
| [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) | Prioritized roadmap, workstreams, and definition of done |
| [REMOTE_PROVIDERS.md](REMOTE_PROVIDERS.md) | Remote provider setup and configuration |
| [REMOTE_MCP_PROVIDERS.md](REMOTE_MCP_PROVIDERS.md) | Remote MCP provider setup |
| [DEVELOPER_TOOLS.md](DEVELOPER_TOOLS.md) | Insights and Server Explorer guide |
61 changes: 34 additions & 27 deletions docs/DEVELOPER_TOOLS.md
@@ -255,72 +255,79 @@ The Server Explorer requires the server to be running. If endpoints show as disa

How CI runs the Osaurus test suite, and the hooks that exist to debug it when it goes sideways.

### Jobs

The CI workflow is pinned to the runner and Xcode version declared in [`.github/workflows/ci.yml`](../.github/workflows/ci.yml).

| Job | Purpose | Current Timeout |
| --- | --- | --- |
| `test-core` | `xcodebuild test` for `OsaurusCoreTests` through `osaurus.xcworkspace` | 45 minutes |
| `test-cli` | `swift test --package-path Packages/OsaurusCLI --parallel` | 10 minutes |
| `swiftlint` | SwiftLint over the repo | 10 minutes |
| `shellcheck` | ShellCheck for scripts | 10 minutes |

### Reproduce CI locally

The Makefile target `make ci-test` runs the exact `xcodebuild` flags CI uses, piped through `xcbeautify`, and writes a result bundle:
The Makefile target `make ci-test` runs the same core `xcodebuild` path CI uses, pipes output through `xcbeautify`, and writes a result bundle:

```bash
brew install xcbeautify # one-time
make ci-test
open build/Tests.xcresult # full Xcode Test Navigator UI
```

If a test fails on CI but you can't reproduce it on your machine, download the `test-core-xcresult-*` artifact attached to the failed CI run and open it the same way.
Use narrower package tests while iterating, then use `make ci-test` before a risky PR or when chasing a CI-only failure.

### Long-running and integration tests

Tests that require external infrastructure (Apple Containerization, real GPU, network, etc.) must:
Tests that require external infrastructure (Apple Containerization, real GPU, network, model downloads, provider credentials, etc.) must:

1. **Be opt-in via an environment variable** never run unconditionally in CI.
2. **Use Swift Testing's `.disabled(if:)` trait** at the suite level so they're reported as `Disabled` (not silently passing). Pattern:
1. **Be opt-in via an environment variable** - never run unconditionally in CI.
2. **Use Swift Testing's `.disabled(if:)` trait** at the suite level so they are reported as `Disabled` rather than silently passing. Pattern:

```swift
private let isEnabled =
    ProcessInfo.processInfo.environment["OSAURUS_RUN_FOO_TESTS"] == "1"

@Suite(.disabled(if: !isEnabled, "Set OSAURUS_RUN_FOO_TESTS=1 to run"))
struct FooIntegrationTests { }
struct FooIntegrationTests { ... }
```

3. **Keep individual test bodies under ~250ms of `Task.sleep`** and prefer event-driven waits (continuations, `AsyncStream`) for everything else.
3. **Keep individual test bodies under ~250ms of `Task.sleep`** and prefer event-driven waits such as continuations or `AsyncStream`.

Currently env-gated:

| Env var | Suite | Notes |
| ---------------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------ |
| `OSAURUS_RUN_SANDBOX_INTEGRATION_TESTS=1` | [`SandboxIntegrationTests`](../Packages/OsaurusCore/Tests/Sandbox/SandboxIntegrationTests.swift) | Boots a Linux VM; runs `pip`/`npm`/`go` workloads. |
| Env var | Suite | Notes |
| --- | --- | --- |
| `OSAURUS_RUN_SANDBOX_INTEGRATION_TESTS=1` | [`SandboxIntegrationTests`](../Packages/OsaurusCore/Tests/Sandbox/SandboxIntegrationTests.swift) | Boots a Linux VM and runs package-manager workloads. |
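
To run one of the env-gated suites locally, the variable has to be set to exactly `1` for the test process. A minimal sketch follows, assuming the suite can be selected by name with `swift test --filter` against the `Packages/OsaurusCore` package. The helper `run_gated_suite` and its `DRY_RUN` switch are hypothetical and exist only for illustration.

```bash
# Hypothetical helper: run a single env-gated suite with its gate enabled.
# Assumes the suite is reachable via `swift test --filter <SuiteName>`.
run_gated_suite() {
  local env_var="$1" suite="$2"
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$env_var=1 swift test --package-path Packages/OsaurusCore --filter $suite"
    return 0
  fi
  env "$env_var=1" swift test --package-path Packages/OsaurusCore --filter "$suite"
}
```

Usage might look like `run_gated_suite OSAURUS_RUN_SANDBOX_INTEGRATION_TESTS SandboxIntegrationTests`. Without the variable set, the suite should be reported as `Disabled` rather than run.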

### CI cache controls

The `test-core` job caches `~/Library/Developer/Xcode/DerivedData` keyed on Swift sources, manifests, resources, the pinned Xcode version, and a manual `CACHE_SALT`. Two recovery levers when you suspect a bad cache:
The `test-core` job caches SPM packages and `~/Library/Developer/Xcode/DerivedData`. DerivedData is keyed on Swift sources, manifests, resources, C headers/sources, the pinned Xcode version, and `CACHE_SALT`.

Two recovery levers exist when you suspect a bad cache:

1. **One-shot cold build**: trigger CI manually via the **Run workflow** button on the [CI workflow](../.github/workflows/ci.yml) page and check `clear_cache`. Skips the restore for that one run.
2. **Permanent bust**: bump `CACHE_SALT` (currently `v1`) at the top of `.github/workflows/ci.yml` to `v2` and merge. Every cache key invalidates immediately.
1. **One-shot cold build**: trigger CI manually via the **Run workflow** button and check `clear_cache`. CI still restores the cache first so the save key is available, then wipes restored DerivedData before building. The SPM source cache is preserved.
2. **Permanent bust**: bump `CACHE_SALT` at the top of `.github/workflows/ci.yml` and merge. Every DerivedData and SPM cache key invalidates immediately.

The cache only **saves** on `main` pushes — PRs read from it but never overwrite, so a half-baked branch can't poison everyone.
DerivedData cache saves only on successful `main` runs. PRs can read caches but cannot overwrite them.
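
Why bumping `CACHE_SALT` invalidates everything is easiest to see by composing a key. The sketch below uses a hypothetical key layout. The real key expression lives in `.github/workflows/ci.yml` and is not shown in this diff, so the prefix, field order, and the function name `derived_data_cache_key` are assumptions.

```bash
# Hypothetical key layout for illustration only.
# Inputs: the manual salt, the pinned Xcode version, and a content hash
# of the files the cache depends on (computed elsewhere).
derived_data_cache_key() {
  local salt="$1" xcode_version="$2" sources_hash="$3"
  printf 'deriveddata-%s-xcode%s-%s\n' "$salt" "$xcode_version" "$sources_hash"
}
```

Because the salt is part of every key, changing it from `v2-vmlx-5b84387` to `v3-pr-cold-deriveddata` yields a key that no existing cache entry matches, even when sources and Xcode version are unchanged.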

### Where the logs live

The full xcodebuild output is collapsed into expandable groups by `xcbeautify`. On a failure CI also publishes:
The full `xcodebuild` output is grouped by `xcbeautify`. On failure or cancellation CI also publishes:

- A short failure summary (failed tests + assertion messages) at the top of the GitHub Actions run page.
- The raw `Tests.xcresult` bundle as a downloadable artifact (`test-core-xcresult-N`, 7 days retention).
- A GitHub step summary that distinguishes build failure, launch hang, zero-test-result hang, and ordinary failed test cases.
- The raw `Tests.xcresult` bundle as a downloadable artifact named `test-core-xcresult-N`, retained for 7 days.

A passing run produces ~1–2k log lines instead of the historical ~30k, and individual tests that hang are killed in ~2 min by `-test-timeouts-enabled YES` (default 60s, max 120s per test). The whole `test-core` job is also capped at 15 minutes via `timeout-minutes`.
Per-test timeouts are enabled with a 60-second default allowance and 120-second maximum allowance. This surfaces hung test names before the job wall-timeout whenever the test bundle launches far enough to report them.
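
The timeout behaviour described above corresponds to three `xcodebuild` test options. A minimal sketch that emits them with the stated 60-second default and 120-second maximum; the helper name `test_timeout_flags` is an assumption, and the exact invocation CI uses is defined in the workflow and the Makefile rather than here.

```bash
# Hypothetical helper: print the xcodebuild per-test timeout options,
# one token per line, defaulting to the allowances stated above.
test_timeout_flags() {
  local default_secs="${1:-60}" max_secs="${2:-120}"
  printf '%s\n' \
    "-test-timeouts-enabled" "YES" \
    "-default-test-execution-time-allowance" "$default_secs" \
    "-maximum-test-execution-time-allowance" "$max_secs"
}
```

The output can be spliced into an `xcodebuild test` command line, which keeps the two allowances in one place if they ever need to change.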

### Deferred follow-up

Test wall-time is now bounded by the build-from-scratch cost of the full `OsaurusCore` package. The biggest remaining lever is splitting `OsaurusCore` into focused SPM targets (`OsaurusFoundation`, `OsaurusInference`, `OsaurusVoice`, `OsaurusUpdater`, `OsaurusSandbox`, `OsaurusUI`) so a Foundation-only PR doesn't rebuild MLX / FluidAudio / Sparkle / VecturaKit. File-coupling counts that justify the split:
Test wall-time is bounded by the build-from-scratch cost of the full `OsaurusCore` package. The biggest remaining lever is splitting `OsaurusCore` into focused targets so a foundation-only PR does not rebuild MLX, FluidAudio, Sparkle, VecturaKit, Containerization, SQLCipher, and SwiftUI-adjacent code.

- MLX/MLXLLM/MLXVLM/MLXLMCommon/Tokenizers: ~10 files, all in `Services/ModelRuntime*`, `Managers/Model/ModelManager.swift`, `Models/Configuration/VLMDetection.swift`, `Utils/StreamingDeltaProcessor.swift`, `Views/Chat/ChatView.swift`.
- `FluidAudio`: 2 files (`Managers/SpeechService.swift`, `Managers/Model/SpeechModelManager.swift`).
- `Sparkle`: 1 file (`Services/UpdaterService.swift`).
- `AAInfographics`: 1 file (`Views/Chat/NativeChartView.swift`).
- `VecturaKit`: 7 files in `Services/{Memory,Method,Skill,Tool}/*`.
- `Containerization`: 1 file (`Services/Sandbox/SandboxManager.swift`).
- `P256K`, `Highlightr`, `SwiftMath`: 1 file each.
The first split should isolate pure models, schemas, utility code, and low-dependency tests. One known boundary leak to clean before that split: `Models/Configuration/VLMDetection.swift` imports `MLXVLM` from the otherwise pure `Models/` tree.

Yet **64 of 70 test files use `@testable import OsaurusCore`**, so even tiny tests rebuild the heavy graph today. The one boundary leak that needs cleaning before the split: `Models/Configuration/VLMDetection.swift` imports `MLXVLM` from the otherwise-pure `Models/` tree.
See [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) for the prioritized architecture workstream.

---
