From 609b98d8d0626eae5bf9e95c9b279da6e6c695b3 Mon Sep 17 00:00:00 2001 From: Michael Meding Date: Thu, 30 Apr 2026 15:00:43 -0300 Subject: [PATCH 1/5] Document development plan and contributor workflow --- .cursor/rules/swift-build.mdc | 22 ++- .github/pull_request_template.md | 4 +- README.md | 4 +- docs/CONTRIBUTING.md | 37 ++++- docs/DEVELOPER_TOOLS.md | 61 ++++---- docs/DEVELOPMENT_PLAN.md | 229 +++++++++++++++++++++++++++++++ docs/FEATURES.md | 1 + 7 files changed, 319 insertions(+), 39 deletions(-) create mode 100644 docs/DEVELOPMENT_PLAN.md diff --git a/.cursor/rules/swift-build.mdc b/.cursor/rules/swift-build.mdc index a7e196b52..85ece18dc 100644 --- a/.cursor/rules/swift-build.mdc +++ b/.cursor/rules/swift-build.mdc @@ -6,12 +6,26 @@ alwaysApply: false # Building OsaurusCore -The xcode workspace has pre-existing build failures in external dependencies (`mlx-swift-lm`, `IkigaJSON`). Never use `xcodebuild` to verify changes — it will always fail on those deps and waste tokens. +Use focused package tests while iterating, and use CI-parity `xcodebuild` only when you need to reproduce the GitHub Actions `test-core` job. -Instead, compile only the OsaurusCore package sources (no linking) to verify your changes: +Fast local checks from the repository root: ```bash -cd /Users/tpae/dev/osaurus/Packages/OsaurusCore && swift build 2>&1 | grep -E "error:" | grep -v "IkigaJSON" +swift test --package-path Packages/OsaurusCore +swift test --package-path Packages/OsaurusCLI --parallel +swift-format lint --strict --recursive Packages App ``` -If the filtered output is empty, your code compiles cleanly. +CI-parity check from the repository root: + +```bash +make ci-test +``` + +If you only need a compile smoke test for core sources, this is acceptable: + +```bash +swift build --package-path Packages/OsaurusCore +``` + +Do not hardcode local absolute paths in docs or scripts. Use repo-root-relative commands unless a tool explicitly requires an absolute path. diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 159f9094e..85c4a5fb7 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -20,7 +20,7 @@ If UI updated, add before/after. ## Checklist -- [ ] I have read `CONTRIBUTING.md` +- [ ] I have read `docs/CONTRIBUTING.md` - [ ] I added/updated tests where reasonable - [ ] I updated docs/README as needed -- [ ] I verified build on macOS with Xcode 16.4+ +- [ ] I verified build on macOS with a Swift 6.2-capable Xcode toolchain diff --git a/README.md b/README.md index 6a19fb33d..a79814b51 100644 --- a/README.md +++ b/README.md @@ -222,7 +222,7 @@ cd osaurus open osaurus.xcworkspace ``` -Build and run the `osaurus` target. Requires Xcode 16+ and macOS 15.5+. +Build and run the `osaurus` target. Requires macOS 15.5+ and a Swift 6.2-capable Xcode toolchain. CI currently pins Xcode 26.4.1. ### Git Hooks (lefthook) @@ -267,7 +267,7 @@ See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for the architecture guide and layer Osaurus is actively developed and we welcome contributions: bug fixes, new plugins, documentation, UI/UX improvements, and testing. -Check out [Good First Issues](https://github.com/osaurus-ai/osaurus/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), read the [Contributing Guide](CONTRIBUTING.md), or join [Discord](https://discord.gg/osaurus). See [docs/FEATURES.md](docs/FEATURES.md) for the full feature inventory. +Check out [Good First Issues](https://github.com/osaurus-ai/osaurus/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), read the [Contributing Guide](docs/CONTRIBUTING.md), or join [Discord](https://discord.gg/osaurus). See [docs/FEATURES.md](docs/FEATURES.md) for the full feature inventory and [docs/DEVELOPMENT_PLAN.md](docs/DEVELOPMENT_PLAN.md) for the forward roadmap. ## Community diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 6ce7ff42d..70255e839 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -18,16 +18,16 @@ Requirements: - macOS 15.5+ - Apple Silicon (M1 or newer) -- Xcode 16.4+ +- A Swift 6.2-capable Xcode toolchain. CI currently pins Xcode 26.4.1. Build and run: -1. Open `osaurus.xcworkspace` in Xcode 16.4+ +1. Open `osaurus.xcworkspace` in Xcode with the same Swift toolchain family used by CI 2. Select the `osaurus` target and press Run 3. In the app UI, choose a port (default `1337`), then Start 4. Download a model from the Model Manager to generate text locally -Project layout and API overview are in `README.md`. For a complete feature inventory, see [FEATURES.md](FEATURES.md). +Project layout and API overview are in `README.md`. For a complete feature inventory, see [FEATURES.md](FEATURES.md). For prioritized roadmap work, see [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md). ## Architecture guide @@ -112,6 +112,7 @@ The core library (`Packages/OsaurusCore/`) follows a layered architecture. Each - Write clear, focused commits; prefer Conventional Commits where practical - Open a pull request early for feedback if helpful - Keep PRs small and focused; describe user-facing changes and test steps +- Use [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) to choose priority when a change spans multiple workstreams ### Code style @@ -139,7 +140,34 @@ gitignored and not used by CI. ### Testing - Add or update tests in `Packages/OsaurusCore/Tests/` where reasonable -- Ensure the project builds and tests pass in Xcode before submitting +- Run focused tests for the package you changed before submitting +- Use `make ci-test` when you need local parity with the CI `test-core` job +- Keep model, sandbox, network, and other external-infrastructure tests opt-in through environment variables + +Recommended local checks: + +| Change type | Command | +| ----------- | ------- | +| Formatting | `swift-format lint --strict --recursive Packages App` | +| Core logic | `swift test --package-path Packages/OsaurusCore` | +| CI parity for core tests | `make ci-test` | +| CLI changes | `swift test --package-path Packages/OsaurusCLI --parallel` | +| Plugin repository changes | `swift test --package-path Packages/OsaurusRepository` | +| Behavior/eval tuning | `make evals` or `make evals-report` | +| Shell scripts | `find scripts -name '*.sh' -print0 \| xargs -0 shellcheck --severity=warning` | + +`Packages/OsaurusEvals` is intentionally off the normal CI path because it can burn model tokens and depend on local setup. Add eval cases for behavior that depends on model or provider output, but do not make them unconditional CI gates without an explicit maintainer decision. + +### Definition of done + +A contribution is ready for review when: + +- The change follows the layer rules above +- Tests or evals cover the behavior, or the PR explains why coverage is not reasonable +- Docs, fixtures, and examples are updated for public API, tool, storage, plugin, or file format changes +- Security-sensitive changes include redaction, permission, and user-visible failure-mode thinking +- UI changes include screenshots or recordings when visual behavior changes +- The PR test plan lists the exact local commands or manual checks performed ### Commit and PR guidelines @@ -160,6 +188,7 @@ Good documentation is just as important as good code. Here's how to contribute t | -------------------------------------------------------------- | ----------------------------------------------------------------- | | [README.md](../README.md) | Project overview, quick start, feature highlights | | [FEATURES.md](FEATURES.md) | **Source of truth** — feature inventory and architecture | +| [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) | Prioritized roadmap, workstreams, and definition of done | | [REMOTE_PROVIDERS.md](REMOTE_PROVIDERS.md) | Remote provider setup and configuration | | [REMOTE_MCP_PROVIDERS.md](REMOTE_MCP_PROVIDERS.md) | Remote MCP provider setup | | [DEVELOPER_TOOLS.md](DEVELOPER_TOOLS.md) | Insights and Server Explorer guide | diff --git a/docs/DEVELOPER_TOOLS.md b/docs/DEVELOPER_TOOLS.md index 471855869..4726d247b 100644 --- a/docs/DEVELOPER_TOOLS.md +++ b/docs/DEVELOPER_TOOLS.md @@ -255,9 +255,20 @@ The Server Explorer requires the server to be running. If endpoints show as disa How CI runs the Osaurus test suite, and the hooks that exist to debug it when it goes sideways. +### Jobs + +The CI workflow is pinned to the runner and Xcode version declared in [`.github/workflows/ci.yml`](../.github/workflows/ci.yml). + +| Job | Purpose | Current Timeout | +| --- | --- | --- | +| `test-core` | `xcodebuild test` for `OsaurusCoreTests` through `osaurus.xcworkspace` | 45 minutes | +| `test-cli` | `swift test --package-path Packages/OsaurusCLI --parallel` | 10 minutes | +| `swiftlint` | SwiftLint over the repo | 10 minutes | +| `shellcheck` | ShellCheck for scripts | 10 minutes | + ### Reproduce CI locally -The Makefile target `make ci-test` runs the exact `xcodebuild` flags CI uses, piped through `xcbeautify`, and writes a result bundle: +The Makefile target `make ci-test` runs the same core `xcodebuild` path CI uses, pipes output through `xcbeautify`, and writes a result bundle: ```bash brew install xcbeautify # one-time @@ -265,62 +276,58 @@ make ci-test open build/Tests.xcresult # full Xcode Test Navigator UI ``` -If a test fails on CI but you can't reproduce it on your machine, download the `test-core-xcresult-*` artifact attached to the failed CI run and open it the same way. +Use narrower package tests while iterating, then use `make ci-test` before a risky PR or when chasing a CI-only failure. ### Long-running and integration tests -Tests that require external infrastructure (Apple Containerization, real GPU, network, etc.) must: +Tests that require external infrastructure (Apple Containerization, real GPU, network, model downloads, provider credentials, etc.) must: -1. **Be opt-in via an environment variable** — never run unconditionally in CI. -2. **Use Swift Testing's `.disabled(if:)` trait** at the suite level so they're reported as `Disabled` (not silently passing). Pattern: +1. **Be opt-in via an environment variable** - never run unconditionally in CI. +2. **Use Swift Testing's `.disabled(if:)` trait** at the suite level so they are reported as `Disabled` rather than silently passing. Pattern: ```swift private let isEnabled = ProcessInfo.processInfo.environment["OSAURUS_RUN_FOO_TESTS"] == "1" @Suite(.disabled(if: !isEnabled, "Set OSAURUS_RUN_FOO_TESTS=1 to run")) - struct FooIntegrationTests { … } + struct FooIntegrationTests { ... } ``` -3. **Keep individual test bodies under ~250ms of `Task.sleep`** and prefer event-driven waits (continuations, `AsyncStream`) for everything else. +3. **Keep individual test bodies under ~250ms of `Task.sleep`** and prefer event-driven waits such as continuations or `AsyncStream`. Currently env-gated: -| Env var | Suite | Notes | -| ---------------------------------------- | ---------------------------------------------------------------------------------------- | ------------------------------------------------ | -| `OSAURUS_RUN_SANDBOX_INTEGRATION_TESTS=1` | [`SandboxIntegrationTests`](../Packages/OsaurusCore/Tests/Sandbox/SandboxIntegrationTests.swift) | Boots a Linux VM; runs `pip`/`npm`/`go` workloads. | +| Env var | Suite | Notes | +| --- | --- | --- | +| `OSAURUS_RUN_SANDBOX_INTEGRATION_TESTS=1` | [`SandboxIntegrationTests`](../Packages/OsaurusCore/Tests/Sandbox/SandboxIntegrationTests.swift) | Boots a Linux VM and runs package-manager workloads. | ### CI cache controls -The `test-core` job caches `~/Library/Developer/Xcode/DerivedData` keyed on Swift sources, manifests, resources, the pinned Xcode version, and a manual `CACHE_SALT`. Two recovery levers when you suspect a bad cache: +The `test-core` job caches SPM packages and `~/Library/Developer/Xcode/DerivedData`. DerivedData is keyed on Swift sources, manifests, resources, C headers/sources, the pinned Xcode version, and `CACHE_SALT`. + +Two recovery levers exist when you suspect a bad cache: -1. **One-shot cold build**: trigger CI manually via the **Run workflow** button on the [CI workflow](../.github/workflows/ci.yml) page and check `clear_cache`. Skips the restore for that one run. -2. **Permanent bust**: bump `CACHE_SALT` (currently `v1`) at the top of `.github/workflows/ci.yml` to `v2` and merge. Every cache key invalidates immediately. +1. **One-shot cold build**: trigger CI manually via the **Run workflow** button and check `clear_cache`. CI still restores the cache first so the save key is available, then wipes restored DerivedData before building. The SPM source cache is preserved. +2. **Permanent bust**: bump `CACHE_SALT` at the top of `.github/workflows/ci.yml` and merge. Every DerivedData and SPM cache key invalidates immediately. -The cache only **saves** on `main` pushes — PRs read from it but never overwrite, so a half-baked branch can't poison everyone. +DerivedData cache saves only on successful `main` runs. PRs can read caches but cannot overwrite them. ### Where the logs live -The full xcodebuild output is collapsed into expandable groups by `xcbeautify`. On a failure CI also publishes: +The full `xcodebuild` output is grouped by `xcbeautify`. On failure or cancellation CI also publishes: -- A short failure summary (failed tests + assertion messages) at the top of the GitHub Actions run page. -- The raw `Tests.xcresult` bundle as a downloadable artifact (`test-core-xcresult-N`, 7 days retention). +- A GitHub step summary that distinguishes build failure, launch hang, zero-test-result hang, and ordinary failed test cases. +- The raw `Tests.xcresult` bundle as a downloadable artifact named `test-core-xcresult-N`, retained for 7 days. -A passing run produces ~1–2k log lines instead of the historical ~30k, and individual tests that hang are killed in ~2 min by `-test-timeouts-enabled YES` (default 60s, max 120s per test). The whole `test-core` job is also capped at 15 minutes via `timeout-minutes`. +Per-test timeouts are enabled with a 60-second default allowance and 120-second maximum allowance. This surfaces hung test names before the job wall-timeout whenever the test bundle launches far enough to report them. ### Deferred follow-up -Test wall-time is now bounded by the build-from-scratch cost of the full `OsaurusCore` package. The biggest remaining lever is splitting `OsaurusCore` into focused SPM targets (`OsaurusFoundation`, `OsaurusInference`, `OsaurusVoice`, `OsaurusUpdater`, `OsaurusSandbox`, `OsaurusUI`) so a Foundation-only PR doesn't rebuild MLX / FluidAudio / Sparkle / VecturaKit. File-coupling counts that justify the split: +Test wall-time is bounded by the build-from-scratch cost of the full `OsaurusCore` package. The biggest remaining lever is splitting `OsaurusCore` into focused targets so a foundation-only PR does not rebuild MLX, FluidAudio, Sparkle, VecturaKit, Containerization, SQLCipher, and SwiftUI-adjacent code. -- MLX/MLXLLM/MLXVLM/MLXLMCommon/Tokenizers: ~10 files, all in `Services/ModelRuntime*`, `Managers/Model/ModelManager.swift`, `Models/Configuration/VLMDetection.swift`, `Utils/StreamingDeltaProcessor.swift`, `Views/Chat/ChatView.swift`. -- `FluidAudio`: 2 files (`Managers/SpeechService.swift`, `Managers/Model/SpeechModelManager.swift`). -- `Sparkle`: 1 file (`Services/UpdaterService.swift`). -- `AAInfographics`: 1 file (`Views/Chat/NativeChartView.swift`). -- `VecturaKit`: 7 files in `Services/{Memory,Method,Skill,Tool}/*`. -- `Containerization`: 1 file (`Services/Sandbox/SandboxManager.swift`). -- `P256K`, `Highlightr`, `SwiftMath`: 1 file each. +The first split should isolate pure models, schemas, utility code, and low-dependency tests. One known boundary leak to clean before that split: `Models/Configuration/VLMDetection.swift` imports `MLXVLM` from the otherwise pure `Models/` tree. -Yet **64 of 70 test files use `@testable import OsaurusCore`**, so even tiny tests rebuild the heavy graph today. The one boundary leak that needs cleaning before the split: `Models/Configuration/VLMDetection.swift` imports `MLXVLM` from the otherwise-pure `Models/` tree. +See [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) for the prioritized architecture workstream. --- diff --git a/docs/DEVELOPMENT_PLAN.md b/docs/DEVELOPMENT_PLAN.md new file mode 100644 index 000000000..9ce290644 --- /dev/null +++ b/docs/DEVELOPMENT_PLAN.md @@ -0,0 +1,229 @@ +# Osaurus Development Plan + +Updated: 2026-04-30 + +This plan turns the current repository state, public documentation, private planning notes, CI workflow, and contribution guidelines into a prioritized development roadmap. It is intentionally practical: work is grouped by risk, sequence, and the tests or documentation needed before it can be called done. + +## North Star + +Osaurus should be the local-first AI harness for macOS: agents, memory, tools, identity, voice, automation, and model access that remain useful across local and cloud providers while keeping user data under user control. + +Near-term development should favor reliability, compatibility, contributor speed, and trustworthy extension points before expanding the feature surface. + +## Current Assessment + +The repo is already feature-rich: + +- Core product: agents, memory, chat sessions, local MLX inference, remote providers, OpenAI/Anthropic/Ollama/Open Responses-compatible endpoints, MCP server/client support, schedules, watchers, voice input, storage encryption, sandbox execution, skills, methods, and plugins. +- Architecture: `OsaurusCore` follows a clear Models / Services / Managers / Views / Networking / Storage / Tools / Identity split, with a large SwiftUI surface and heavy runtime dependencies. +- Test posture: `OsaurusCore` has broad unit and integration coverage, CLI tests run separately, behavior evals live in `Packages/OsaurusEvals`, and CI gates core tests, CLI tests, SwiftLint, and shell script linting. +- Main development pressure: `OsaurusCore` is large and dependency-heavy, so small changes can pay the cost of MLX, FluidAudio, SQLCipher, VecturaKit, Sparkle, Containerization, and UI dependencies. +- Product pressure: the public docs present many features as stable, so the next releases need stronger compatibility suites, fewer edge-case regressions, and clearer completion criteria. +- Private planning pressure: high-fidelity document I/O is valuable, but it should follow shared foundations, fixture-based verification, and render checks rather than landing as a broad one-shot feature. + +## Priority Framework + +Use this order when choosing what to do next: + +| Priority | Meaning | Default Action | +| --- | --- | --- | +| P0 | Blocks safe release or contributor trust | Fix before feature expansion | +| P1 | Improves reliability, compatibility, or development speed | Schedule in the next 1-2 milestones | +| P2 | Expands core product value on proven foundations | Start after P0/P1 risk is bounded | +| P3 | Ecosystem, polish, and growth work | Keep moving, but do not preempt P0/P1 | + +## Phase 0: Documentation And Contributor Contract + +Target: immediate + +Goal: make the repo's written contract match how the repo actually builds, tests, and accepts changes. + +Deliverables: + +- Keep `docs/CONTRIBUTING.md`, `docs/DEVELOPER_TOOLS.md`, the PR template, and private development notes aligned with CI. +- Make `docs/DEVELOPMENT_PLAN.md` the public roadmap and link it from the documentation index. +- Keep private feature plans scoped to implementation details, not competing project direction. +- Add a consistent Definition of Done for code, docs, tests, security, and compatibility changes. +- Maintain a concise local verification matrix for core, CLI, evals, formatting, and env-gated integration suites. + +Acceptance criteria: + +- A new contributor can identify the right build/test command without reading CI YAML first. +- Docs do not reference stale cache salts, stale timeouts, wrong paths, or missing root files. +- PR template checklist matches `docs/CONTRIBUTING.md`. + +## Phase 1: Release Hardening And Compatibility + +Target: weeks 1-4 + +Goal: protect the existing surface area before expanding it. + +P0/P1 work: + +| ID | Priority | Work | Deliverables | Acceptance Criteria | +| --- | --- | --- | --- | --- | +| R1 | P0 | API compatibility guardrail | Scripted streaming/non-streaming checks for OpenAI Chat Completions, Open Responses, Anthropic Messages, Ollama chat, tool calls, and error envelopes | Results are reproducible locally and artifacts land under `results/` or `build/compat/` | +| R2 | P0 | Remote provider request parity | Golden request encoding tests for OpenAI-compatible, Anthropic, Open Responses, Ollama, and custom providers | Provider changes require fixture updates and test approval | +| R3 | P0 | Local runtime cancellation and cache safety | Tests around model lease lifetime, cancelled streams, disk cache restore, reasoning sentinel handling, and local/remote model switches | No known crash class can regress without a focused test failing | +| R4 | P0 | Storage and recovery clarity | Verify encrypted DB migration, plaintext backup, key rotation, vector-index rebuild, and mismatch UX | Storage docs and tests cover recovery and failure cases | +| R5 | P1 | CI stability dashboard | Document recurring CI failure modes and keep artifact summaries actionable | Failed CI runs identify build failure, launch hang, test hang, or assertion failure quickly | +| R6 | P1 | Accessibility enforcement | Add theme contrast warnings and at least one high-contrast preset path | Theme editor surfaces contrast risk before export | + +Recommended sequence: + +1. Stabilize request/response compatibility first, because API behavior is the integration contract. +2. Harden local runtime and storage next, because crashes or unrecoverable data loss are higher risk than UI polish. +3. Add accessibility guardrails before broad theme or onboarding iteration. + +## Phase 2: Developer Velocity And Architecture Split + +Target: weeks 4-8 + +Goal: reduce build/test drag and make ownership boundaries easier to preserve. + +P1 work: + +| ID | Priority | Work | Deliverables | Acceptance Criteria | +| --- | --- | --- | --- | --- | +| A1 | P1 | Split pure foundations | Extract low-dependency models, utilities, schemas, and protocol types into a lightweight package/target | Foundation-only tests do not import MLX, FluidAudio, Sparkle, Containerization, or SwiftUI | +| A2 | P1 | Fix boundary leaks | Move `VLMDetection` or isolate MLX/VLM imports out of otherwise pure model code | Pure targets compile without MLX/VLM products | +| A3 | P1 | Targeted test buckets | Group tests by dependency profile: foundation, networking, storage, inference, UI-adjacent, sandbox | CI can run fast buckets without rebuilding the full heavy graph for every change | +| A4 | P1 | Fixture discipline | Create stable fixture directories for API, storage migration, document parsing, plugins, and evals | New regression tests reuse fixtures instead of inventing ad hoc setup | +| A5 | P1 | Contributor labels and issue templates | Align issue labels with roadmap workstreams and "good first issue" scope | New contributors can find safe starter work without deep architecture context | + +Notes: + +- Keep `OsaurusCore` behavior unchanged during the split; treat this as build-system and dependency-risk reduction first. +- Start with pure code and tests. Do not split UI until the lower-level boundary is stable. + +## Phase 3: Agent Capability Quality + +Target: weeks 6-12 + +Goal: improve agent behavior with measurable evals and tighter tool contracts. + +P1/P2 work: + +| ID | Priority | Work | Deliverables | Acceptance Criteria | +| --- | --- | --- | --- | --- | +| G1 | P1 | Expand OsaurusEvals | Add suites for agent loop, tool calling, skill injection, method recall, and memory retrieval | Each suite has representative cases and machine-readable reports | +| G2 | P1 | Preflight selection tuning | Track selected tools/skills/methods, false positives, missed matches, and token overhead | Changes to preflight behavior can be compared across models | +| G3 | P1 | Tool error taxonomy | Normalize retryable, permission, validation, timeout, and provider errors across built-in, MCP, sandbox, and plugin tools | Agents receive actionable errors; UI shows user-safe summaries | +| G4 | P2 | Method lifecycle | Improve method creation, scoring, review, and retirement flows | Low-quality or stale methods decay without manual cleanup | +| G5 | P2 | Watcher/schedule observability | Add run history details, convergence diagnostics, and failure summaries | Users can explain why automation did or did not run | + +Recommended sequence: + +1. Add eval coverage before changing agent prompts or capability search weights. +2. Improve error envelopes and retries before increasing automation autonomy. +3. Expand watcher/schedule visibility after tool errors are understandable. + +## Phase 4: High-Fidelity File I/O + +Target: weeks 8-16 + +Goal: build reliable import, edit, render, verify, and export workflows for high-value document formats without slowing normal attachment parsing. + +P2 work: + +| ID | Priority | Work | Deliverables | Acceptance Criteria | +| --- | --- | --- | --- | --- | +| F1 | P2 | File I/O foundation | Shared adapter contract, artifact store, document graph, edit plan, fixture layout, render verifier interface | Two toy adapters can import, edit, export, and verify through the same contract | +| F2 | P2 | DOCX adapter MVP | Preserve paragraphs, runs, styles, tables, images, comments, headers/footers where supported | Five fixtures pass import/export/render verification and unsupported constructs are explicit | +| F3 | P2 | XLSX adapter MVP | Preserve sheets, formulas, styles, tables, charts, images, merged cells, and validation metadata where supported | Five fixtures pass recalculation-aware and rendered verification | +| F4 | P2 | PPTX adapter MVP | Preserve slide masters, layouts, shapes, text runs, images, charts, tables, notes, and media refs where supported | Five fixtures pass slide-image verification and package integrity checks | +| F5 | P2 | PDF intake and export | Extract text with coordinates, render pages, support OCR fallback, annotations, page assembly, and redaction-aware export | Generated PDFs have page count, dimensions, text coverage, annotation, and visual-diff checks | +| F6 | P2 | HTML adapter | Preserve DOM, CSS, links, assets, tables, headings, and accessibility attributes | Browser-backed verification checks DOM validity, assets, links, text, and screenshots | +| F7 | P2 | User-facing UI | Add artifact previews, limitations, diff/verification summaries, and export affordances | Users can inspect what changed before accepting an exported file | + +Non-goals for the first File I/O milestone: + +- Lossless editing of arbitrary PDFs as if they were semantic source files. +- Legacy binary Office editing for `.doc`, `.xls`, or `.ppt`. +- Treating Markdown, CSV, source code, or plain text as high-fidelity formats. +- Pixel-perfect replication of every vendor-specific Office rendering quirk. + +## Phase 5: Plugin Ecosystem, Sandbox, And Trust + +Target: ongoing, after Phase 1 guardrails + +P2/P3 work: + +| ID | Priority | Work | Deliverables | Acceptance Criteria | +| --- | --- | --- | --- | --- | +| E1 | P2 | Plugin registry trust path | Improve package signing, verification, rollback, outdated checks, and registry metadata | Installs fail closed on signature or version mismatch | +| E2 | P2 | Plugin developer loop | Tighten `tools create`, `tools dev`, frontend proxy, docs, and generated examples | A new plugin can be created, hot-reloaded, packaged, verified, and documented in one flow | +| E3 | P2 | Sandbox smoke suite | Env-gated sandbox integration tests for provisioning, built-ins, bridge auth, secrets, plugin registration, and artifact integrity | Sandbox changes have a reproducible local test path without running in normal CI | +| E4 | P3 | Marketplace polish | Better plugin discovery, screenshots/docs surfacing, compatibility badges, and good-first plugin examples | Users can choose trustworthy plugins without reading source first | +| E5 | P3 | Remote agent maturity | Pairing UX, tunnel observability, revocation flows, and cross-instance communication design | Remote access remains understandable and revocable | + +## Security And Privacy Backlog + +These items should be pulled forward whenever related code is touched: + +- Encrypt or wrap VecturaKit vector index storage, or document a stronger mitigation if pluggable encryption remains blocked. +- Add threat-model checklists for sandbox bridge routes, remote pairing, plugin HTTP routes, and relay tunnels. +- Keep redaction tests current for access keys, bearer tokens, provider keys, sandbox bridge tokens, and plugin secrets. +- Require explicit user-visible failure modes for unsupported file I/O features, plugin install risks, and storage recovery gaps. +- Audit long-lived plugin databases and WAL checkpoint behavior. + +## Documentation Backlog + +- Add a short architecture decision record template for dependency pins, storage migrations, API compatibility changes, and sandbox security changes. +- Keep `docs/FEATURES.md` as the feature inventory, but use this plan for forward-looking priority. +- Add release checklists that connect docs, compatibility artifacts, appcast generation, acknowledgements, and signing. +- Add "known limitations" sections to major docs that do not currently state them. + +## Definition Of Done + +Code changes are done when: + +- The change follows the layer rules in `docs/CONTRIBUTING.md`. +- Unit or integration tests cover the changed behavior, or the PR explains why tests are not reasonable. +- Public API, tool, storage, plugin, or file format changes update docs and fixtures. +- Security-sensitive changes include redaction, permission, failure-mode, and rollback thinking. +- UI changes include screenshots or recordings and check accessibility basics. +- Local verification commands are listed in the PR test plan. + +Feature milestones are done when: + +- The feature has a clear owner-facing doc page or an explicit entry in an existing doc. +- Unsupported cases fail loudly and usefully. +- Observability exists for common failure modes. +- Evals or compatibility scripts cover behavior that depends on model/provider output. +- Rollback or migration behavior is documented when data or compatibility is affected. + +## Near-Term Sprint Breakdown + +### Sprint 1 + +- Land this development plan and documentation cleanup. +- Fix stale development instructions and broken contribution links. +- Create or refresh API compatibility scripts around the current `results/openai_compat_report.md` workflow. +- Add missing golden tests for recent provider request encoding and tool serialization regressions. + +### Sprint 2 + +- Add Open Responses and Anthropic compatibility fixtures alongside OpenAI Chat Completions. +- Expand `Packages/OsaurusEvals/Suites/Preflight` and create first `AgentLoop` smoke cases. +- Document storage recovery and vector-index limitations in release notes/checklists. + +### Sprint 3 + +- Begin the architecture split with a pure foundation target proposal. +- Move or wrap MLX/VLM imports that leak into pure model code. +- Add target-specific CI or Makefile commands once the first split compiles. + +### Sprint 4 + +- Start File I/O foundation work behind an internal feature flag. +- Build fixture layout and render verifier scaffolding before format-specific features. +- Implement DOCX as the first rich editable adapter only after the shared contract survives fixture tests. + +## Planning Rules + +- Prefer reliability and testability over adding a new surface area. +- Treat docs as part of the product contract. +- Add feature flags for large risky changes, especially inference, storage, sandbox, and File I/O. +- Keep PRs small enough to review against one workstream. +- Update this plan when a milestone completes or when priority changes. diff --git a/docs/FEATURES.md b/docs/FEATURES.md index 1555c0073..cbed2c468 100644 --- a/docs/FEATURES.md +++ b/docs/FEATURES.md @@ -1115,6 +1115,7 @@ Eight settings total, down from v1's 18. The per-section budget knobs, MMR tunin | -------------------------------------------------------------- | ------------------------------------------------- | | [README.md](../README.md) | Project overview, quick start, feature highlights | | [FEATURES.md](FEATURES.md) | Feature inventory and architecture (this file) | +| [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) | Prioritized roadmap and development plan | | [WATCHERS.md](WATCHERS.md) | Watchers and folder monitoring guide | | [AGENT_LOOP.md](AGENT_LOOP.md) | Agent loop, folder context, and `todo`/`complete`/`clarify` | | [REMOTE_PROVIDERS.md](REMOTE_PROVIDERS.md) | Remote provider setup and configuration | From 3a22d4533466ef7daf6108dd025c5828c71054f6 Mon Sep 17 00:00:00 2001 From: Michael Meding Date: Wed, 29 Apr 2026 19:41:28 -0300 Subject: [PATCH 2/5] Harden PR test-core DerivedData cache handling --- .github/workflows/ci.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index f7438d4d5..cf3425e4d 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -26,7 +26,7 @@ permissions: env: # Bump to invalidate every cache entry without source surgery (e.g., after a # known-bad cache or an Xcode toolchain upgrade we want to flush manually). - CACHE_SALT: v2-vmlx-5b84387 + CACHE_SALT: v3-pr-cold-deriveddata # Pin Xcode so cache keys are stable across runner image bumps. When you # need to upgrade, change here AND in setup-xcode below. XCODE_VERSION: "26.4.1" From 78a0a0658e403b1edff6e8145c6a220656111c03 Mon Sep 17 00:00:00 2001 From: Michael Meding Date: Wed, 29 Apr 2026 20:06:13 -0300 Subject: [PATCH 3/5] Enforce clean PR CI readiness --- .cursor/rules/pr-clean-ci.mdc | 39 +++++++++++ .github/pull_request_template.md | 8 +++ .github/workflows/ci.yml | 41 ++++++++++++ docs/CONTRIBUTING.md | 21 ++++++ scripts/ci/check-pr-clean.sh | 107 +++++++++++++++++++++++++++++++ 5 files changed, 216 insertions(+) create mode 100644 .cursor/rules/pr-clean-ci.mdc create mode 100755 scripts/ci/check-pr-clean.sh diff --git a/.cursor/rules/pr-clean-ci.mdc b/.cursor/rules/pr-clean-ci.mdc new file mode 100644 index 000000000..bde6fe585 --- /dev/null +++ b/.cursor/rules/pr-clean-ci.mdc @@ -0,0 +1,39 @@ +--- +description: PR readiness requires attached, green GitHub checks +globs: "**/*" +alwaysApply: true +--- + +# Clean PR Rule + +Do not create, mark ready, or recommend merging an Osaurus PR unless GitHub +checks are attached and green. + +Before opening or updating a PR: + +- Work from the clean checkout at `/Users/mmeding/Documents/Claude/Projects/osaurus-exec`. +- Keep `/Users/mmeding/Documents/Claude/Projects/osaurus` read-only. +- Run the smallest useful local verification for the files touched. +- Push only after the working tree is clean. + +After opening or updating a PR: + +- Confirm GitHub Actions attached checks to the PR. +- Wait for the required checks to finish: `test-core`, `test-cli`, + `swiftlint`, `shellcheck`, and `pr-clean-gate`. +- Run `scripts/ci/check-pr-clean.sh osaurus-ai/osaurus `. +- Keep the PR draft or blocked if checks are missing, pending, cancelled, or + failing. + +If GitHub shows zero checks: + +- Do not treat local tests as sufficient for merge. +- Rebase, push, or close/reopen the PR to trigger Actions if you own the branch. +- If the branch belongs to an external fork and this account cannot update it, + leave a PR comment and require the author or a maintainer with branch access + to trigger CI. + +If a shared CI failure blocks many PRs: + +- Fix the shared CI problem first in a dedicated PR. +- Do not debug unrelated feature code until the shared blocker is green. diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 85c4a5fb7..4b0f42c03 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -14,6 +14,13 @@ Explain the motivation and the changes. Link issues (e.g., Closes #123). Steps to verify locally (commands, screenshots, recordings). Include model used. +Required before marking ready: + +- [ ] Local targeted verification passed for the files touched +- [ ] GitHub checks are attached to this PR +- [ ] `test-core`, `test-cli`, `swiftlint`, `shellcheck`, and `pr-clean-gate` are green +- [ ] I ran `scripts/ci/check-pr-clean.sh osaurus-ai/osaurus ` + ## Screenshots If UI updated, add before/after. @@ -24,3 +31,4 @@ If UI updated, add before/after. - [ ] I added/updated tests where reasonable - [ ] I updated docs/README as needed - [ ] I verified build on macOS with a Swift 6.2-capable Xcode toolchain +- [ ] This PR is draft/blocked if any GitHub check is missing, pending, cancelled, or failing diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index cf3425e4d..d599aa783 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -381,3 +381,44 @@ jobs: - name: Lint shell scripts run: find scripts -name '*.sh' -print0 | xargs -0 shellcheck --severity=warning + + pr-clean-gate: + name: pr-clean-gate + runs-on: ubuntu-latest + timeout-minutes: 5 + needs: + - test-core + - test-cli + - swiftlint + - shellcheck + if: ${{ always() }} + steps: + - name: Require all CI jobs to pass + env: + TEST_CORE: ${{ needs.test-core.result }} + TEST_CLI: ${{ needs.test-cli.result }} + SWIFTLINT: ${{ needs.swiftlint.result }} + SHELLCHECK: ${{ needs.shellcheck.result }} + run: | + { + echo "## PR clean gate" + echo + echo "| Job | Result |" + echo "| --- | --- |" + echo "| test-core | ${TEST_CORE} |" + echo "| test-cli | ${TEST_CLI} |" + echo "| swiftlint | ${SWIFTLINT} |" + echo "| shellcheck | ${SHELLCHECK} |" + } >> "$GITHUB_STEP_SUMMARY" + + failed=0 + for result in "$TEST_CORE" "$TEST_CLI" "$SWIFTLINT" "$SHELLCHECK"; do + if [ "$result" != "success" ]; then + failed=1 + fi + done + + if [ "$failed" -ne 0 ]; then + echo "::error title=CI is not clean::Every required CI job must finish with result=success before a PR is ready to merge." + exit 1 + fi diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 70255e839..1d51d2a85 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -114,6 +114,27 @@ The core library (`Packages/OsaurusCore/`) follows a layered architecture. Each - Keep PRs small and focused; describe user-facing changes and test steps - Use [DEVELOPMENT_PLAN.md](DEVELOPMENT_PLAN.md) to choose priority when a change spans multiple workstreams +### Clean PR rule + +A PR is not ready to merge until GitHub Actions are attached and green. Local +verification is required, but it is not a replacement for repository CI. + +Before asking for review or merge: + +1. Run the smallest useful local verification for the files touched. +2. Push the branch and confirm GitHub Actions attached checks to the PR. +3. Wait for `test-core`, `test-cli`, `swiftlint`, `shellcheck`, and + `pr-clean-gate` to finish. +4. Run: + + ```bash + scripts/ci/check-pr-clean.sh osaurus-ai/osaurus + ``` + +Keep the PR as draft or explicitly blocked if any check is missing, pending, +cancelled, or failing. A PR with zero attached checks is unverified; rebase, +push, or close/reopen it so Actions run before review continues. + ### Code style - Follow standard Swift naming and clarity guidelines diff --git a/scripts/ci/check-pr-clean.sh b/scripts/ci/check-pr-clean.sh new file mode 100755 index 000000000..101e8d3eb --- /dev/null +++ b/scripts/ci/check-pr-clean.sh @@ -0,0 +1,107 @@ +#!/usr/bin/env bash +set -euo pipefail + +usage() { + cat <<'USAGE' +Usage: scripts/ci/check-pr-clean.sh [repo] [pr] + +Verifies that a GitHub pull request has attached checks and that every check is +green. Run this before marking a PR ready or asking for merge. + +Arguments: + repo GitHub repository, default: osaurus-ai/osaurus + pr Pull request number or URL. If omitted, gh resolves the current branch. + +Environment: + PR_CLEAN_REQUIRED_CHECKS + Space-separated required check names. Defaults to: + "test-core test-cli swiftlint shellcheck pr-clean-gate" +USAGE +} + +if [ "${1:-}" = "-h" ] || [ "${1:-}" = "--help" ]; then + usage + exit 0 +fi + +repo="${1:-osaurus-ai/osaurus}" +pr="${2:-}" + +if ! command -v gh >/dev/null 2>&1; then + echo "error: GitHub CLI (gh) is required" >&2 + exit 2 +fi + +if ! command -v jq >/dev/null 2>&1; then + echo "error: jq is required" >&2 + exit 2 +fi + +if [ -z "$pr" ]; then + pr="$(gh pr view --repo "$repo" --json number --jq .number)" +fi + +checks_error="$(mktemp)" +trap 'rm -f "$checks_error"' EXIT + +if ! checks_json="$(gh pr checks "$pr" \ + --repo "$repo" \ + --json name,state,bucket,workflow,link \ + 2>"$checks_error")"; then + if grep -qi "no checks reported" "$checks_error"; then + echo "error: PR $pr has no GitHub checks attached" >&2 + echo "Push/rebase the branch or close/reopen the PR so Actions run before review." >&2 + exit 1 + fi + + cat "$checks_error" >&2 + exit 1 +fi + +check_count="$(jq 'length' <<< "$checks_json")" +if [ "$check_count" -eq 0 ]; then + echo "error: PR $pr has no GitHub checks attached" >&2 + echo "Push/rebase the branch or close/reopen the PR so Actions run before review." >&2 + exit 1 +fi + +required_checks="${PR_CLEAN_REQUIRED_CHECKS:-test-core test-cli swiftlint shellcheck pr-clean-gate}" +failed=0 + +for check in $required_checks; do + match_count="$(jq --arg name "$check" '[.[] | select(.name == $name)] | length' <<< "$checks_json")" + if [ "$match_count" -eq 0 ]; then + echo "error: required check is missing: $check" >&2 + failed=1 + continue + fi + + non_success="$(jq -r --arg name "$check" ' + .[] + | select(.name == $name and .state != "SUCCESS") + | "\(.name): \(.state) \(.link)" + ' <<< "$checks_json")" + + if [ -n "$non_success" ]; then + echo "$non_success" >&2 + failed=1 + fi +done + +non_passing="$(jq -r ' + .[] + | select(.bucket != "pass") + | "\(.name): \(.state) \(.link)" +' <<< "$checks_json")" + +if [ -n "$non_passing" ]; then + echo "error: non-passing checks remain:" >&2 + echo "$non_passing" >&2 + failed=1 +fi + +if [ "$failed" -ne 0 ]; then + exit 1 +fi + +echo "PR $pr is clean: $check_count GitHub checks attached and passing." From b4d3a4ee38439947ecb2a5a2df26be5c4ac5a1c6 Mon Sep 17 00:00:00 2001 From: Michael Meding Date: Thu, 30 Apr 2026 15:11:22 -0300 Subject: [PATCH 4/5] Update FluidAudio locks for development plan branch --- .../project.xcworkspace/xcshareddata/swiftpm/Package.resolved | 4 ++-- osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved b/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved index e15eb4632..1c3001700 100644 --- a/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved +++ b/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved @@ -42,8 +42,8 @@ "kind" : "remoteSourceControl", "location" : "https://github.com/FluidInference/FluidAudio.git", "state" : { - "revision" : "d302273d49ef4d8914b27f20d342be482e8810f1", - "version" : "0.14.1" + "revision" : "00ea906c2089971bec767c4b4df38686aa7a9f9e", + "version" : "0.14.3" } }, { diff --git a/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved b/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved index 55fbfe785..0dc4e4b6d 100644 --- a/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved +++ b/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved @@ -42,8 +42,8 @@ "kind" : "remoteSourceControl", "location" : "https://github.com/FluidInference/FluidAudio.git", "state" : { - "revision" : "d302273d49ef4d8914b27f20d342be482e8810f1", - "version" : "0.14.1" + "revision" : "00ea906c2089971bec767c4b4df38686aa7a9f9e", + "version" : "0.14.3" } }, { From b16df38044de728e3f8a2582334ea8e9a37348cc Mon Sep 17 00:00:00 2001 From: Michael Meding Date: Fri, 1 May 2026 01:10:50 -0300 Subject: [PATCH 5/5] Align branch with current main APIs --- .../project.xcworkspace/xcshareddata/swiftpm/Package.resolved | 4 ++-- osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved b/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved index 1c3001700..e15eb4632 100644 --- a/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved +++ b/App/osaurus.xcodeproj/project.xcworkspace/xcshareddata/swiftpm/Package.resolved @@ -42,8 +42,8 @@ "kind" : "remoteSourceControl", "location" : "https://github.com/FluidInference/FluidAudio.git", "state" : { - "revision" : "00ea906c2089971bec767c4b4df38686aa7a9f9e", - "version" : "0.14.3" + "revision" : "d302273d49ef4d8914b27f20d342be482e8810f1", + "version" : "0.14.1" } }, { diff --git a/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved b/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved index 0dc4e4b6d..55fbfe785 100644 --- a/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved +++ b/osaurus.xcworkspace/xcshareddata/swiftpm/Package.resolved @@ -42,8 +42,8 @@ "kind" : "remoteSourceControl", "location" : "https://github.com/FluidInference/FluidAudio.git", "state" : { - "revision" : "00ea906c2089971bec767c4b4df38686aa7a9f9e", - "version" : "0.14.3" + "revision" : "d302273d49ef4d8914b27f20d342be482e8810f1", + "version" : "0.14.1" } }, {