An Osaurus plugin for backgrounded macOS automation. The agent drives any Mac app while the user keeps working in the foreground — cursor never moves, focus never changes, Spaces never follow. Built on the cua-driver recipe (SkyLight SLEventPostToPid, yabai-style focus-without-raise, Chromium primer click) plus snapshot-scoped element ids and cua-style ax/vision/som capture modes.
See SKILL.md for the agent contract and REFERENCE.md for keyboard shortcuts, per-app recipes, and full schemas. See CHANGELOG.md for release notes (latest: v3.0.0 added background-by-default driving and the cua capture modes).
Accessibility permissions are required. In System Settings > Privacy & Security > Accessibility, add the application running this plugin (Osaurus, or your terminal if running from CLI). Screen Recording is required only for mode: "som" / mode: "vision" and take_screenshot.
list_apps OR open_application → list_windows → get_ui_elements (mode='som') → click_element / set_value / type_text → re-observe only on stale/removed
Every snapshot has an id (s7); every element id includes its snapshot (s7-12); failed actions tell you whether to re-observe (stale: true), give up on this element (removed: true), or that you passed the wrong shape entirely (malformed). The last two snapshots are always retained, so an action immediately after a re-observe still resolves correctly.
open_application defaults to background: true — the app is launched (or attached to) without ever being raised. Pass background: false only when the user genuinely needs to look at the target window.
Every action picks the most-backgrounded transport that works:
- AXPress / AXShowMenu / AXValue — first choice for
click_element,set_value,right_click. Fully backgrounded. SLEventPostToPid(SkyLight private framework, loaded viadlopen) — Chromium-trusted, no cursor warp.CGEvent.postToPid— public CoreGraphics; works for almost everything except Chromium web content.- HID tap (
CGEvent.post(.cghidEventTap)) — final fallback. Warps the cursor; only happens for canvas/Blender/Unity-style apps.
Drag is the one exception: most drop receivers key on the global mouse position, so drag always uses the HID tap.
| Tool | Purpose |
|---|---|
list_apps |
All running GUI apps with pid, bundleId, name, active, hidden. Use before open_application if the target is already running. |
list_windows |
Per-pid window list with windowId, title, focused/minimized, bounds. Pass the windowId to take_screenshot to capture exactly one window without raising it. |
get_active_window |
Frontmost window's pid + title (mostly for figuring out where the user is). |
| Tool | Purpose |
|---|---|
open_application |
Launch (or attach to) an app in background mode and return an initial capture. |
get_ui_elements |
Capture by pid. mode: "som" (default) returns AX tree + annotated screenshot + numeric elementIndex. mode: "ax" is the tree only (fastest, no Screen Recording needed). mode: "vision" is the screenshot only. |
find_elements |
Server-side search by text and/or role. |
| Tool | Purpose |
|---|---|
click_element |
Left/right/double click by id. AXPress first → SkyLight per-pid → HID tap. |
set_value |
Replace a field's value instantly via kAXValueAttribute. |
type_text |
Keystroke typing routed per-pid (no focus steal). With id, focuses + clears + types. |
clear_field |
Empty a field (set_value("") then per-pid Cmd+A + delete fallback). |
| Tool | Purpose |
|---|---|
press_key |
Keyboard shortcuts and special keys. With pid, routed per-pid (no foreground steal). |
click |
Coordinate fallback. With pid, routes per-pid. |
scroll |
Direction + amount. With pid, per-pid (no warp). |
drag |
Coordinate-based drag. Always warps the cursor (drag receivers need it). |
| Tool | Purpose |
|---|---|
act_and_observe |
Run an element action AND get a fresh capture (mode-shaped) in one call. |
take_screenshot |
JPEG/PNG capture. windowId captures exactly one window (works on occluded / off-Space windows). annotate: true overlays element ids. |
list_displays |
Multi-monitor info. |
v0.4 removed the on-screen HUD and the global Esc-cancel monitor — backgrounded automations are invisible to the user and there's nothing to interrupt. The session tools remain as a side-effect-free record of the agent's narration, useful for tooling that surfaces a transcript:
| Tool | Purpose |
|---|---|
start_automation_session |
Record a title + total step count. |
update_automation_session |
Update title / narration / step counter. |
end_automation_session |
Reset the record. |
1. list_apps()
→ { apps: [..., { pid: 1234, name: "Safari", bundleId: "com.apple.Safari" }, ...] }
2. open_application({ identifier: "Safari", mode: "som" })
→ { pid: 1234, som: { snapshot: {...}, image: { mimeType: "image/jpeg", data: "..." }, elements: [{ elementIndex: 1, id: "s1-3", role: "textfield", label: "Address" }, ...] } }
3. press_key({ key: "l", modifiers: ["command"], pid: 1234 })
4. type_text({ text: "https://example.com", pid: 1234 })
5. press_key({ key: "return", pid: 1234 })
6. find_elements({ pid: 1234, text: "More information", role: "link" })
→ { snapshotId: 2, elements: [{ id: "s2-3", label: "More information..." }] }
7. click_element({ id: "s2-3" })
→ { success: true, delta: { focusedWindow: "IANA-managed Reserved Domains" } }
- Background dev-loop QA: agent drives the app being tested while the user keeps coding in the foreground.
- Personal-assistant work: send a Message, check a calendar, pull a tracking number out of an email, all without taking the user's screen away.
- Pulling visual context from apps the user isn't looking at (Figma canvases, Preview windows, Notion docs) —
take_screenshotwithwindowIdreads them where they live, no raise needed. - Native macOS apps (Finder, Mail, Notes, System Settings) — full AX action support, fully backgrounded.
- Safari web browsing — web content is in the AX tree.
- Well-built Electron apps — varies by implementation.
- Chromium right-click on web content is coerced to a left-click at the renderer-IPC boundary, even via SkyLight.
click_elementprefersAXShowMenufor AX-addressable targets — the only reliable right-click path. - Canvas apps (Blender GHOST, Unity, games) filter per-pid event routes entirely. The driver auto-falls back to the HID tap for these, which warps the cursor.
- The SkyLight bridge is loaded at runtime via
dlopen. If Apple removes a symbol on a future macOS, the driver gracefully degrades toCGEvent.postToPid(still backgrounded, just rejected by Chromium web content).
Build:
swift build -c releaseTest:
swift testInstall locally:
osaurus manifest extract .build/release/libosaurus-macos-use.dylib
osaurus tools package osaurus.macos-use 3.0.0
osaurus tools install ./osaurus.macos-use-3.0.0.zipA GitHub Actions workflow (.github/workflows/release.yml) builds and releases the plugin when you push a version tag.
git tag v3.0.0
git push origin v3.0.0MIT