Org-scoped Builder credentials + private GitHub imports + reconnect hardening#511
Org-scoped Builder credentials + private GitHub imports + reconnect hardening#511
Conversation
…ports + reconnect hardening - Builder credentials now write at org scope when an owner/admin connects, so a single OAuth flow lights up AI chat for the entire org. New `resolveCredentialWriteScope` helper; symmetric scope decision on disconnect. - `import-github` action accepts a saved `GITHUB_TOKEN` secret for private-repo imports; better error messages distinguish "needs token" vs "token rejected" vs "rate-limited"; broader URL parsing (SSH, .git suffix). - Agent run-event retention bumped from 30min to 24h (configurable via `AGENT_RUN_RETENTION_MS`); reconnect synthesizes terminal `done`/`error` events when the persisted run already ended; `BuilderSetupCard` now appears mid-conversation when the API key is missing. - Clips: when a recording fails because storage isn't configured, the StorageSetupCard renders inline on the recording detail and share routes. - Auth docs: replace the obsolete `ACCESS_TOKEN`-only model with Better Auth canonical pointers across docs/auth.md and every template's AGENTS.md + security SKILL.md. - CLI smoke: assert dispatch resolves from npm in scaffolds (matches the published-package layout). - Fix pnpm prep: docs vitest was picking up Nitro-bundled `.spec.mjs` files from `.output/` after a build; exclude that path.
✅ Deploy Preview for agent-native-scheduling ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for agent-native-voice ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for agent-native-meeting-notes ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for agent-native-design ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Builder reviewed your changes and found 6 potential issues 🔴
Review Details
Code Review Summary
PR #511 bundles critical changes to credential scoping, secret resolution, and agent run recovery. The scope spans auth (org-scoped Builder credentials), agent infrastructure (24h run retention + reconnect synthesis), storage (clips failure messages), and design tokens (GitHub imports with private token support). Risk Level: HIGH due to auth, data-scoping, and agent-state changes.
Key Accomplishments ✅
- Org-scoped credential isolation: Write-side logic correctly gates
scope: "org"to owner/admin only; read-side fallback from user→org is sound. Disconnect mirrors connect-side scope decision. - GitHub token security: Tokens properly resolved via
resolveSecret()(org-scoped if available), never logged or returned to client, passed through fetch headers safely. - Agent run recovery: Event retention bumped to 24h with terminal-event synthesis prevents client hangs during reconnect. seq numbering and race-condition checks are correct.
- Builder credential type safety: Tests cover critical paths; implementation properly validates role/org transitions.
Key Findings 🔴 HIGH, 🟡 MEDIUM
🔴 HIGH: Startup recovery can duplicate completed agent runs
- When the initial POST request succeeds but completes before
/runs/activechecks, the adapter falls back to normal startup retry and sends the prompt a second time. Mutating side effects (emails, records, workflows) can execute twice. - Root cause:
/runs/activeonly returns runs withstatus === "running", so completed runs (even with retained events) look like "no active run." - Impact: Transient network failures after server accepts request → duplicate mutations.
🔴 HIGH: Unvalidated failureReason in clips routes (2 instances)
- Database-sourced error messages rendered without validation in recording detail and share routes.
- Share route is public-facing — unauthenticated visitors can see untrusted content.
- Current safety: React escapes string children in JSX. But public endpoints should never rely on context-specific escaping; an attacker controlling the recording owner's account could inject misleading content.
🟡 MEDIUM: Org-scoped Builder connect not observable by status poller
- Connect writes org-scoped rows for owner/admin, but
/builder/statusresolves credentials without passingorgId, so the read-side never consults the new org scope. - Impact: Connect UI polls until
configured: truebut never sees org-scoped rows, times out after 5 min, users see misleading "Connect Builder" state despite successful write. - Fix: Pass
orgIdintorunWithRequestContext()viagetOrgContext()in the status handler.
🟡 MEDIUM: Org scope accepted by registerRequiredSecret but not enforced in routes
registerRequiredSecret()now acceptsscope: "org", but secrets HTTP routes (write/delete) don't enforce org-level access checks for org scope — only for workspace scope.- Impact: First template using
scope: "org"will allow any org member to overwrite/delete org-shared secrets; org-scoped reads outside an org fall back to legacy workspace behavior. - Fix: Treat
"org"as a first-class scope in secrets routes with owner/admin authorization gates.
🟡 MEDIUM: GitHub JSON result type doesn't guarantee array at runtime
- Generic
fetchGitHubJsonResult<Array<...>>()casts to array type without enforcing it. Current code is safe (explicit.isArray()check follows), but fragile if pattern is copied without validation.
Browser Testing
🧪 Browser testing: Skipped — PR only modifies backend actions, credentials routes, and clips storage setup UI. No auth flow, login, or core navigation changes visible to end users. Clips storage UI changes are additive (StorageSetupCard overlay) and don't affect existing player/recording paths.
| runId = activeRunId; | ||
| if (!attemptedRunIds.includes(activeRunId)) { | ||
| attemptedRunIds.push(activeRunId); | ||
| } |
There was a problem hiding this comment.
🔴 Startup recovery can duplicate completed agent runs
When the initial POST fails but the server already accepted the request, /runs/active?threadId=... only returns runs with status='running', so completed runs look like "no active run". The adapter then retries the original POST, executing the agent turn twice and duplicating mutations. Recovery should be able to reconnect to completed runs before retrying the POST.
There was a problem hiding this comment.
Real bug. Fixing on a follow-up branch — getActiveRunForThreadAsync will surface recently-completed/errored SQL runs (within a 10-min reconnect window) so the adapter can replay synthesized done/error events from PR #511's subscribeFromSQL instead of retrying the POST. New tests cover the recently-completed, recently-errored, and stale-window cases.
| isFailure && | ||
| /file upload provider|storage provider|connect builder|s3-compatible/i.test( | ||
| failureReason, | ||
| ); |
There was a problem hiding this comment.
🔴 Unvalidated failureReason from database rendered in JSX
The failureReason string is fetched from the database and rendered directly without validation. React escapes JSX strings, so script injection is blocked, but removing the any cast and validating the reason string (or limiting it to expected patterns) hardens this against downstream misuse.
There was a problem hiding this comment.
Skipping. The bot itself notes React JSX-escapes the string so XSS is blocked. failureReason is written by our own finalize-recording action (controlled string set). The 'downstream misuse' hardening it suggests is hypothetical — there's no untrusted input source feeding this field, and a future caller would have to copy this exact pattern verbatim into a non-JSX context for it to matter. Trusting React's documented escaping is the right call here.
| explicitFailure && | ||
| /file upload provider|storage provider|connect builder|s3-compatible/i.test( | ||
| message, | ||
| ); |
There was a problem hiding this comment.
🔴 Unvalidated failureReason rendered to public share page
This is a public-facing route where failureReason is shown to unauthenticated visitors. Validate or sanitize the message before rendering, or simplify to a generic message that doesn't leak technical details. Public endpoints should not rely on context-specific escaping.
There was a problem hiding this comment.
Skipping. Same reasoning as the recording-detail comment: React JSX-escapes the string (bot acknowledged), and failureReason originates from finalize-recording.ts — a controlled set of strings written by our own action, not user input. The 'public-facing route' framing is real but doesn't change the surface: there's no untrusted writer, so there's nothing to sanitize. If we ever start writing user-controlled error messages into this column, that's the moment to add validation; right now it would be defensive code with no attack surface.
| await writeBuilderCredentials( | ||
| session.email, | ||
| { privateKey, publicKey, userId, orgName, orgKind }, | ||
| { orgId, role }, |
There was a problem hiding this comment.
🟡 Org-scoped Builder credentials never become visible to status poller
Connect writes org-scoped credentials but /builder/status resolves without an orgId, so the read path never sees org-scoped rows. Pass orgId from getOrgContext(event) into runWithRequestContext() so the status handler resolves the new org-scope credentials.
There was a problem hiding this comment.
Real bug. Fixing on a follow-up branch — /builder/status will read getOrgContext(event) and pass the orgId into runWithRequestContext(), mirroring what the connect/disconnect handlers already do. Without this, an admin's org-shared OAuth result is invisible to every other org member's status poller and the UI shows 'not connected' even though the chat actually resolves the org-scope row.
| */ | ||
|
|
||
| export type SecretScope = "user" | "workspace"; | ||
| export type SecretScope = "user" | "workspace" | "org"; |
There was a problem hiding this comment.
🟡 Org scope not enforced in secrets HTTP routes
registerRequiredSecret() now accepts scope: "org", but the secrets routes (write/delete) only enforce access checks for workspace scope. Add org-level owner/admin authorization and treat org scope as a first-class scope in HTTP handlers before any template uses it.
There was a problem hiding this comment.
Real concern. Fixing on a follow-up branch — scope: "org" becomes first-class in resolveScopeId (with no solo fallback, since "org" requires an actual org), and canMutateOrgScope enforces owner/admin role on both write and delete in the registered-secret routes. Tests cover admin-allows / member-rejects / no-active-org-rejects / member-delete-rejects. The ad-hoc routes already strictly normalize to user/workspace, so they're unaffected.
| Array<{ | ||
| name: string; | ||
| type: string; | ||
| size?: number; |
There was a problem hiding this comment.
🟡 GitHub JSON result type doesn't enforce array at runtime
The generic type fetchGitHubJsonResult<Array<...>>() doesn't guarantee the value is an array at runtime; current code is safe due to the explicit .isArray() check, but prevent the pattern from being copied without proper validation.
There was a problem hiding this comment.
Skipping. The bot itself notes the runtime is safe today via the explicit Array.isArray() check at the call site (line 95 in import-github.ts), so this is purely a future-pattern concern. The generic fetchGitHubJsonResult<Array<...>>() documents the expected shape — runtime narrowing stays the caller's job, which is the standard TypeScript contract for any fetch().json() wrapper. Adding inline isArray enforcement to the helper would harm callers who legitimately expect an object response. The ergonomic boundary here is correct.
… + org-secrets auth (#512) * fix(core): PR #511 follow-up — reconnect to terminal runs + status orgId + org-scoped secrets auth Three review-fed fixes against the org-scoped credentials + reconnect-hardening work that just shipped in #511: - runs/active surfaces recently-completed/errored SQL runs within a 10-minute reconnect window so the adapter can replay synthesized terminal events from the events stream (added in #511) instead of retrying the original POST. Without this, a POST that failed after the server already accepted and finished the run could re-execute the agent turn — double-apply mutations on every disconnect that crossed an isolate boundary. - /builder/status reads getOrgContext(event) and passes orgId into runWithRequestContext, so the status poller resolves org-shared Builder credentials. Previously an admin's org-scope OAuth result was invisible to every other org member's status poller — chat worked, but UI said "not connected" forever. - registerRequiredSecret already accepted scope: "org" but the registered-secret HTTP write/delete routes only enforced auth for workspace scope. Adds canMutateOrgScope (owner/admin required, no solo fallback) and treats org as first-class in resolveScopeId. Ad-hoc routes were already restricted to user/workspace and remain unchanged. New tests cover all three: recently-completed/errored reconnect window + stale-window rejection in run-manager.spec.ts; admin-allows / member-rejects / no-active-org-rejects / member-delete-rejects in routes.spec.ts. * fix(core): use completed_at for reconnect window + sweep concurrent Sentry CLI hardening Addresses the 🟡 review comment on #512 (window measured from start time rather than completion time) and bundles concurrent-agent Sentry CLI/browser identity work that landed locally during the same session. Reconnect window fix: - getRunByThread now selects completed_at and returns it. - getActiveRunForThreadAsync measures the terminal-run reconnect window from completed_at (with fallback to heartbeat_at, then started_at) so long-running tasks that complete seconds before reconnect are still reachable. Previously a run that started 11 minutes ago and completed 2 seconds ago would be rejected by the 10-minute window — forcing the client to retry the POST and re-execute the agent turn. - New tests: long-running-then-recently-completed (reachable), legacy rows with NULL completed_at (falls back to heartbeat), stale-window rejection now keyed off completed_at. Concurrent Sentry hardening swept in: - CLI Sentry beforeSend now drops ValidationError exceptions (validateRepoName and friends) so user-input rejections don't pollute crash reports. - New ValidationError class in cli/create.ts replaces the generic Error in validateRepoName. - cli/index.ts attaches BUILDER_USER_ID / BUILDER_PUBLIC_KEY as Sentry user.id and tags so we can map CLI crashes back to operators; existing user.ip_address scrubbing preserved. - workspace-dev.ts handles ENOENT and ENOSPC gracefully (TOCTOU race on appsDir, inotify watcher exhaustion); other readdir errors get a Sentry warning so we learn about new failure modes. - Browser side: setSentryUser() helper buffers user/orgId set before Sentry init; useSession() pushes the resolved user (id/email/username) and orgId tag into Sentry on every session resolve.
Summary
Bundled changes from the
codex/org-scoped-credentialsbranch — multiple concurrent agents.scope: "org"so the whole org auto-resolves them on next chat call — no per-user re-connect. Plain members and users without an active org keep writing atscope: "user"(safe default that doesn't trample the org-shared connection). Symmetric on disconnect: a member's Disconnect press never tears down the org's shared row. NewresolveCredentialWriteScope(email, orgId, role)helper exposes the decision.import-githubnow reads a savedGITHUB_TOKENsecret server-side viaresolveSecret()— never asks the user to paste a PAT into chat. NewfetchGitHubJsonResultreturns status + message details so the action can distinguish "needs token" vs "token rejected" vs "rate-limited" vs "wrong URL".parseOwnerReponow handles SSH (git@github.com:org/repo.git) and.gitsuffixes. Templates/design registers the secret as optional with a fine-grained-token validator.AGENT_RUN_RETENTION_MS) so reconnect investigations have history. SSE reconnect now synthesizes a terminaldone/errorevent when the persisted run status iscompleted/erroredbut no terminal event made it into the event log. Adapter records the last auto-continue reason and attempted run IDs, attaches them asdetailson the eventual connection-error so debugging stalled streams gets concrete signal.BuilderSetupCardnow also appears mid-conversation when the API key is missing.StorageSetupCardinline so the user can connect Builder.io / S3 right there.docs/auth.mdbecomes a thin pointer to the canonical user-facing docs and the frameworkauthentication/securityskills. Every template'sAGENTS.mdand securitySKILL.mdupdated to reflect the Better Auth default + the access-helper model..spec.mjsfiles from.output/after a build; excluded that path sopnpm prepworks afteragent-native build.Two changesets included:
@agent-native/coreminor — Builder credential org scope.@agent-native/corepatch — GitHub import + run reconnect.Test plan
credential-provider.spec.ts,design-token-utils.spec.ts,import-github.test.ts, run-manager retention + reconnect synthesis testsGITHUB_TOKENreturns the new explanatory error