Skip to content

Org-scoped Builder credentials + private GitHub imports + reconnect hardening#511

Merged
steve8708 merged 1 commit intomainfrom
codex/org-scoped-credentials
May 5, 2026
Merged

Org-scoped Builder credentials + private GitHub imports + reconnect hardening#511
steve8708 merged 1 commit intomainfrom
codex/org-scoped-credentials

Conversation

@steve8708
Copy link
Copy Markdown
Contributor

Summary

Bundled changes from the codex/org-scoped-credentials branch — multiple concurrent agents.

  • Org-scoped Builder credentials. When an owner/admin connects Builder, credentials now write at scope: "org" so the whole org auto-resolves them on next chat call — no per-user re-connect. Plain members and users without an active org keep writing at scope: "user" (safe default that doesn't trample the org-shared connection). Symmetric on disconnect: a member's Disconnect press never tears down the org's shared row. New resolveCredentialWriteScope(email, orgId, role) helper exposes the decision.
  • Private GitHub design-token imports. import-github now reads a saved GITHUB_TOKEN secret server-side via resolveSecret() — never asks the user to paste a PAT into chat. New fetchGitHubJsonResult returns status + message details so the action can distinguish "needs token" vs "token rejected" vs "rate-limited" vs "wrong URL". parseOwnerRepo now handles SSH (git@github.com:org/repo.git) and .git suffixes. Templates/design registers the secret as optional with a fine-grained-token validator.
  • Agent run reconnect hardening. Run-event retention bumped from 30 min to 24 h (configurable via AGENT_RUN_RETENTION_MS) so reconnect investigations have history. SSE reconnect now synthesizes a terminal done/error event when the persisted run status is completed/errored but no terminal event made it into the event log. Adapter records the last auto-continue reason and attempted run IDs, attaches them as details on the eventual connection-error so debugging stalled streams gets concrete signal. BuilderSetupCard now also appears mid-conversation when the API key is missing.
  • Clips storage recovery. When a recording fails because no upload provider is configured, the recording detail and share routes render the StorageSetupCard inline so the user can connect Builder.io / S3 right there.
  • Auth docs canonicalization. docs/auth.md becomes a thin pointer to the canonical user-facing docs and the framework authentication / security skills. Every template's AGENTS.md and security SKILL.md updated to reflect the Better Auth default + the access-helper model.
  • CLI smoke fix. Asserts dispatch resolves from npm in scaffolds (matches the published-package layout we ship today, replacing the older "scaffold copies actions inline" check).
  • Local prep fix. Docs vitest was picking up Nitro-bundled .spec.mjs files from .output/ after a build; excluded that path so pnpm prep works after agent-native build.

Two changesets included:

  • @agent-native/core minor — Builder credential org scope.
  • @agent-native/core patch — GitHub import + run reconnect.

Test plan

  • CI green: lint / typecheck / test / scaffold E2E / guards
  • New tests pass: credential-provider.spec.ts, design-token-utils.spec.ts, import-github.test.ts, run-manager retention + reconnect synthesis tests
  • No CLAUDE.md / guard violations (changeset present, ownable-query guards, no-env-credentials, no-unscoped-credentials, no-localhost-fallback, template-list)
  • Manual sanity: connect Builder as org admin → other org member sees AI chat working without reconnect; private GitHub import without GITHUB_TOKEN returns the new explanatory error

…ports + reconnect hardening

- Builder credentials now write at org scope when an owner/admin connects, so a single OAuth flow lights up AI chat for the entire org. New `resolveCredentialWriteScope` helper; symmetric scope decision on disconnect.
- `import-github` action accepts a saved `GITHUB_TOKEN` secret for private-repo imports; better error messages distinguish "needs token" vs "token rejected" vs "rate-limited"; broader URL parsing (SSH, .git suffix).
- Agent run-event retention bumped from 30min to 24h (configurable via `AGENT_RUN_RETENTION_MS`); reconnect synthesizes terminal `done`/`error` events when the persisted run already ended; `BuilderSetupCard` now appears mid-conversation when the API key is missing.
- Clips: when a recording fails because storage isn't configured, the StorageSetupCard renders inline on the recording detail and share routes.
- Auth docs: replace the obsolete `ACCESS_TOKEN`-only model with Better Auth canonical pointers across docs/auth.md and every template's AGENTS.md + security SKILL.md.
- CLI smoke: assert dispatch resolves from npm in scaffolds (matches the published-package layout).
- Fix pnpm prep: docs vitest was picking up Nitro-bundled `.spec.mjs` files from `.output/` after a build; exclude that path.
@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for agent-native-scheduling ready!

Name Link
🔨 Latest commit d0a9c5e
🔍 Latest deploy log https://app.netlify.com/projects/agent-native-scheduling/deploys/69fa654d002fd20008125c25
😎 Deploy Preview https://deploy-preview-511--agent-native-scheduling.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for agent-native-voice ready!

Name Link
🔨 Latest commit d0a9c5e
🔍 Latest deploy log https://app.netlify.com/projects/agent-native-voice/deploys/69fa654dee092700082cc51c
😎 Deploy Preview https://deploy-preview-511--agent-native-voice.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for agent-native-meeting-notes ready!

Name Link
🔨 Latest commit d0a9c5e
🔍 Latest deploy log https://app.netlify.com/projects/agent-native-meeting-notes/deploys/69fa654dc09a130007ddc952
😎 Deploy Preview https://deploy-preview-511--agent-native-meeting-notes.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 5, 2026

Deploy Preview for agent-native-design ready!

Name Link
🔨 Latest commit d0a9c5e
🔍 Latest deploy log https://app.netlify.com/projects/agent-native-design/deploys/69fa654d2a2a320008414cbf
😎 Deploy Preview https://deploy-preview-511--agent-native-design.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Contributor

@builder-io-integration builder-io-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builder reviewed your changes and found 6 potential issues 🔴

Review Details

Code Review Summary

PR #511 bundles critical changes to credential scoping, secret resolution, and agent run recovery. The scope spans auth (org-scoped Builder credentials), agent infrastructure (24h run retention + reconnect synthesis), storage (clips failure messages), and design tokens (GitHub imports with private token support). Risk Level: HIGH due to auth, data-scoping, and agent-state changes.

Key Accomplishments ✅

  • Org-scoped credential isolation: Write-side logic correctly gates scope: "org" to owner/admin only; read-side fallback from user→org is sound. Disconnect mirrors connect-side scope decision.
  • GitHub token security: Tokens properly resolved via resolveSecret() (org-scoped if available), never logged or returned to client, passed through fetch headers safely.
  • Agent run recovery: Event retention bumped to 24h with terminal-event synthesis prevents client hangs during reconnect. seq numbering and race-condition checks are correct.
  • Builder credential type safety: Tests cover critical paths; implementation properly validates role/org transitions.

Key Findings 🔴 HIGH, 🟡 MEDIUM

🔴 HIGH: Startup recovery can duplicate completed agent runs

  • When the initial POST request succeeds but completes before /runs/active checks, the adapter falls back to normal startup retry and sends the prompt a second time. Mutating side effects (emails, records, workflows) can execute twice.
  • Root cause: /runs/active only returns runs with status === "running", so completed runs (even with retained events) look like "no active run."
  • Impact: Transient network failures after server accepts request → duplicate mutations.

🔴 HIGH: Unvalidated failureReason in clips routes (2 instances)

  • Database-sourced error messages rendered without validation in recording detail and share routes.
  • Share route is public-facing — unauthenticated visitors can see untrusted content.
  • Current safety: React escapes string children in JSX. But public endpoints should never rely on context-specific escaping; an attacker controlling the recording owner's account could inject misleading content.

🟡 MEDIUM: Org-scoped Builder connect not observable by status poller

  • Connect writes org-scoped rows for owner/admin, but /builder/status resolves credentials without passing orgId, so the read-side never consults the new org scope.
  • Impact: Connect UI polls until configured: true but never sees org-scoped rows, times out after 5 min, users see misleading "Connect Builder" state despite successful write.
  • Fix: Pass orgId into runWithRequestContext() via getOrgContext() in the status handler.

🟡 MEDIUM: Org scope accepted by registerRequiredSecret but not enforced in routes

  • registerRequiredSecret() now accepts scope: "org", but secrets HTTP routes (write/delete) don't enforce org-level access checks for org scope — only for workspace scope.
  • Impact: First template using scope: "org" will allow any org member to overwrite/delete org-shared secrets; org-scoped reads outside an org fall back to legacy workspace behavior.
  • Fix: Treat "org" as a first-class scope in secrets routes with owner/admin authorization gates.

🟡 MEDIUM: GitHub JSON result type doesn't guarantee array at runtime

  • Generic fetchGitHubJsonResult<Array<...>>() casts to array type without enforcing it. Current code is safe (explicit .isArray() check follows), but fragile if pattern is copied without validation.

Browser Testing

🧪 Browser testing: Skipped — PR only modifies backend actions, credentials routes, and clips storage setup UI. No auth flow, login, or core navigation changes visible to end users. Clips storage UI changes are additive (StorageSetupCard overlay) and don't affect existing player/recording paths.


Fix all in Builder

runId = activeRunId;
if (!attemptedRunIds.includes(activeRunId)) {
attemptedRunIds.push(activeRunId);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Startup recovery can duplicate completed agent runs

When the initial POST fails but the server already accepted the request, /runs/active?threadId=... only returns runs with status='running', so completed runs look like "no active run". The adapter then retries the original POST, executing the agent turn twice and duplicating mutations. Recovery should be able to reconnect to completed runs before retrying the POST.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real bug. Fixing on a follow-up branch — getActiveRunForThreadAsync will surface recently-completed/errored SQL runs (within a 10-min reconnect window) so the adapter can replay synthesized done/error events from PR #511's subscribeFromSQL instead of retrying the POST. New tests cover the recently-completed, recently-errored, and stale-window cases.

isFailure &&
/file upload provider|storage provider|connect builder|s3-compatible/i.test(
failureReason,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unvalidated failureReason from database rendered in JSX

The failureReason string is fetched from the database and rendered directly without validation. React escapes JSX strings, so script injection is blocked, but removing the any cast and validating the reason string (or limiting it to expected patterns) hardens this against downstream misuse.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping. The bot itself notes React JSX-escapes the string so XSS is blocked. failureReason is written by our own finalize-recording action (controlled string set). The 'downstream misuse' hardening it suggests is hypothetical — there's no untrusted input source feeding this field, and a future caller would have to copy this exact pattern verbatim into a non-JSX context for it to matter. Trusting React's documented escaping is the right call here.

explicitFailure &&
/file upload provider|storage provider|connect builder|s3-compatible/i.test(
message,
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Unvalidated failureReason rendered to public share page

This is a public-facing route where failureReason is shown to unauthenticated visitors. Validate or sanitize the message before rendering, or simplify to a generic message that doesn't leak technical details. Public endpoints should not rely on context-specific escaping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping. Same reasoning as the recording-detail comment: React JSX-escapes the string (bot acknowledged), and failureReason originates from finalize-recording.ts — a controlled set of strings written by our own action, not user input. The 'public-facing route' framing is real but doesn't change the surface: there's no untrusted writer, so there's nothing to sanitize. If we ever start writing user-controlled error messages into this column, that's the moment to add validation; right now it would be defensive code with no attack surface.

await writeBuilderCredentials(
session.email,
{ privateKey, publicKey, userId, orgName, orgKind },
{ orgId, role },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Org-scoped Builder credentials never become visible to status poller

Connect writes org-scoped credentials but /builder/status resolves without an orgId, so the read path never sees org-scoped rows. Pass orgId from getOrgContext(event) into runWithRequestContext() so the status handler resolves the new org-scope credentials.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real bug. Fixing on a follow-up branch — /builder/status will read getOrgContext(event) and pass the orgId into runWithRequestContext(), mirroring what the connect/disconnect handlers already do. Without this, an admin's org-shared OAuth result is invisible to every other org member's status poller and the UI shows 'not connected' even though the chat actually resolves the org-scope row.

*/

export type SecretScope = "user" | "workspace";
export type SecretScope = "user" | "workspace" | "org";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Org scope not enforced in secrets HTTP routes

registerRequiredSecret() now accepts scope: "org", but the secrets routes (write/delete) only enforce access checks for workspace scope. Add org-level owner/admin authorization and treat org scope as a first-class scope in HTTP handlers before any template uses it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real concern. Fixing on a follow-up branch — scope: "org" becomes first-class in resolveScopeId (with no solo fallback, since "org" requires an actual org), and canMutateOrgScope enforces owner/admin role on both write and delete in the registered-secret routes. Tests cover admin-allows / member-rejects / no-active-org-rejects / member-delete-rejects. The ad-hoc routes already strictly normalize to user/workspace, so they're unaffected.

Array<{
name: string;
type: string;
size?: number;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 GitHub JSON result type doesn't enforce array at runtime

The generic type fetchGitHubJsonResult<Array<...>>() doesn't guarantee the value is an array at runtime; current code is safe due to the explicit .isArray() check, but prevent the pattern from being copied without proper validation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping. The bot itself notes the runtime is safe today via the explicit Array.isArray() check at the call site (line 95 in import-github.ts), so this is purely a future-pattern concern. The generic fetchGitHubJsonResult<Array<...>>() documents the expected shape — runtime narrowing stays the caller's job, which is the standard TypeScript contract for any fetch().json() wrapper. Adding inline isArray enforcement to the helper would harm callers who legitimately expect an object response. The ergonomic boundary here is correct.

@steve8708 steve8708 merged commit 117d476 into main May 5, 2026
27 of 28 checks passed
@steve8708 steve8708 deleted the codex/org-scoped-credentials branch May 5, 2026 21:56
steve8708 added a commit that referenced this pull request May 5, 2026
… + org-secrets auth (#512)

* fix(core): PR #511 follow-up — reconnect to terminal runs + status orgId + org-scoped secrets auth

Three review-fed fixes against the org-scoped credentials + reconnect-hardening work that just shipped in #511:

- runs/active surfaces recently-completed/errored SQL runs within a 10-minute reconnect window so the adapter can replay synthesized terminal events from the events stream (added in #511) instead of retrying the original POST. Without this, a POST that failed after the server already accepted and finished the run could re-execute the agent turn — double-apply mutations on every disconnect that crossed an isolate boundary.
- /builder/status reads getOrgContext(event) and passes orgId into runWithRequestContext, so the status poller resolves org-shared Builder credentials. Previously an admin's org-scope OAuth result was invisible to every other org member's status poller — chat worked, but UI said "not connected" forever.
- registerRequiredSecret already accepted scope: "org" but the registered-secret HTTP write/delete routes only enforced auth for workspace scope. Adds canMutateOrgScope (owner/admin required, no solo fallback) and treats org as first-class in resolveScopeId. Ad-hoc routes were already restricted to user/workspace and remain unchanged.

New tests cover all three: recently-completed/errored reconnect window + stale-window rejection in run-manager.spec.ts; admin-allows / member-rejects / no-active-org-rejects / member-delete-rejects in routes.spec.ts.

* fix(core): use completed_at for reconnect window + sweep concurrent Sentry CLI hardening

Addresses the 🟡 review comment on #512 (window measured from start time
rather than completion time) and bundles concurrent-agent Sentry CLI/browser
identity work that landed locally during the same session.

Reconnect window fix:
- getRunByThread now selects completed_at and returns it.
- getActiveRunForThreadAsync measures the terminal-run reconnect window
  from completed_at (with fallback to heartbeat_at, then started_at) so
  long-running tasks that complete seconds before reconnect are still
  reachable. Previously a run that started 11 minutes ago and completed
  2 seconds ago would be rejected by the 10-minute window — forcing the
  client to retry the POST and re-execute the agent turn.
- New tests: long-running-then-recently-completed (reachable), legacy rows
  with NULL completed_at (falls back to heartbeat), stale-window rejection
  now keyed off completed_at.

Concurrent Sentry hardening swept in:
- CLI Sentry beforeSend now drops ValidationError exceptions (validateRepoName
  and friends) so user-input rejections don't pollute crash reports.
- New ValidationError class in cli/create.ts replaces the generic Error in
  validateRepoName.
- cli/index.ts attaches BUILDER_USER_ID / BUILDER_PUBLIC_KEY as Sentry
  user.id and tags so we can map CLI crashes back to operators; existing
  user.ip_address scrubbing preserved.
- workspace-dev.ts handles ENOENT and ENOSPC gracefully (TOCTOU race on
  appsDir, inotify watcher exhaustion); other readdir errors get a Sentry
  warning so we learn about new failure modes.
- Browser side: setSentryUser() helper buffers user/orgId set before Sentry
  init; useSession() pushes the resolved user (id/email/username) and orgId
  tag into Sentry on every session resolve.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant