Teacher app experience — Claude design, easy API-key onboarding, clawed app launcher#6
Open
SirhanMacx wants to merge 124 commits into
Open
Teacher app experience — Claude design, easy API-key onboarding, clawed app launcher#6SirhanMacx wants to merge 124 commits into
SirhanMacx wants to merge 124 commits into
Conversation
… + 'clawed app' launcher - claude-theme.css: warm Anthropic palette (cream canvas, clay accent, serif display) layered over the token-based stylesheet — re-skins the entire web UI with zero layout changes, light + warm-dark modes. - settings: friendly step-by-step 'how to get an API key' help for Anthropic + OpenAI (console links, one-tap CTA, 'saved' state, local-only reassurance). - clawed app: frictionless launcher — binds 127.0.0.1, trusts same-machine requests (no token wall), opens the browser, routes first-timers to key setup. - base.html loads the theme layer; entry router registers the new command. Wraps/stabilizes the existing FastAPI app (clawed/api); no engine rewrite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces terse, developer-flavored stream lines with plain-English status the flagship UX spec calls for: 'Reading your materials…', 'Mapped your unit…', 'Drafting lesson N…', '✓ Lesson N ready', 'Building the student handout, slides & exit ticket…', '✓ Quality check passed — ready to review!'. Adds the missing materials-'generating' line and a closing quality-check confirmation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New flagship capability — turn any topic into a captioned, narrated video (slides → neural voiceover → Ken-Burns → MP4), free + local, no paid API. The engine behind our review Shorts, now a Claw-ED teacher tool. - compile_video.py (new): ported pipeline — Chrome-headless slide render → edge-tts neural VO (macOS `say` fallback) → ffmpeg zoompan Ken-Burns → concat/mux/fade → MP4 + ffprobe verify. Binaries via shutil.which; missing deps raise a catchable VideoDependencyError (no crash). scenes_from_lesson() maps lesson content → scenes; optional image_resolver for photo backgrounds. - agent_core/tools/generate_video.py (new): GenerateVideoTool (write_local), mirrors generate_animation.py; drafts a script via the LLM, builds to data_dir()/videos, returns a friendly ToolResult with install guidance if ffmpeg/Chrome/TTS are missing. - tts.py: edge-tts neural backend now preferred (free; replaces robotic gTTS / paid OpenAI TTS), graceful fallback to gTTS then `say`. Public API unchanged. - self_equip.py: allow 'edge-tts' for on-demand install. Verified: imports clean; tool auto-discovered (52 tools); smoke build produced a valid 1080x1920 H.264+AAC MP4 with neural voice; graceful dep-missing path confirmed; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the existing improve capability to the teacher UI. After viewing a
generated lesson, a teacher types a natural-language change ('make it shorter',
'add a primary source', 'lower to 9th-grade reading level', 'add Regents-style
questions') and the lesson updates in place — no full regeneration.
- POST /api/improve/{lesson_id}: loads the lesson, applies the change via the
same LLMClient.generate_json pattern generate_lesson uses (persona-aware),
re-validates as DailyLesson (preserves lesson_number), persists via
db.update_lesson_json (stable lesson_id, edit_count++). Auth + rate-limit
like neighbors; fails safe (400/404/502 JSON, stored lesson untouched on error).
- lesson.html: 'Revise in Plain English' card (input + Apply + one-tap chips);
reuses existing classes; hidden on shared view.
Verified: app builds, routes import, endpoint registered, template compiles,
ruff clean. Note: revisions overwrite in place (no undo); materials/scores not
auto-regenerated after a revision.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scans every student-facing .docx for accidentally-leaked teacher-only content before delivery — enforcing the rule: never put the answer key on the student handout. - quality_render.py (new): scan_student_leakage()/assert_clean()/extract_text() over .docx (paragraphs + table cells) or raw text; word-bounded, case- insensitive LEAKAGE_PATTERNS (answer key, answer:, correct answer, correct: A, key:, teacher copy/version/note, mark scheme, model answer) with false-positive guards (won't flag 'key term', 'answer the question', etc.). - compile_student.py: after the student docx is saved, scan + log WARNINGs + record findings (get_last_leakage_findings); wrapped so it can never block delivery (warn, not block). - validation.py: validate_student_leakage() labeled 'student_answer_key_leakage'. Verified: dirty -> 4 findings, clean -> none; false-positive guards pass; imports OK; ruff clean. Text-only (won't catch answers inside images). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
heartbeat (function-size cap): - compile_video.py: split build_video (231 lines) into _render_slides / _render_voiceover / _render_clips / _assemble_video helpers, each <200. mypy --strict (errors from the failed run): - compile_video.py: assert ffmpeg/ffprobe/chrome non-None after the missing- check so mypy narrows str|None -> str at the helper call sites. - tts.py: type: ignore[import-not-found] on the optional edge_tts availability import (not a dependency; resolved at runtime via the CLI). - quality_render.py: extract_text(source: object) so the defensive non-str/Path fallback is reachable (clears [unreachable]). - generate_video.py: _video_dir() -> Path with a Path import; wrap data_dir() in Path() to avoid no-any-return. Verified locally: ruff clean; 0 functions over 200 lines; mypy --strict clean on the changed files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
generate_json() is typed to return a dict, so the defensive 'if not isinstance(raw, dict)' branch was statically unreachable (mypy [unreachable]). Removed it; lesson_number is still forced stable and schema problems are caught by the DailyLesson.model_validate try below. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-in-plain-English Front-door update so visitors see what's new: 'clawed app' (browser app, guided key setup, no terminal), 52 tools incl. narrated videos, revise-in-plain-English, and student-safe answer-key scanning. Accurate to the shipped features on this PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After generation, the results step now embeds the finished lesson right there —
the teacher sees it formatted, in Claude design, without leaving the page. The
embed uses /lesson/{id}?embed=1, and base.html hides the nav/footer when
?embed=1 so the preview is focused (verified: navbar present normally, absent
under ?embed). The embedded lesson's own revise-in-plain-English + export
controls work inside the preview. Template-only change.
Completes the flagship loop: type a request -> watch human-language progress ->
see the lesson appear -> revise/export inline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ding
Settings now offers all five providers as selectable cards: Anthropic,
OpenAI, OpenRouter, Google, and local Ollama. Each shows an inline
"how to get a key" link to the provider's console.
- OpenRouter: free-text model slug input (openrouter_model) so any
model on the gateway works, e.g. anthropic/claude-sonnet-4 or an
open-weights Chinese model.
- Google: model select (google_model), defaults to gemini-2.5-flash.
- SaveSettingsRequest carries openrouter_model / google_model.
- "Bring your own key, or run a local model" framing throughout.
- Capture-phase submit handler in a settings {% block scripts %} so
the new providers save even though app.js only knows the first three.
Keys are entered by the teacher on this page (or via env vars); the app
reads them with env > keyring > config priority. No key is stored by
the build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Create flow could only make full lessons; the engine can do much
more. Added three artifact types as type-cards on /generate:
- Quiz: POST /api/quiz (AssessmentGenerator.generate_quiz) with an
adjustable question count. Returns JSON, no assessment row needed.
- Differentiate: POST /api/differentiate/{lesson_id} runs
generate_iep_lesson_modifications for a chosen profile (ELL, IEP,
504, gifted, reading-support) and persists the adapted lesson.
- Review Game: POST /api/game (compile_game) writes a playable HTML
game to ~/clawed_output; GET /api/game/file serves it (path-confined).
generate.html: pickGenType now swaps per-type fields (question slider,
game style, lesson picker, profile select); runSimpleGen does a single
POST and renders results inline in #gen-simple-results.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A native macOS MenuBarExtra app so teachers never touch a shell: - ServerController spawns `clawed app --port 8000 --no-open` (falls back to the python _entry_router if the CLI isn't on PATH) and polls /api/health every second for a live status dot. - Start / Stop / Open in browser from the menu bar. - Shows the LAN URL plus a CoreImage QR code so a phone on the same Wi-Fi can open the teacher's own instance — local-first, no cloud. - Settings: launcher path + port only. No command/shell field is exposed (mobile/menu-bar never gets arbitrary shell), per the security model. Builds clean on Swift 6.2 (debug + release). Plan and rationale in docs/product/CLAWED_DESKTOP_PLAN.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hink, capped ctx, JSON mode) The local Ollama path used the raw /api/generate endpoint with a concatenated prompt. Modern instruct/reasoning models (Gemma 4, Qwen, etc.) return an EMPTY completion there because their chat template is never applied — so "run a local model" silently produced nothing. Switch the local path to /api/chat with proper system/user roles so the template is applied, and: - "think": False — skip the visible chain-of-thought. A local 12B answers in seconds instead of spending hundreds of tokens reasoning aloud (~11x faster in testing). Mirrors the existing vision path. - num_ctx cap (default 8192, override via config.ollama_num_ctx) — some models (Gemma 4) ship a 128K default context whose KV cache is slow and memory-hungry on consumer hardware (e.g. a 16GB Mac). - "format": "json" for generate_json() calls — constrains decoding to valid JSON, eliminating parse failures and the validate-then-retry round-trip that roughly doubled wall-clock time for quizzes/lessons. - Strip any stray <think>…</think> preamble defensively. Net effect: bring-your-own-key was already fine; this makes the local-model path genuinely usable, which is the whole "Claude Code for teachers, BYO key or run local" pitch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Claude-Code-style co-teacher: a teacher types a request in plain
English and watches the answer write itself, instead of waiting on a
spinner.
- llm.py: generate_stream() async generator with native SSE streaming for
OpenRouter and NDJSON streaming for Ollama (other providers fall back to
a single yield). For OpenRouter reasoning models (e.g. minimax-m3) it
sends reasoning={enabled:false} so quick help streams in ~10s instead of
~28s of hidden chain-of-thought.
- routes/generate.py: POST /api/ask/stream -> EventSourceResponse that
streams tokens; uses the teacher's persona for voice when present, works
without ingestion too.
- generate.html: an "Ask your co-teacher" card at the top of Create with a
fetch + ReadableStream SSE reader that renders tokens live.
Verified end-to-end through the browser on OpenRouter/minimax-m3.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…headers) The model returns Markdown; render it live with a tiny XSS-safe Markdown->HTML converter (input is HTML-escaped first) so headers, bold, bullet/numbered lists, blockquotes, and inline code display as formatted text instead of raw asterisks. Streams in formatted, not plain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Finish the co-teacher into a grab-and-go tool: - Quick-start chips (Do Now, Hook, Discussion Qs, Exit ticket, Differentiate) prefill a templated prompt and focus the box so the teacher just adds the topic. - After an answer streams in, Copy and "Download .md" buttons appear so the teacher can drop the result straight into their materials. Frontend-only; verified end-to-end in the browser on OpenRouter/minimax-m3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/api/health (polled by the footer status bar) returned only status+version, so app.js always showed "Not connected — check settings" even when a provider was configured and working. Add llm_provider, llm_model, and a cheap llm_connected flag (key present, or local Ollama which needs none) — no outbound LLM call, keeping the poll fast. Footer now shows e.g. "Connected — minimax/minimax-m3". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HEARTBEAT's exception-swallow check flagged the bare `except Exception: pass` in the new /api/health provider probe. Log at debug instead, and drop the now- redundant inline imports (get_api_key, LLMProvider, AppConfig are already module-level). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ness boxes - README: add the live "Ask your co-teacher" feature and the Mac menu-bar app + phone-access (LAN/QR) to the no-terminal-app paragraph and Features. - TESTFLIGHT_READINESS: tick mobile-responsive, first-run/empty states, connection/health clarity, and README (verified this iteration). Remaining in section A: end-to-end artifact-endpoint verification on minimax-m3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er map Artifact endpoints that route the model (game, differentiation, the unit/lesson pipeline) sent the Ollama default `gemma4:31b-cloud` to OpenRouter — an invalid model ID there — so every routed generation 400'd on OpenRouter/minimax-m3. resolve_model now keeps the teacher's configured model (e.g. openrouter_model) for any provider that has no built-in tier map, instead of falling back to the Ollama-centric defaults. Also: surface OpenRouter's HTTP error body (opaque 500 -> actionable message), and build the game's lesson stub as a lightweight SimpleNamespace (compile_game reads only a few fields; a full MasterContent has 10 required fields incl. a nested DoNow — overkill and brittle). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From a design-review audit of the app (teacher-first, vs the Claude design system): - a11y/contrast: define the missing --text-muted/--accent/--bg-subtle as theme aliases so inline fallbacks resolve to real tokens (the co-teacher card's muted text was #8a8a8a ≈ 3.0:1, failing AA; now #6B6557 ≈ 5.2:1) and auto-skin in dark mode. - off-palette leaks: disabled primary buttons rendered default blue on the first screen -> muted clay; star ratings #f59e0b -> warm --orange. - touch targets: chips/.btn-sm get min-height 44px on mobile (phone via the Mac app's QR/LAN). - footer: show "Connected" (model slug moved to a tooltip) and a more actionable "Not connected — add a key in Settings". - lesson: Differentiation section no longer renders as an empty heading when the dict exists but has no content. - generate: a labeled divider separates the quick "Ask your co-teacher" card from the full "build a lesson/unit/quiz/game" wizard (was two competing CTAs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add two network-free, server-free test modules: - test_model_router_openrouter.py: guards the fix where model_router fell back to the Ollama default (gemma4:31b-cloud) for OpenRouter's bring-your-own model. Asserts route() and resolve_model() return the configured minimax/minimax-m3 across the deep-tier tasks (game_generate, differentiation, assessment) and all tiers, plus copy-not-mutate and a pin that OpenRouter stays out of PROVIDER_TIER_MODELS. - test_artifact_endpoints.py: graceful-failure contracts for the generate routes via TestClient with local auth bypass + isolated temp data dir: differentiate/<bogus> -> 404 (not 500), invalid quiz body -> 422, game without topic -> 422, plus blank-topic 422s and a stubbed-LLM /api/quiz -> 200 happy path (no real network). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nest Bring docs in line with what actually shipped on this push (verified against the repo): - README/FEATURES: surface live streaming 'Ask your co-teacher' (chips, Markdown, Copy/Download .md), the Mac menu-bar app + phone access (LAN/QR), BYO-key incl. OpenRouter/minimax-m3 + local Ollama, and the six Create artifact types (lesson/unit/materials/quiz/differentiate/review-game). Soften the unverifiable '52 agent tools' count. - CHANGELOG: add an Unreleased (v5.16) dated entry — Claude design layer, provider UI, artifact types, Mac app, streaming co-teacher, local-model fix, model-routing fix, answer-key leakage gate, teacher-first design pass. - ROADMAP: keep released version honest (v5.15; v5.16 in progress), list the built-this-push items, and put the iOS TestFlight client (in progress) + Apple-signing (owner's step) near-term. - TESTFLIGHT_READINESS: tick the no-secrets ship-gate box (verified from .gitignore) and add a verification-notes map for the ticked A-items; leave iOS/Apple-signing unchecked. Verified: all five files render (python-markdown), code fences balanced, internal + external links resolve, only the five owned docs changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ios-app/ — a calm, local-first Capacitor iOS shell that connects to a teacher's own Claw-ED server over the LAN (the URL/QR the Mac app shows). No Python or curriculum engine ships on the device; the only bundled screen is a dependency-free CONNECT screen that validates + remembers the server URL and hands the WebView over to it. - package.json (clawed-ios; @capacitor/core + ios + cli; sync/open scripts) - capacitor.config.ts (appId app.macxlabs.clawed, appName Claw-ED, webDir www) - www/ CONNECT screen: index.html + connect.js + styles.css, Claude palette (cream #FAF9F5, clay #C96442, serif). URL normalize/validate, localStorage remember + Reconnect path, graceful @capacitor/barcode-scanner QR stub. - resources/icon.png (1024x1024 placeholder, clay square + serif C) + note that the final brand icon set is a follow-up. - README.md (build steps; Apple signing + TestFlight is the owner's step). - docs/product/CLAWED_IOS_PLAN.md (architecture, LAN-only data/security model, toolchain status, Apple-signing handoff). Verified: node --check on connect.js; package.json parses; capacitor.config.ts has the required appId/appName/webDir + default export; 18/18 URL-normalization cases pass (incl. rejecting javascript:/data:/file:/ftp:/mailto: and accepting alphabetic host:port like johns-macbook.local:8000); CONNECT + Reconnect states render cleanly in headless WebKit-class Chrome with zero console errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e, asset cache-bust, mobile lesson, busy spinners Teacher-first design punch-list: - _icons.html: a 14-name inline SVG macro (book/file-text/layers/briefcase/ clipboard-check/users/gamepad/copy/download/link/lock/sparkles/upload/pencil), currentColor line icons that re-skin with the theme. Replaces every OS emoji on generate type-cards + export rows, the onboarding/persona/success icons on index, the export buttons + share affordance on lesson, and the key-storage locks in settings. - lesson: raw '/shared/...' link -> a 'Copy student link' button (copies origin+url, flips to 'Copied!'). - index: first-run hint under the onboarding cards pointing to Settings. - settings: intro line under H1, sticky Save bar, breadcrumb separator -> '/'. - base.html + server.py: asset cache-busting via a Jinja 'asset_v' global (= __version__) appended to style.css / claude-theme.css / app.js. - claude-theme.css mobile: lesson grid -> flex column with the sidebar (Rate/Ask) ordered above the body so it isn't buried on phones. - busy spinners: #ask/#score/#suggest/#revise buttons stay visible + disabled with the existing .spinner while their request is in flight, then restore. Verified: all templates compile + render; ruff clean; mypy --strict adds no new errors; server up on a free port, browse screenshots of /generate /settings / /lesson at 1280 & 390 confirm icons (no emoji), sticky save, asset ?v=, mobile sidebar-first, no overflow, zero console errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y fields generate_iep_lesson_modifications validated the raw LLM dict straight into DailyLesson, but minimax-m3 routinely omits or nulls lesson_number (a required int) and sometimes objective/title. That 500'd /api/differentiate/<id> on every call. Backfill lesson_number/objective/title from the source lesson before model_validate (a differentiated lesson N is still lesson N), mirroring the existing guard in improve_lesson_endpoint. Verified end-to-end on OpenRouter/minimax-m3 (HTTP 200) plus a deterministic null-payload unit test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- desktop/scripts/sign_and_notarize.sh: one-command Developer-ID sign (throwaway /tmp keychain — login keychain never touched), hardened runtime + entitlements, notarytool (existing ASC key) + staple, DMG build/sign/notarize/staple, spctl Gatekeeper verify, delivery to ~/Documents/MacxLabs/web/downloads/Claw-ED.dmg. - desktop/src-tauri/entitlements.plist: jit + unsigned-executable-memory + disable-library-validation for the PyInstaller sidecar. - BLOCKER (Jon-gated): Developer ID Application cert mint is an Account-Holder-only ASC operation — 403 with both team API keys (K5RKF383QT, 7R33522PA5). CSR + private key staged at ~/.appstoreconnect/devid/ so Jon's portal mint takes 2 minutes; the pipeline then runs unattended. Ad-hoc DMG + status note delivered to the downloads dir meanwhile (marked do-not-publish). - HANDOFF.md: M2/M3 status section + cliclick-vs-WKWebView key-event gotcha. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… own files ingest_materials now builds a persisted STYLE PROFILE (voice/tone, lesson-structure pattern with frequencies, assessment conventions, scaffolds, exemplar excerpts) from the ingested corpus: - clawed/style_profile.py — deterministic regex skeleton (sections, question types, point values, MC key-letter balance, answer-key detection, sentence stats) with LLM-filled qualitative fields and heuristic fallbacks; per-file analyses cached by content hash so re-ingest is incremental; multi-profile with one active pointer and an explicit off-switch (default MacxLabs style). - ingest_materials: home-bounded + secrets-denied like mac_files, ONE approval for the whole folder tree (per_params signature), 25MB per-file cap with skip counts, optional profile_name. - get_style_profile / set_active_profile tools (multi-course support). - Consumption: LLMClient._enrich_system_prompt injects the active profile into EVERY generation (lessons, assessments, sub packets); the agent system prompt carries it too (build_system_prompt). - 21 new tests: heuristics, profiling structure, cache incrementality, persistence/active-pointer, prompt injection, tool policy. Privacy: profiles + caches live in the agent's data dir on this Mac; only short excerpts go to the teacher's configured LLM during analysis. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
POST /api/style/ingest (background job, polled via GET /api/style/ingest/status), GET /api/style/profiles, activate/deactivate/delete. Same home/secrets policy as the tool; picking the folder in the app is the approval. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…profile cards - New left-rail view: native folder picker (osascript choose folder via a pick_folder Tauri command — no plugin crate), typed-path fallback, profile-name field, live ingest progress (files processed / skipped), honest privacy copy (local-only; only excerpts go to the configured LLM). - STYLE PROFILE CARD: voice description, lesson-flow diagram (section chips with frequencies), assessment/answer-key/MC-balance/scaffold facts, exemplar quote, Use / Re-ingest / Remove, plus a 'default MacxLabs style' off-switch. - Composer hint row shows the active profile chip (click → manage); empty chat gets a 'Teach Claw-ED your materials' onboarding chip; ⌘K palette entry. Console-theme variants included. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ist, no lock-on-sleep - openssl pkcs12 -export -legacy: OpenSSL 3.x default (AES/SHA256 MAC) fails macOS 'security import' with 'MAC verification failed' - temp keychain must join the user search list: codesign ignores --keychain for identity lookup (saved + restored via trap; login keychain untouched) - drop lock-on-sleep, re-unlock before DMG signing: a sleep during the notarytool wait locked the keychain (errSecInternalComponent at step 5) Verified end-to-end: app + DMG both notarized (Accepted) and stapled; spctl: 'accepted — source=Notarized Developer ID'. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Checkpoint the in-flight desktop-agent batch (lint-clean, tests green): - brain.py: chat access to the durable teaching brain (stats/search/dream) so the desktop + iOS harness drive the same long-term memory as the CLI. - portfolio.py: build advertising-safe sample portfolios from cleared materials, approval-gated, written to the local workspace. - agent_stream.py: authoritative per-turn approval footer (anti-spoof — the SSE layer records what was actually approved/denied so the model can't misreport it) + /agent/tools registry for the Skills gallery. - Trust/rollout docs: TRUST_AND_SECURITY, DISTRICT_ROLLOUT, PUBLIC_POSITIONING, EDUCATOR_SKILL_ROADMAP, AGENT_HARNESS. - iOS app-store screenshot tooling + script/build_and_run.sh desktop helper. - Tests: brain tools, approval footer, style profiles, tool discipline. 76 relevant tests green; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A turn over the public tunnel (Cf-Ray) is now held to a stricter gate than a local turn on the Mac, so a leaked device token can't yield blanket shell/file access: - Remote turns NEVER honor a standing "Always allow" — every risky action is confirmed fresh on the device (ToolRegistry._check_approval skips the standing lookup when context.is_remote). - A remote "Always" can't CREATE a standing grant — downgraded to one-time in the live path (context.is_remote) and at the resolve endpoint (the resolving request's own Cf-Ray); the resolved event/footer report the effective policy, not the raw request. - CLAWED_AUTO_APPROVE (a local convenience) never applies to a remote turn. is_remote is derived from deps._via_cloudflare_edge and threaded route -> Gateway.handle -> _agent_loop -> AgentContext -> the gate. Pure tightening: only adds confirmation friction on the remote path, never removes it; local Mac behavior is unchanged. Verified: 3 new unit tests for the gate logic + an end-to-end trace confirming is_remote=True propagates to the gate over the real HTTP remote path. Documented in docs/product/TRUST_AND_SECURITY.md. 63 related tests green; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The general file tools could read/write known credential stores that weren't on the denylist (~/.aws, ~/.kube, ~/.docker, ~/.git-credentials, gcloud, ~/.azure, ~/.pgpass, …) — a silent exfiltration path into the model. Expand _DENY_NAMES to cover cloud/container/VCS/DB credential stores (matched as any path segment, so the whole store is off-limits, not just individual files). Tests: parametrized denial across 9 credential paths for read+write, plus home-escape coverage (../ traversal and absolute /etc paths refused). Documented in TRUST_AND_SECURITY.md. 24 desktop-agent tests green; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
iOS remote parity for the new agent backend, hardened by a multi-lens pre-ship review (0 blockers; these are the confirmed majors): - Security: origin-PIN the device token — release/POST it ONLY to the exact host it was paired with (tokenForOrigin), so a deep link/QR that re-points the app can never hand the real token to another origin. Confirm before a scanned code connects to a NEW host. Honest token-storage comments. - Correctness: render the streamed command_output events (were silently dropped on iOS — the Mac UI showed them); resolveApproval now inspects the response and stops treating a 200-with-ok:false as success. - Layout: the remote feed shrinks on short phones (iPhone SE) with dvh sizing so the composer + dashboard button never start below the fold. - App Review: remove the declared-but-unused NSCamera/NSPhotoLibrary usage strings (no code uses them; QR scan is an honest stub). Restore pinch-zoom. - CURRENT_PROJECT_VERSION 4 -> 5. Verified: builds for iPhone 17 Simulator (real WebKit), renders clean + auto- connects to the live Mac agent over the tunnel; backend SSE+approval contract proven end-to-end separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ild 7) The Jun-10 session left ASC at build 5 while the git project file read 4, so the build number had to clear 5. Note: this project's archive emits a built CFBundleVersion of CURRENT_PROJECT_VERSION + 1 (Xcode 26 / Pods phase), so project=6 ships as ASC build 7. Always read the built IPA's CFBundleVersion before upload; both today's uploads (v6, v7) are VALID with the M4 fixes, v7 latest. Routes to the all-builds "Internal" group automatically. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… trap Device test (build 7) hit "Load failed → blank white screen." The agent logs showed the cause: pairing + the remote control WORKED (health 200, a task streamed POST /api/gateway/chat/stream 200), but tapping "Open full Mac dashboard" did POST /api/auth/bootstrap 303 → GET / 200 → navigated the WebView to the DEPRECATED education web app, which loads but renders blank on the phone. That old form-first web app is exactly what the rebuild replaces. Remove the button + its handler + the now-dead navigateToServer/bootstrap navigation + dead .full-dashboard CSS, so the app never leaves its own UI. The remote-control surface (quick actions + task box + streamed output + approvals) IS the iOS experience. Token is now Bearer-only (origin-pinned); no cookie path. CURRENT_PROJECT_VERSION 6 -> 7 (ships as TestFlight build 8). Verified: builds for iPhone 17 Simulator (real WebKit), connect screen renders clean; no remaining top-level navigation away from index.html. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the quick-actions "remote control" with a real agent chat that matches the Mac desktop Calm Studio (REBUILD_DIRECTION "build once"): - Full-screen chat: clay user bubbles + serif "voice" agent replies. - Tool ACTION cards with a live spinner that resolves to a done/failed tick; streamed run_command output in a dark monospace block. - Inline APPROVAL cards (Allow once / Always / Deny) that resolve in place with the truthful effective policy. - ARTIFACT cards for produced files (lesson DOCX/PDF/deck) — name + path, informational on the phone (the file lives on the Mac). - Empty state with a serif greeting + suggestion chips; a composer with auto-grow textarea + round send; new-conversation + switch-Mac controls. - Tethering / SSE / approval LOGIC unchanged (it works); this is presentation. CURRENT_PROJECT_VERSION 7 -> 8 (ships as TestFlight build 9). Verified: builds for iPhone 17 Simulator (real WebKit); connect screen + the fully-populated chat (bubbles, action cards, command output, approval, artifact, composer) render clean and match the desktop. Throwaway demo affordance removed before ship. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Device feedback on build 9: "Load failed" + "logo is off". Behind the scenes
the Mac WAS being driven (agent logs: task POSTs + approval resolves all 200;
re-confirmed with a tunnel task that created a real file on the Desktop).
- "Load failed" was the raw WebKit fetch-rejection message (Safari's wording
for a dropped/closed SSE stream over the Cloudflare tunnel, e.g. a long task)
surfaced verbatim by the error handler. Now: track a terminal event
(final/error/done) per turn and ignore a stream close that rejects AFTER the
result arrived; map raw "Load failed"/"Failed to fetch" to a human message
("Lost the connection to your Mac… the work may still be finishing").
- Logo: replace the placeholder serif "c" with a real Claw-ED mark — a bold
cream "C" with claw-tipped (talon) terminals on the warm clay Calm Studio
gradient. New app icon (1024² opaque) + in-app header/connect marks (logo.png).
CURRENT_PROJECT_VERSION 8 -> 9 (ships as TestFlight build 10). Verified: builds
for iPhone 17 Simulator; connect screen renders with the new logo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/loop until perfect, verified each pass on the iPhone 17 Simulator driving the LIVE agent over loopback (real tasks, real approvals, real Mac data): - Strip the technical "Authoritative approval log for this turn:" footer the server appends — the chat already shows each approval's status on its card. - Don't auto-focus the composer after a turn (it popped the keyboard over the agent's reply on a phone). - Render agent replies as safe markdown — **bold**, `code`, and bullet lists (mixed intro-paragraph + bullets handled), built from text/element nodes only (never innerHTML; no XSS surface). - Verified every surface in real WebKit: connect, empty greeting + chips, populated chat (bubbles, serif voice w/ markdown, action cards spinner→done, streamed command output, approval card Allow/Always/Deny, artifact card), and a full real task→approval→completion against the live agent. CURRENT_PROJECT_VERSION 9 -> 10 (ships as TestFlight build 11). The temporary Simulator verify harness was removed before this ship; harness-removed build re-verified to load. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Loop polish (Simulator-verified): - Blur the composer on Send so the keyboard dismisses and the streaming reply isn't hidden behind it (a real phone P1). - Lock iPhone to portrait (UISupportedInterfaceOrientations = Portrait); iPad keeps landscape. A quick "drive my Mac" remote belongs in portrait, and it removes a class of landscape layout edge cases. CURRENT_PROJECT_VERSION 10 -> 11 (ships as TestFlight build 12). Harness-free build re-verified to load on the iPhone 17 Simulator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final loop pass: a warm-dark theme matching the desktop Console aesthetic, applied via prefers-color-scheme. The fix that made the Capacitor WKWebView honor the system theme was the canonical `color-scheme: light dark` property on :root (the <meta> alone wasn't enough) + a dark <meta name=theme-color>. Dark overrides for the cream/paper/ink/line tokens + the hardcoded light colors (.voice/.empty-title text, .approve, .note.error, .artifact icon). Verified in the iPhone 17 Simulator in BOTH appearances: connect screen and the full chat (clay bubbles, serif voice w/ markdown bold/code/bullets, action cards, dark command-output, approval card, artifact) render correctly light AND dark. Light re-verified for no regression after the harness was removed. CURRENT_PROJECT_VERSION 11 -> 12 (ships as TestFlight build 13). Verify harness removed before ship. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nects Root cause of a lost lesson: a teacher asked the phone to build an Age of Exploration bundle; the agent paused at the write approval; the phone's tunnel stream dropped; the SSE route's `finally: task.cancel()` then KILLED the in-flight turn — so when the approval was resolved the work was already dead and the generated bundle was never written to the Mac. Fix: on client disconnect, detach the turn to a background set (strong ref so the loop won't GC it) and let it run to completion instead of cancelling. The output still lands on the Mac, and because the approval broker is process-wide, a reconnecting client can still resolve a pending approval and the turn continues — the foundation for reconnect-and-resume. Verified live: started a turn, hard-closed the SSE stream at the approval (simulating the phone dropping), resolved via the separate endpoint — the task survived and wrote its file. Regression test in tests/test_agent_stream_disconnect.py. ruff clean; agent restarted to load it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A teacher's phone drops the SSE stream easily over the tunnel on a long task.
The server already keeps the turn running in the background (it no longer
cancels on disconnect) and records its outcome in _LATEST_TURN. This wires the
iOS client to re-attach on (re)connect.
Server:
- GET /api/agent/latest-turn already returned the recorded turn + pending
approvals; the SSE `start` event now carries its turn_id so a client that
watched a turn to the end can recognize it and not re-render it later.
iOS (connect.js):
- showRemote() now calls resumeLatestTurn() after restoring the greeting:
- status=running -> "still working on …" note, disable composer, poll every
3s (capped) until the result lands, then render it and re-enable the composer
- finished + unseen -> render the task, reply, and artifacts with a
"here's what finished while you were away" note
- pending approvals -> re-attach resolvable Allow/Always/Deny cards (the
approval broker is process-wide, so they resolve on a fresh stream); on the
reconnect path the resolve POST sets an honest resolved line locally since
no live approval_resolved SSE event arrives
- lastSeenTurnId (localStorage) guards against re-showing a turn already seen
live; the live stream records it on final/done
- a freshly sent task supersedes any resume poll; switch-Mac / new-conversation
/ forget stop it
Verified end-to-end in the real iOS Simulator WebKit (iPhone 17, iOS 26.2):
unseen-finished renders, already-seen stays silent, still-working shows the
approval card + running note, and the poll transitions to the rendered result
with artifact cards and a re-enabled composer.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build 14 ships the iOS reconnect & resume feature. The committed project version had lagged at 12 (prior builds passed the number as an archive-time override); set it to 14 so CFBundleVersion matches the uploaded build and the source is the source of truth again. Verified the built IPA reported CFBundleVersion=14 and contained the resume code before upload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversarial audit (8 surfaces, skeptic-verified) + live demo runs surfaced a set
of issues that would break or embarrass a live demo. All fixed and tested
(2228 passing, +9 new regression tests).
Reliability (found by driving the live agent):
- llm.py: the agent's LLM calls used timeout=7200.0 (2 hours) — a stalled
provider call wedged a whole lesson build with the agent idle. Bounded to a
240s read timeout (4 sites) so a stall fails in minutes, handled by the loop's
friendly error path.
- browser.py: _fetch_page_text's Playwright chromium.launch() had NO timeout, so
a missing/mismatched browser build blocked the coroutine — and the turn —
forever (observed: a build wedged at 0.1% CPU). Now bounded by asyncio.wait_for
with an httpx fallback; web research can never hang a build.
- models.py: default openrouter_model → anthropic/claude-sonnet-4.6 (the prior
weak model stalled on clarifying questions instead of building).
- prompt.py: explicit "deliverable rule" — a build/create/make request must end
with the artifact produced; a weak materials search is never grounds to stop
or ask "shall I proceed?".
Security:
- mac_files.py: the credential denylist was case-SENSITIVE, but macOS's default
filesystem is case-insensitive — read_file('~/.ENV') / write_file('~/.AWS/…')
walked right past it. Now matched case-insensitively at all three sites.
Crash-hardening (clawed/agent.py, the live OpenRouter tool-call path):
- Guard empty `choices` (IndexError) → friendly recoverable message.
- Guard malformed tool-call argument JSON (JSONDecodeError) → skip that call
rather than crash the whole turn. Mirrors the existing Ollama-path guard.
iOS connect.js races (stale content into a fresh conversation):
- A conversation-generation counter + AbortController: New Conversation / switch
Mac / forget now invalidate and abort any in-flight stream or resume fetch, so
a late SSE event or slow /latest-turn response can't paint into the new chat.
- forget now clears lastSeenTurnId.
Input limits (no cryptic 422 on long input):
- Desktop + iOS composers: maxlength=10000 (matches the server) + a desktop
send-guard with a clear over-limit message.
Regression tests: case-insensitive denylist (6 cases) + provider-response
robustness (empty choices, malformed tool args, valid call still parses).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A full bundle takes a few minutes (one large content generation + images + compilation + the auto-chain). Two improvements on the path the teacher is actively waiting on: - Per-phase progress so the wait shows continuous activity instead of a silent spinner: "Lesson content written — now adding images…" and "Building the teacher plan, student handout, and slides…". - The three differentiation variants (IEP/504, ELL, Gifted) were generated by three sequential LLM calls; run them concurrently with asyncio.gather (~3x less latency for that stage). Same files, labels, and per-variant error handling — just no longer serialized. Verified: a real Sonnet-4.6 Columbian Exchange build produces a complete, classroom-ready bundle — correct 9-10.RH standards band, real attributed primary sources (Columbus, de las Casas), no answer-key leaks on the student handout, 12-slide deck, full differentiation. 235 lesson/bundle tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mit fixes) Build 15 ships the iOS connect.js conversation-generation guards (new conversation / switch Mac / forget now cancel in-flight stream + resume so stale content can't paint into a fresh chat; forget clears lastSeenTurnId) and the composer maxlength=10000. Verified the built IPA reported CFBundleVersion=15 and contained the fixes before upload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The review game, learning journey, and deep-research report (3 sub-queries) ran one after another after the core lesson — three serial LLM-heavy stages on the path the teacher is actively waiting on. They only read `master` and each writes its own file, so run them concurrently with asyncio.gather. Same files, labels, and per-item error isolation; results appended after the gather to avoid any shared-list race. Combined with the earlier differentiation parallelization, this collapses the post-content stages of a full bundle build. 170 lesson/bundle tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ish)
A full bundle takes a few minutes, dominated by the core lesson generation and
its quality gates. Previously the teacher saw NOTHING until the entire bundle —
including the slower review game, learning journey, and deep-research report —
was done. Now the core teaching package (teacher plan, student handout, slides,
and the three differentiation variants) is delivered the moment it's compiled,
via per-file `artifact` SSE events, and the bonus extras stream in after with a
"core lesson is ready above" note. Same total work + same quality gates — the
usable lesson just lands in the teacher's hands ~minutes sooner.
- Split _run_auto_chain into _run_differentiation (core) and _run_extras
(game/journey/research); execute() emits artifacts after each stage.
- New `artifact` event ({path,name}) forwarded by the existing SSE event path.
- iOS connect.js + desktop app.js render the `artifact` event; iOS appendArtifact
now dedupes (a file can arrive early AND in the final list) — set cleared on
every conversation reset. Desktop already deduped via addArtifactOnce.
Per Jon's choice (core-first delivery) over a lighter default or relaxed gates,
keeping full quality. 175 lesson/bundle/stream tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build 16 ships the iOS side of core-first delivery: connect.js renders the new `artifact` SSE event (files streamed as they finish during a build) with per-conversation dedup against the final file list. Verified CFBundleVersion=16 + the artifact code in the built IPA before upload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat had no keyboard handling at all (no @capacitor/keyboard plugin, no viewport JS), so on a real phone the iOS software keyboard overlaid the WebView: the composer was hidden behind the keyboard (typing blind), and the feed's scroll chained to the body (rubber-banding — the "sloppy" feel). - .chat height now tracks window.visualViewport (the real visible area) via a --app-h CSS var, so when the keyboard opens the whole chat shrinks to sit above it and the composer stays visible. 100dvh is the no-keyboard fallback. - Re-pin the chat if iOS offsets the layout viewport under the keyboard (translateY by visualViewport.offsetTop), updated on vv resize/scroll (coalesced to one frame); scroll to the newest message only on resize/focus, never on plain viewport scroll (so scrolling up through history isn't yanked). - body.remote-on is locked (overflow:hidden; position:fixed) so it can't scroll or rubber-band behind the fixed chat; .feed gets overscroll-behavior:contain + touch-action:pan-y so momentum stays inside the feed. - On textarea focus, scroll the latest message + composer back into view after the keyboard settles. Plugin-independent — works in the app and mobile Safari. Verified the new layout renders cleanly in the iOS Simulator (real WebKit). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build 17 ships the visualViewport-driven keyboard/scroll fix. Verified CFBundleVersion=17 + the viewport code in the built IPA before upload. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le Settings
Teachers can now connect an AI provider entirely inside the native window —
no terminal, no localhost:8000 in a browser. Ports the proven web connect
flow (clawed/api/static/app.js) into the Tauri UI faithfully.
- Onboarding panel (new #view-onboarding): auto-detects a ready provider via
/api/onboarding/detect ("We found <provider> … Use it & continue"), else a
5-provider chooser (Ollama-first, BYO key for OpenRouter/Anthropic/OpenAI/
Google) → POST /api/settings → GET /api/settings/test-connection → success.
No bundled shared key; plain-language copy + "Get a key" links for teachers.
- Editable Settings: provider select, masked key + Show toggle, model field,
Test connection / Save settings — replaces the read-only dead end.
- Dead-end routing: the "Provider needs setup" status pill, the empty-chat
inline banner, a sendMessage gate, and a ⌘K "Connect your AI" command all
open the onboarding panel. Auto-launches once on first run when no provider.
- Gate entry to chat on a LIVE passing test-connection (with /health fallback
for providers the tester doesn't cover, matching the web flow).
- CSS for the connect cards/forms/status boxes/success state in Calm Studio.
CRITICAL: every ported fetch is BASE-prefixed (fetch(`${BASE}/api/…`)) — the
native UI talks to an absolute loopback base, not relative paths.
node --check passes; existing views (chat/Skills/Materials/Workspace/pairing)
untouched. Verified all four onboarding states render in headless Chrome.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… Tauri desktop/) The Tauri desktop app is the canonical signed+notarized double-click distribution for non-technical teachers (with in-window onboarding). Park the Swift launcher so it doesn't become a confusing second download path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…connected
.connect-banner{display:flex} (and other connect-* rules) tied on specificity
with the default [hidden]{display:none} and won, so element.hidden was a no-op
— the 'Connect your AI' banner showed even when a provider was connected
(caught via real-window screenshot). Add [hidden]{display:none!important}.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes Claw-ED usable by a teacher without touching the terminal, built on the existing FastAPI app (
clawed/api) — no engine rewrite.clawed/api/static/claude-theme.css): warm Anthropic palette (cream canvas, clay accent, serif display) layered over the token-based stylesheet — re-skins the whole web UI with zero layout changes. Light + warm-dark.settings.html): step-by-step "how to get a key" for Anthropic + OpenAI — console links, one-tap CTA, "✓ saved" state, local-only reassurance.clawed applauncher: binds 127.0.0.1, trusts same-machine requests (no token wall), opens the browser, routes first-timers straight to key setup.Run it
Notes
🤖 Generated with Claude Code