feat(invoice-ai): local invoice AI assistant (Layer 1 embedding + Layer 2 slot LLM)#712
Adds the two model-runtime dependencies for the local invoice AI assistant. @xenova/transformers (Transformers.js) drives the Layer 1 multilingual-e5-small embedding classifier on every device. @mlc-ai/web-llm drives the opt-in Layer 2 Qwen2.5-1.5B slot extractor on WebGPU-eligible hardware. Spec: docs/superpowers/specs/2026-05-06-local-invoice-ai-assistant-design.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The barrel re-exports AssistantPanel which will be added in Phase 10. Until then this file's import will TS-error — expected and intentional; future tasks have a specific shape for AssistantPanel. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the shared type vocabulary used across every layer of the AI assistant: AssistantLocale (en/ro/fr), IntentId (10 intents from v1 catalog), Timeframe (12 canonical windows), VizHint, and the CONFIDENCE_THRESHOLDS constant pair. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Layer 1 (embedding model) ships to every device on WASM. This gate only decides whether to offer the Layer 2 (~1 GB Qwen-1.5B) opt-in CTA. Returns eligible | ineligible | unknown with machine-readable reason codes. Hard gates: workers-unavailable, webgpu-unavailable, webgpu-adapter-unavailable, storage-quota-too-low. Soft gates (only when reported): memory-too-low, cpu-too-low. 8 unit tests cover all branches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nch tests Addresses code-review findings on the hardware eligibility module:
- Fix silent fallthrough when navigator.gpu exists but requestAdapter is not callable (partial polyfill / future API drift). The branch now pushes webgpu-adapter-unavailable explicitly.
- Add JSDoc to all 3 exports (HardwareEligibilityReason, HardwareEligibilityResult, checkHardwareEligibility) per RFC 1002.
- Extract the 186-char inline navigator type cast into a named NavigatorWithHardwareHints interface. Test file gets a NavigatorStub.
- Add 5 new tests covering: requestAdapter throws, requestAdapter missing while gpu present, storage.estimate throws, navigator absent, and boundary values (deviceMemory === 4, hardwareConcurrency === 4).
- Add rationale comments to the three threshold constants.
Test count: 8 -> 13. Coverage: 94% statements -> ~99% expected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
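The gate described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the `NavigatorLike` and `EligibilityResult` shapes, the `checkEligibility` name, the `hasWorkers` parameter, the storage-quota threshold, and the choice to return `unknown` when no navigator is present are all assumptions. Only the reason codes and the boundary value 4 come from the commit messages.

```typescript
// Hypothetical shapes for illustration; the real module's types may differ.
type Reason =
  | "workers-unavailable"
  | "webgpu-unavailable"
  | "webgpu-adapter-unavailable"
  | "storage-quota-too-low"
  | "memory-too-low"
  | "cpu-too-low";

interface NavigatorLike {
  gpu?: {requestAdapter?: unknown};
  storage?: {estimate?: () => Promise<{quota?: number}>};
  deviceMemory?: number;
  hardwareConcurrency?: number;
}

interface EligibilityResult {
  status: "eligible" | "ineligible" | "unknown";
  reasons: Reason[];
}

// Assumed thresholds: 4 mirrors the boundary values named in the tests; the quota is a guess.
const MIN_DEVICE_MEMORY_GB = 4;
const MIN_CPU_CORES = 4;
const MIN_STORAGE_QUOTA_BYTES = 2 * 1024 ** 3;

async function checkEligibility(nav: NavigatorLike | undefined, hasWorkers: boolean): Promise<EligibilityResult> {
  // Assumption: without a navigator nothing can be probed, so report "unknown".
  if (!nav) return {status: "unknown", reasons: []};
  const reasons: Reason[] = [];
  if (!hasWorkers) reasons.push("workers-unavailable");

  const gpu = nav.gpu;
  if (!gpu) {
    reasons.push("webgpu-unavailable");
  } else if (typeof gpu.requestAdapter !== "function") {
    // gpu exists but requestAdapter is not callable: push the reason explicitly.
    reasons.push("webgpu-adapter-unavailable");
  } else {
    try {
      const adapter = await (gpu.requestAdapter as () => Promise<unknown>).call(gpu);
      if (!adapter) reasons.push("webgpu-adapter-unavailable");
    } catch {
      reasons.push("webgpu-adapter-unavailable");
    }
  }

  try {
    const estimate = await nav.storage?.estimate?.();
    if (estimate?.quota !== undefined && estimate.quota < MIN_STORAGE_QUOTA_BYTES) {
      reasons.push("storage-quota-too-low");
    }
  } catch {
    // Assumption: a throwing estimate() is treated as "not reported" rather than a failure.
  }

  // Soft gates apply only when the browser reports the hint at all.
  if (nav.deviceMemory !== undefined && nav.deviceMemory < MIN_DEVICE_MEMORY_GB) reasons.push("memory-too-low");
  if (nav.hardwareConcurrency !== undefined && nav.hardwareConcurrency < MIN_CPU_CORES) reasons.push("cpu-too-low");

  return {status: reasons.length === 0 ? "eligible" : "ineligible", reasons};
}
```

Passing the navigator in as a parameter keeps the sketch testable with a plain stub object, which is presumably why the test file introduces a NavigatorStub.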
Declarative registry of intent IDs, slot grammar per intent, and viz hint per intent. The IntentDefinition shape lets the resolver and renderer dispatch generically without per-intent special cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Layer 1 path for slot extraction. Pure regex/keyword tables for en/ro/fr that translate canonical user phrasings to the discrete Timeframe enum and topK integer. Diacritic-folding + case-insensitive matching. topK clamps to [1, 20] with default 5. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
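A minimal sketch of the lexicon approach, assuming a three-value subset of the Timeframe enum with made-up member names (`thisMonth`, `lastMonth`, `thisYear`) and a handful of illustrative phrasings. The real tables, enum members, and function names are not shown in this PR excerpt.

```typescript
// Hypothetical subset of the 12 canonical windows; the names are assumptions.
type Timeframe = "thisMonth" | "lastMonth" | "thisYear";

const TOP_K_DEFAULT = 5;
const TOP_K_MIN = 1;
const TOP_K_MAX = 20;

// Strip combining marks after NFD decomposition, then lowercase.
function fold(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Patterns are written against folded text, so they carry no diacritics.
const TIMEFRAME_PATTERNS: ReadonlyArray<[RegExp, Timeframe]> = [
  [/\b(last month|luna trecuta|le mois dernier)\b/, "lastMonth"],
  [/\b(this month|luna aceasta|ce mois)\b/, "thisMonth"],
  [/\b(this year|anul acesta|cette annee)\b/, "thisYear"],
];

function parseTimeframe(question: string): Timeframe | undefined {
  const folded = fold(question);
  for (const [pattern, timeframe] of TIMEFRAME_PATTERNS) {
    if (pattern.test(folded)) return timeframe;
  }
  return undefined;
}

function parseTopK(question: string): number {
  const match = /\btop\s+(\d+)\b/.exec(fold(question));
  const raw = Number.parseInt(match?.[1] ?? "", 10);
  if (Number.isNaN(raw)) return TOP_K_DEFAULT;
  return Math.min(TOP_K_MAX, Math.max(TOP_K_MIN, raw));
}
```

Folding the input once and keeping the tables ASCII-only means a user who types without diacritics hits the same patterns as one who types them.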
Trust boundary between the model layer and the deterministic aggregator layer. Validates intent against the catalog whitelist, normalizes slots against the canonical Timeframe enum, clamps topK to [1, 20], and falls back to question-text lexicon parsing when slots arrive empty. NEVER spreads or passes through unvalidated slot values. Three-state slot inspection (valid / invalid / absent) ensures that a present-but-invalid slot rejects with out-of-scope rather than silently falling through to the default. 8 unit tests cover happy paths and validation rejections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
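The three-state inspection can be illustrated as below. This is a sketch under assumptions: the `SlotState` and `Resolution` types, the function names, and the three Timeframe members are hypothetical, and only the timeframe slot is handled.

```typescript
// Hypothetical subset of the canonical Timeframe enum.
const TIMEFRAMES = ["thisMonth", "lastMonth", "thisYear"] as const;
type Timeframe = (typeof TIMEFRAMES)[number];

type SlotState<T> = {kind: "valid"; value: T} | {kind: "invalid"} | {kind: "absent"};

type Resolution = {ok: true; timeframe: Timeframe} | {ok: false; reason: "out-of-scope"};

function inspectTimeframe(raw: unknown): SlotState<Timeframe> {
  if (raw === undefined || raw === null || raw === "") return {kind: "absent"};
  if (typeof raw === "string" && (TIMEFRAMES as readonly string[]).includes(raw)) {
    return {kind: "valid", value: raw as Timeframe};
  }
  return {kind: "invalid"};
}

function resolveTimeframe(
  rawSlots: Record<string, unknown>,
  lexiconFallback: () => Timeframe | undefined,
  defaultTimeframe: Timeframe,
): Resolution {
  const state = inspectTimeframe(rawSlots["timeframe"]);
  switch (state.kind) {
    case "valid":
      // Only the validated value is forwarded; rawSlots is never spread.
      return {ok: true, timeframe: state.value};
    case "invalid":
      // Present but not in the enum: reject instead of defaulting silently.
      return {ok: false, reason: "out-of-scope"};
    case "absent":
      // Empty slot: fall back to parsing the question text, then to the default.
      return {ok: true, timeframe: lexiconFallback() ?? defaultTimeframe};
  }
}
```

Separating `invalid` from `absent` is what prevents a hallucinated value such as `"yesterday-ish"` from quietly turning into the default window.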
10 canonical phrasings per intent per locale (300 total). Drives the Layer 1 cosine-similarity classifier. The build-time embeddings generator (Phase 7) will encode these into a precomputed matrix shipped with the worker. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…currency) Three deterministic fixture generators: empty, ~54 EUR invoices over 18 months, and a multi-currency variant that flips alternating invoices to RON. Used by every aggregator test in Phase 4 to verify empty-result branches, currency grouping, and date-window edges. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
resolveTimeframeWindow translates the 12 canonical Timeframe values into UTC Date ranges. filterByTimeframe + filterNotDeleted + groupByCurrency are the building blocks every aggregator composes. All time-dependent functions take 'now' as an explicit param so tests are deterministic. groupByCurrency tolerates both shapes of paymentInformation.currency (plain string from fixtures, Currency object from production types). 8 unit tests cover window math, soft-delete filtering, and per-currency bucketing across the empty / single-currency / multi-currency fixtures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
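As a rough illustration of the two ideas in this commit (an explicit `now` parameter and a currency accessor that tolerates both shapes), consider the sketch below. The two Timeframe members, the half-open `[start, end)` convention, and the `code` and `totalAmount` field names are assumptions; only `paymentInformation.currency` is named in the commit message.

```typescript
// Hypothetical subset of the 12 canonical windows.
type Timeframe = "thisMonth" | "lastMonth";

interface TimeWindow {
  start: Date; // inclusive
  end: Date; // exclusive (assumed convention)
}

// `now` is a parameter, so tests can pin the clock.
function resolveTimeframeWindow(timeframe: Timeframe, now: Date): TimeWindow {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  switch (timeframe) {
    case "thisMonth":
      return {start: new Date(Date.UTC(year, month, 1)), end: new Date(Date.UTC(year, month + 1, 1))};
    case "lastMonth":
      // Date.UTC normalizes month -1 to December of the previous year.
      return {start: new Date(Date.UTC(year, month - 1, 1)), end: new Date(Date.UTC(year, month, 1))};
  }
}

// Currency may arrive as a plain string (fixtures) or as an object (production types).
type CurrencyLike = string | {code: string};

interface InvoiceLike {
  totalAmount: number;
  paymentInformation: {currency: CurrencyLike};
}

function currencyCode(currency: CurrencyLike): string {
  return typeof currency === "string" ? currency : currency.code;
}

function groupByCurrency(invoices: readonly InvoiceLike[]): Map<string, InvoiceLike[]> {
  const buckets = new Map<string, InvoiceLike[]>();
  for (const invoice of invoices) {
    const code = currencyCode(invoice.paymentInformation.currency);
    const bucket = buckets.get(code) ?? [];
    bucket.push(invoice);
    buckets.set(code, bucket);
  }
  return buckets;
}
```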
Pure function (invoices, {timeframe, category?}, now) -> TotalSpendResult.
Filters soft-deleted, applies optional category filter, splits multi-currency
results into per-currency buckets. Returns explicit empty marker when no
invoices match.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kdown Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ant filter Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…currency) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Single entry point: runAggregator(intent, invoices, slots, now) returns a discriminated StructuredAnswer union. An exhaustiveness check via never ensures that any future intent added without a matching aggregator fails at compile time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
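The `never`-based exhaustiveness check works along these lines. The sketch uses a simplified two-intent union and a simplified signature (a list of amounts instead of invoices, slots, and now); `assertNever` is a hypothetical helper name.

```typescript
// Simplified stand-ins: two of the ten intents, with invented result fields.
type IntentId = "totalSpend" | "invoiceCount";

type StructuredAnswer =
  | {intent: "totalSpend"; total: number}
  | {intent: "invoiceCount"; count: number};

// If a new IntentId member is added without a matching case below,
// `intent` is no longer `never` in the default branch and compilation fails.
function assertNever(value: never): never {
  throw new Error(`Unhandled intent: ${String(value)}`);
}

function runAggregator(intent: IntentId, amounts: readonly number[]): StructuredAnswer {
  switch (intent) {
    case "totalSpend":
      return {intent, total: amounts.reduce((sum, amount) => sum + amount, 0)};
    case "invoiceCount":
      return {intent, count: amounts.length};
    default:
      return assertNever(intent);
  }
}
```

The runtime `throw` is a backstop for values that bypass the type system, for example an intent string that arrives through an unchecked cast.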
Adds the localized message catalog: panel labels, state messages, Layer 2 opt-in CTA copy, action buttons, timeframe labels, example chip labels, and answer templates with ICU plural rules for all 10 intents in en/ro/fr. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e+viz Pure dispatch on intent. Returns localized prose by calling injected next-intl t() with template keys + params. Empty-result branches produce friendly 'try alternatives' copy. Translator function is injected so the module is testable without next-intl. 5 unit tests cover populated/empty branches across totalSpend, topMerchantsByCount, and the spendComparison no-change vs delta templates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four minimal a11y-aware viz components built on @arolariu/components Card primitives + plain SVG (donut). No chart library dependency. Each emits a stable data-testid and uses role="img" with aria-label on visual elements for screen-reader access. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Discriminated State union (10 statuses) + exhaustive Action union. Two consecutive slotLlmTimeout actions raise shouldRestartSlotHost (one-shot flag the hook reads + clears via resetSlotHostFlag). History capped at 50 entries (oldest evicted via slice). Layer 2 sub-state is independent of the main status: the assistant can answer questions while the Layer-2 model is downloading. 12 unit tests cover all transitions + cap + flag + reset semantics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
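A reduced sketch of the two reducer behaviours called out above: the history cap and the one-shot restart flag. The state shape is cut down to three fields, history entries are plain strings, and resetting the timeout counter on `answerReady` is an assumption about what "consecutive" means here.

```typescript
// Cut-down state; the real union has 10 statuses and richer history entries.
interface AssistantState {
  history: readonly string[];
  consecutiveSlotTimeouts: number;
  shouldRestartSlotHost: boolean;
}

type AssistantAction =
  | {type: "answerReady"; entry: string}
  | {type: "slotLlmTimeout"}
  | {type: "resetSlotHostFlag"};

const HISTORY_CAP = 50;
const TIMEOUT_THRESHOLD = 2;

const initialState: AssistantState = {
  history: [],
  consecutiveSlotTimeouts: 0,
  shouldRestartSlotHost: false,
};

function reducer(state: AssistantState, action: AssistantAction): AssistantState {
  switch (action.type) {
    case "answerReady":
      return {
        ...state,
        // Assumption: a successful answer breaks the run of timeouts.
        consecutiveSlotTimeouts: 0,
        // slice(-N) keeps the newest N entries and evicts the oldest.
        history: [...state.history, action.entry].slice(-HISTORY_CAP),
      };
    case "slotLlmTimeout": {
      const count = state.consecutiveSlotTimeouts + 1;
      return {...state, consecutiveSlotTimeouts: count, shouldRestartSlotHost: count >= TIMEOUT_THRESHOLD};
    }
    case "resetSlotHostFlag":
      // The hook dispatches this right after acting on the flag.
      return {...state, consecutiveSlotTimeouts: 0, shouldRestartSlotHost: false};
  }
}
```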
…matrix Encodes 300 seed phrases (10 intents x 10 phrasings x 3 locales) with Xenova/multilingual-e5-small. Sub-ms cosine sim at runtime; only the user's question encodes per classify call (~50 ms). The committed seedEmbeddings.json is an empty placeholder so the worker module compiles. Engineer must run: node scripts/generate.embeddings.ts to download the 118 MB model and write the ~460 KB precomputed matrix before the assistant returns useful classifications. The script is idempotent and re-runs cheaply once the model is cached. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wraps Transformers.js multilingual-e5-small in createWorkerHost. Uses the precomputed seed-phrase matrix for sub-ms cosine ranking. Returns top intent + score + top-3 candidates. API surface (embedding.api.ts): EmbeddingWorkerApi with ensureLoaded() + classify(). Implementation is module-level singleton (extractor only loads once per worker lifetime). Tests use vi.hoisted for the Transformers.js mock so the asyncFn extractor is captured cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
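The ranking step against the precomputed matrix might look like the following. This sketch leaves out the Transformers.js call entirely and takes an already-encoded query vector; the `SeedRow` and `Classification` shapes and the max-over-phrasings scoring rule are assumptions, not the worker's confirmed behaviour.

```typescript
// One row per seed phrase: its intent label plus its embedding.
interface SeedRow {
  intent: string;
  vector: readonly number[];
}

interface Classification {
  intent: string;
  score: number;
  candidates: Array<{intent: string; score: number}>;
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

function classify(query: readonly number[], seeds: readonly SeedRow[], topN = 3): Classification | undefined {
  // Assumed scoring rule: an intent's score is its best-matching seed phrase.
  const bestByIntent = new Map<string, number>();
  for (const seed of seeds) {
    const score = cosineSimilarity(query, seed.vector);
    const previous = bestByIntent.get(seed.intent);
    if (previous === undefined || score > previous) bestByIntent.set(seed.intent, score);
  }
  const ranked = Array.from(bestByIntent, ([intent, score]) => ({intent, score})).sort(
    (left, right) => right.score - left.score,
  );
  const top = ranked[0];
  if (!top) return undefined; // empty matrix, e.g. the committed placeholder
  return {intent: top.intent, score: top.score, candidates: ranked.slice(0, topN)};
}
```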
5s default call timeout; 5min idle timeout for lazy reboot. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ess) Uses WebLLM's MLCEngine in-process inside our createWorkerHost worker. JSON-mode + temperature=0 for deterministic output. Defensive: rejects hallucinated intents not in the candidate list. API surface (slotExtractor.api.ts): SlotExtractorWorkerApi with ensureLoaded() + extract() + unload(). Implementation loads the ~1 GB Qwen-1.5B model on first ensureLoaded; chat completions enforce JSON object response_format. 4 unit tests cover not-loaded reject, valid JSON happy path, hallucinated intent reject, and invalid JSON reject. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
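The defensive parsing around the model's reply can be sketched without touching WebLLM at all. The function below only covers the post-processing step; its name, the `{intent, slots}` response shape, and the error messages are assumptions made for illustration.

```typescript
interface ExtractedSlots {
  intent: string;
  slots: Record<string, unknown>;
}

// Validates a raw completion string against the Layer 1 candidate list.
function parseSlotResponse(raw: string, candidates: readonly string[]): ExtractedSlots {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("slot extractor returned invalid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("slot extractor did not return a JSON object");
  }
  const {intent, slots} = parsed as {intent?: unknown; slots?: unknown};
  // Reject any intent the model invented outside the offered candidates.
  if (typeof intent !== "string" || !candidates.includes(intent)) {
    throw new Error("slot extractor returned an intent outside the candidate list");
  }
  const safeSlots =
    typeof slots === "object" && slots !== null && !Array.isArray(slots) ? (slots as Record<string, unknown>) : {};
  return {intent, slots: safeSlots};
}
```

The slots returned here are still untrusted; in the design described earlier they would pass through the intent resolver before reaching any aggregator.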
30s default call timeout (cold-start model load); 10min idle timeout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eline Owns: reducer state machine, two worker hosts (Layer 1 eager, Layer 2 lazy on opt-in), Strict-Mode-safe lifecycle (PR #699 pattern), and the classify -> resolve -> aggregate -> render pipeline. Auto-restarts the slot host when consecutive timeouts hit the threshold (the reducer sets a one-shot flag the hook reads + clears via resetSlotHostFlag). 2 unit tests cover the boot transition (capability-check -> embedding-loading -> embedding-ready) and the end-to-end classify->resolve->aggregate->render pipeline against an empty corpus. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Renders prose + dispatched viz primitive based on intent. Lifts the viz extractors to local helpers so the panel doesn't need them. 3 unit tests cover bar-chart, single-stat, and donut viz dispatches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Renders all 10+ states with role/aria-live attributes; aria-busy on input during pending; chip clicks re-submit canonical queries; Layer 2 opt-in CTA in header with download progress + active badge. Includes aggregator-error alert path. Updates the public barrel to re-export AssistantPanel + AssistantPanelProps. 4 unit tests cover the workers-unavailable terminal state, the embedding-ready chips state, the Layer 2 eligible CTA, and the embedding-loading progress bar. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the chat-tab placeholder body (Card wrapping a stub MessageList) with a single <AssistantPanel /> mount. The settings tab remains untouched. The `invoices` prop is preserved for API compatibility but the assistant reads directly from useInvoicesStore so no prop wiring is needed. 188 existing view-invoices tests still pass (no regression). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r Node 24 ESM
Node 24 native TypeScript loader requires explicit .ts extensions for
relative module specifiers. Patched the generator to import
seedPhrases.{en,ro,fr}.ts. Regenerated the matrix: 300 embeddings,
~2.4 MB JSON.
After this commit the embedding worker can classify questions against
the real seed phrases instead of the empty placeholder.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…olds Reports intra-class vs inter-class cosine similarity over the 300 seed phrases. Recommends canonical/uncertain thresholds. Manual script - not run in CI. Initial calibration on the regenerated matrix shows significant overlap between intents (intra mean 0.90, inter mean 0.86 — 10th-pct intra 0.85 is BELOW 90th-pct inter 0.90). The current CONFIDENCE_THRESHOLDS (canonical=0.75, uncertain=0.55) intentionally err generous so Layer-1 catches more cases; tighten only after a larger seed-phrase corpus has been added per locale. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
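For reference, the two thresholds partition a cosine score into three bands. The sketch below assumes inclusive lower bounds and hypothetical band names; the commit only states the two numeric values.

```typescript
// Values from the commit message; the inclusive comparison is an assumption.
const CONFIDENCE_THRESHOLDS = {canonical: 0.75, uncertain: 0.55} as const;

type ConfidenceBand = "canonical" | "uncertain" | "out-of-scope";

function bandFor(score: number): ConfidenceBand {
  if (score >= CONFIDENCE_THRESHOLDS.canonical) return "canonical";
  if (score >= CONFIDENCE_THRESHOLDS.uncertain) return "uncertain";
  return "out-of-scope";
}
```

With the reported inter-class mean of 0.86, most wrong-intent matches would still land in the `canonical` band under these values, which is the overlap the calibration output warns about.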
4 critical scenarios: cold-start happy path, out-of-scope -> chip flow, multilingual (ro), and Strict-Mode tab leave + return clearing history. Gated to environments with WebGPU; CI may need to skip when running on headless workers without GPU acceleration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
❌ Code Hygiene Report: Issues Found
📋 Check Summary
📊 Code Statistics
Changes vs Main Branch
🔄 Changes Since Previous Commit
📦 Bundle Size Analysis (vs Main)
`sites/arolariu.ro` - +2.79 MB (101 file(s) changed)
Total: 10 MB → 12.8 MB (+2.79 MB)
`sites/api.arolariu.ro` - no change (0 file(s) changed)
No changes in this folder
Total: 2.14 MB → 2.14 MB (no change)
`sites/docs.arolariu.ro` - no change (0 file(s) changed)
No changes in this folder
Total: 214 kB → 214 kB (no change)
🎨 Formatting
❌ 78 file(s) need formatting
🔧 How to Fix: npm run format
🔍 Linting
❌ ESLint found 3 error(s) and 0 warning(s)
🔧 How to Fix: npm run lint
🧪 Unit Tests
❌ 0 of 1147 tests failed
Generated at 2026-05-07T17:41:13.056Z
Pull request overview
Adds a fully local “Invoice AI assistant” to the invoices UI: Layer 1 multilingual embedding-based intent classification (Transformers.js) plus an optional Layer 2 slot extractor (WebLLM/WebGPU), feeding deterministic TypeScript aggregators over IndexedDB-backed invoice state and rendering answers with simple visualization primitives.
Changes:
- Replaces the existing GenerativeView chat stub with the new `AssistantPanel` UI and adds a Playwright E2E spec for the assistant tab.
- Introduces Layer 1/Layer 2 worker implementations (embedding classifier + slot extractor), a reducer-driven state machine, and the `useInvoiceAssistant` hook wiring the pipeline.
- Adds deterministic aggregators + renderer + i18n strings (en/ro/fr) for 10 analytics intents.
Reviewed changes
Copilot reviewed 74 out of 76 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| sites/arolariu.ro/src/app/domains/invoices/view-invoices/_components/views/GenerativeView.tsx | Wires the chat tab to the new AssistantPanel (replacing the stub chat UI). |
| sites/arolariu.ro/src/app/domains/invoices/view-invoices/_components/views/generative-view.spec.ts | Adds Playwright E2E coverage for assistant tab scenarios (incl. RO locale + history reset). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/slotExtractor.implementation.ts | Implements Layer 2 slot extraction via WebLLM (Qwen) in a worker. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/slotExtractor.implementation.test.ts | Unit tests for Layer 2 slot extraction behavior and validation. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/slotExtractor.api.ts | Defines the typed RPC contract for the slot extractor worker. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/slot-extractor.worker.ts | Worker entry that exposes the slot extractor implementation. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/embedding.worker.ts | Worker entry that exposes the embedding implementation. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/embedding.implementation.ts | Implements Layer 1 embedding classifier (seed matrix + cosine similarity). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/embedding.implementation.test.ts | Unit tests for the embedding classifier behavior. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/workers/embedding.api.ts | Defines the typed RPC contract for the embedding worker. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/useInvoiceAssistant.tsx | Hook that owns hosts + reducer + classify/resolve/aggregate/render pipeline. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/useInvoiceAssistant.test.tsx | Hook-level unit tests with host/hardware/store mocks. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/types.ts | Shared assistant types (locales, intents, timeframes, confidence thresholds). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/SingleStat.tsx | Single-stat visualization primitive for answers. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/SingleStat.test.tsx | Tests for SingleStat. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/Donut.tsx | Donut visualization primitive for category breakdown. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/Donut.test.tsx | Tests for Donut. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/ComparisonPair.tsx | Comparison visualization primitive (two values + delta). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/ComparisonPair.test.tsx | Tests for ComparisonPair. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/BarChartHorizontal.tsx | Horizontal bar chart visualization primitive. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/viz/BarChartHorizontal.test.tsx | Tests for BarChartHorizontal. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/answerRenderer.ts | Maps structured aggregator output into prose + viz hint + payload. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/renderer/answerRenderer.test.ts | Unit tests for renderer branches. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/slotLexicon.ts | Locale-aware deterministic slot parsing (timeframe/topK). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/slotLexicon.test.ts | Unit tests for slot lexicon parsing. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/seedPhrases.en.ts | EN seed phrases for embedding classifier. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/seedPhrases.ro.ts | RO seed phrases for embedding classifier. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/seedPhrases.fr.ts | FR seed phrases for embedding classifier. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/intentResolver.ts | Trust-boundary validation/coercion for intent + slots. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/intentResolver.test.ts | Unit tests for resolver normalization and rejection cases. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/intents/catalog.ts | Intent registry (slots + viz hint) for the 10-intent catalog. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/index.ts | Public barrel export for assistant module consumers. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/hosts/embeddingHost.ts | Intended to provide Layer 1 worker host factory. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/hosts/slotExtractorHost.ts | Intended to provide Layer 2 worker host factory. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/hardwareEligibility.ts | WebGPU/storage/memory/CPU gating for offering Layer 2 opt-in. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/hardwareEligibility.test.ts | Unit tests for the hardware eligibility gate. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/assistantReducer.ts | Reducer/state machine for assistant UX + history + Layer 2 state. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/assistantReducer.test.ts | Unit tests for reducer transitions and history capping. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/AssistantPanel.test.tsx | Component-level tests for AssistantPanel UI states. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/AssistantMessage.tsx | Renders a single assistant history entry + visualization. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/AssistantMessage.test.tsx | Tests for AssistantMessage + viz dispatch. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/shared.ts | Shared deterministic helpers (time windows, currency grouping, soft-delete filtering). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/shared.test.ts | Tests for shared aggregator helpers. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/index.ts | Aggregator registry to dispatch by intent. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/totalSpend.ts | Aggregator for total spend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/totalSpend.test.ts | Tests for totalSpend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/invoiceCount.ts | Aggregator for invoice/receipt count. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/invoiceCount.test.ts | Tests for invoiceCount. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topSpendingByCategory.ts | Aggregator for top spend categories. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topSpendingByCategory.test.ts | Tests for topSpendingByCategory. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topMerchantsByCount.ts | Aggregator for top merchants by visit count. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topMerchantsByCount.test.ts | Tests for topMerchantsByCount. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topMerchantsBySpend.ts | Aggregator for top merchants by spend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topMerchantsBySpend.test.ts | Tests for topMerchantsBySpend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topProductsByCount.ts | Aggregator for top products by quantity. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topProductsByCount.test.ts | Tests for topProductsByCount. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topProductsBySpend.ts | Aggregator for top products by spend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/topProductsBySpend.test.ts | Tests for topProductsBySpend. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/spendComparison.ts | Aggregator for timeframe spend comparisons. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/spendComparison.test.ts | Tests for spendComparison. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/averageSpendPerVisit.ts | Aggregator for average basket size / spend-per-visit. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/averageSpendPerVisit.test.ts | Tests for averageSpendPerVisit. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/categoryBreakdown.ts | Aggregator for category breakdown (for donut viz). |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/categoryBreakdown.test.ts | Tests for categoryBreakdown. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/fixtures/empty.fixtures.ts | Empty-corpus fixture for aggregator tests. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/fixtures/single-currency.fixtures.ts | Deterministic single-currency invoice corpus fixture. |
| sites/arolariu.ro/src/app/domains/invoices/_components/ai/aggregators/fixtures/multi-currency.fixtures.ts | Deterministic multi-currency corpus fixture (currency split verification). |
| sites/arolariu.ro/package.json | Adds WebLLM + Transformers.js dependencies. |
| sites/arolariu.ro/messages/en.json | Adds InvoiceAssistant translation namespace (EN). |
| sites/arolariu.ro/messages/ro.json | Adds InvoiceAssistant translation namespace (RO). |
| sites/arolariu.ro/messages/fr.json | Adds InvoiceAssistant translation namespace (FR). |
| scripts/generate.embeddings.ts | Build-time embedding matrix generator for seed phrases. |
| scripts/calibrate-assistant-embeddings.ts | Manual calibration tool for confidence thresholds. |
| @@ -0,0 +1 @@ | |||
| System.Management.Automation.Internal.Host.InternalHost No newline at end of file | |||
| @@ -0,0 +1 @@ | |||
| System.Management.Automation.Internal.Host.InternalHost No newline at end of file | |||
| async function main(): Promise<void> { | ||
| // eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
| const extractor: any = await pipeline("feature-extraction", "Xenova/multilingual-e5-small"); | ||
| const allLocales: Array<[Locale, typeof SEED_PHRASES_EN]> = [ |
| // Reset the module-level extractor for this test by re-importing isn't trivial; | ||
| // instead this test relies on a fresh module state OR previous tests not having | ||
| // succeeded. Since ensureLoaded is idempotent and module state persists, we | ||
| // verify that a fresh impl that hasn't loaded yet rejects. | ||
| // To guarantee a clean state, this test runs first AND nothing else has loaded. | ||
| await expect(impl.classify({question: "x", locale: "en"})).rejects.toThrow("not loaded"); |
| useEffect(() => { | ||
| if (state.shouldRestartSlotHost && slotHost) { | ||
| void (slotHost as unknown as {restart?: () => Promise<void>}).restart?.(); | ||
| dispatch({type: "resetSlotHostFlag"}); | ||
| } |
| const enableLayer2 = useCallback(async (): Promise<void> => { | ||
| if (slotHost) return; | ||
| dispatch({type: "layer2OptInClicked"}); | ||
| const newHost = createSlotExtractorHost(); | ||
| setSlotHost(newHost); | ||
| try { | ||
| await newHost.api.ensureLoaded(); | ||
| dispatch({type: "layer2Loaded"}); | ||
| } catch (err) { | ||
| dispatch({type: "layer2Failed", error: String(err)}); | ||
| } |
| function extractLabel(payload: unknown): string { | ||
| const p = payload as {timeframe?: string}; | ||
| return p.timeframe ?? ""; | ||
| } |
| <ComparisonPair | ||
| labelA={first.a.timeframe} | ||
| valueA={`${first.a.totalSpend.toFixed(2)} ${first.currency}`} | ||
| labelB={first.b.timeframe} | ||
| valueB={`${first.b.totalSpend.toFixed(2)} ${first.currency}`} |
| <svg viewBox="0 0 120 120" width="120" height="120" role="img" aria-label="Spending breakdown by category"> | ||
| <circle cx={cx} cy={cy} r={radius} fill="transparent" stroke="#e5e7eb" strokeWidth={stroke} /> |
| function extractValue(payload: unknown): string { | ||
| const p = payload as {buckets?: ReadonlyArray<Record<string, unknown>>; count?: number}; | ||
| if (typeof p.count === "number") return String(p.count); | ||
| const first = p.buckets?.[0]; | ||
| if (first) return `${first["totalSpend"] ?? first["averageSpend"] ?? ""} ${first["currency"] ?? ""}`; |
…y-after-fail THREE issues fixed in one commit:
1. CRITICAL: Both host files (embeddingHost.ts, slotExtractorHost.ts) were committed with corrupt content. Their previous body was a PowerShell stringification of $host (the read-only automatic variable): a generator-script footgun where my variable assignment was silently ignored and the InternalHost object got serialized. Both files now contain the intended createWorkerHost factory exports. The Next.js dev server immediately surfaced this with "Export createSlotExtractorHost doesn't exist in target module".
2. HIGH: useInvoiceAssistant never disposed slotHost on unmount, so every navigation away/back leaked a Worker thread holding the ~1 GB Qwen-1.5B engine in memory. Added a dedicated cleanup useEffect that calls slotHost.dispose() when the slotHost reference changes or the component unmounts.
3. HIGH: enableLayer2 set slotHost before awaiting ensureLoaded. If the model load failed, the broken host stayed in state and the `if (slotHost) return;` guard permanently blocked retries; the user would have to reload the page to try again. The catch block now disposes the dead host and clears slotHost so retry works.
Both reviewer findings cited PR #712 review (sonnet-4.6). Also bumps package-lock.json with @next/swc-darwin-{arm64,x64} optional binaries (npm install side-effect during the build:components run earlier). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
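The retry-after-fail fix (item 3) and the unmount cleanup (item 2) can be shown outside React with a small controller class. This is an illustrative sketch only: `Layer2Controller`, the `SlotHost` interface, and the string return values are hypothetical, and the real hook uses state plus effects rather than a class.

```typescript
// Minimal host contract assumed for the sketch.
interface SlotHost {
  ensureLoaded(): Promise<void>;
  dispose(): void;
}

class Layer2Controller {
  private host: SlotHost | null = null;
  private readonly createHost: () => SlotHost;

  constructor(createHost: () => SlotHost) {
    this.createHost = createHost;
  }

  async enable(): Promise<"loaded" | "failed" | "already-enabled"> {
    if (this.host) return "already-enabled";
    const newHost = this.createHost();
    this.host = newHost;
    try {
      await newHost.ensureLoaded();
      return "loaded";
    } catch {
      // Dispose the dead host and clear the reference so the guard above
      // does not block the next attempt.
      newHost.dispose();
      this.host = null;
      return "failed";
    }
  }

  // Counterpart of the unmount cleanup: release the worker and its model.
  dispose(): void {
    this.host?.dispose();
    this.host = null;
  }
}
```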
arolariu
left a comment
AI Assistant Panel — Code Review
Reviewed at HEAD a8bd811f
Previously Fixed (Turn 0)
Three issues identified in a prior review pass (one CRITICAL, two HIGH) were already corrected in commit a8bd811f:
- Corrupt file content: `embeddingHost.ts` and `slotExtractorHost.ts` contained the literal string `System.Management.Automation.Internal.Host.InternalHost` because the generator script used `$host`, a PowerShell read-only automatic variable. Fixed.
- Missing `slotExtractorHost` cleanup: the host's `dispose()` was not called on unmount, leaking the worker thread. Fixed.
- `enableLayer2` catch branch missing `dispose()` + null-out: a failed load left the host alive and the null-guard permanently blocking retry. Fixed.
New Finding
AssistantPanel.tsx — Submit button not guarded against slot-extracting state
Severity: Medium
File: sites/arolariu.ro/src/app/domains/invoices/_components/ai/AssistantPanel.tsx
The submit button's disabled condition excludes the slot-extracting status:
// The input IS disabled during both states:
disabled={state.status === "classifying" || state.status === "slot-extracting"}
// But the button only checks "classifying":
<Button type="submit" disabled={!draft.trim() || state.status === "classifying"}>
When the score falls in the uncertain band and a slot-LLM host is available, the pipeline dispatches slotExtracting and then awaits slotHost.api.extract(). During that window:
- The input is disabled (user cannot type), but `draft` in React state has not been cleared yet: `setDraft("")` is called only after `submitQuestion` resolves.
- The submit button is not disabled, so clicking it fires a second `onSubmit`.
- A second `submitQuestion(draft)` executes, dispatching another `questionSubmitted` and racing to completion alongside the first call.
Concretely: both pipelines can reach dispatch({type: "answerReady", ...}), each calling appendHistory once, producing a duplicate history entry for the same question. With unlucky timing (first pipeline times out while the second classifies) the state machine can also transition from out-of-scope back to classifying, which is not a valid forward edge.
Fix: Add state.status === "slot-extracting" to the button's disabled condition.
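One way to keep the input and the button from drifting apart again is to derive both from a single busy predicate. The sketch below is a suggestion rather than what the PR does; the four-member `Status` union is a subset of the real one, and `isBusy` / `isSubmitDisabled` are hypothetical helper names.

```typescript
// Subset of the assistant statuses mentioned in this PR.
type Status = "embedding-ready" | "classifying" | "slot-extracting" | "out-of-scope";

// Every status during which a pipeline run is in flight.
const BUSY_STATUSES: ReadonlySet<Status> = new Set<Status>(["classifying", "slot-extracting"]);

function isBusy(status: Status): boolean {
  return BUSY_STATUSES.has(status);
}

// The input would use isBusy(status); the button adds the empty-draft check.
function isSubmitDisabled(draft: string, status: Status): boolean {
  return draft.trim().length === 0 || isBusy(status);
}
```

If another in-flight status is added later, extending `BUSY_STATUSES` updates both controls at once.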
Assessment
Outside the one finding above, the codebase is in good shape. The data pipeline (aggregators → resolver → renderer → viz) is defensively written throughout: groupByCurrency handles both the string and Currency-object shapes correctly, all aggregators guard the empty-result path before accessing buckets[0]!, the reducer is a clean discriminated-union state machine with no unexpected transitions for its happy path, and the hardware eligibility probe correctly distinguishes hard gates (WebGPU, Workers) from soft signals (RAM, CPU).
arolariu
left a comment
Bug: submit allowed during slot extraction
The button is not disabled when state.status === 'slot-extracting'. During slot extraction, draft is non-empty (cleared only after submitQuestion resolves) and the classifying guard is false, so clicking the button fires a second submitQuestion call in parallel with the first.
Both pipelines race to call appendHistory, producing a duplicate history entry. With worse timing (first times out, second re-classifies) the state machine takes an invalid out-of-scope -> classifying edge.
Fix: disabled={!draft.trim() || state.status === 'classifying' || state.status === 'slot-extracting'}
  disabled={state.status === "classifying" || state.status === "slot-extracting"}
  aria-busy={state.status === "classifying" || state.status === "slot-extracting"}
/>
<Button type="submit" disabled={!draft.trim() || state.status === "classifying"}>
The button's disabled condition only checked `classifying`, leaving a window during slot extraction where the user could re-click submit. Because `draft` is only cleared after `submitQuestion` resolves and the input's `disabled` doesn't block the button, a second click fires a parallel pipeline. Both pipelines race to appendHistory (duplicate entry) and one may take an invalid out-of-scope -> classifying edge on timeout. Addresses MEDIUM finding from PR #712 review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
useEffect(() => {
  if (state.shouldRestartSlotHost && slotHost) {
    void (slotHost as unknown as {restart?: () => Promise<void>}).restart?.();
    dispatch({type: "resetSlotHostFlag"});
  }
async function main(): Promise<void> {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const extractor: any = await pipeline("feature-extraction", "Xenova/multilingual-e5-small");
  const allLocales: Array<[Locale, typeof SEED_PHRASES_EN]> = [
    ["en", SEED_PHRASES_EN],
export type Layer2State =
  | Readonly<{status: "ineligible"; reasons: ReadonlyArray<string>}>
  | Readonly<{status: "eligible"}>
  | Readonly<{status: "downloading"; progress: number}>
  | Readonly<{status: "ready"}>
  | Readonly<{status: "failed"; error: string}>;
describe("createEmbeddingImpl", () => {
  it("requires ensureLoaded before classify", async () => {
    const impl = createEmbeddingImpl();
    // Reset the module-level extractor for this test by re-importing isn't trivial;
    // instead this test relies on a fresh module state OR previous tests not having
    // succeeded. Since ensureLoaded is idempotent and module state persists, we
    // verify that a fresh impl that hasn't loaded yet rejects.
    // To guarantee a clean state, this test runs first AND nothing else has loaded.
    await expect(impl.classify({question: "x", locale: "en"})).rejects.toThrow("not loaded");
  });
if (slots.category) {
  const cat = slots.category;
  filtered = filtered.filter((inv) => String(inv.category) === cat || (inv.category as unknown as string) === cat);
}
…+ retry button

User reported "Cannot convert undefined or null to object" at module-eval of @xenova/transformers when opening the AI tab on localhost:3000, plus the embedding-failed "Try again" button doing nothing.

## Root cause #1: Turbopack worker bundle doesn't honor browser:false

@xenova/transformers v2.17.2 statically imports `fs`, `path`, `url` at the top of env.js to detect Node, then calls `Object.keys(fs)` via `isEmpty`. The package's `package.json` has `browser: { fs: false, ... }` that is supposed to substitute empty stubs in browser bundles, but Turbopack's worker bundler resolves them to `undefined` instead and `Object.keys(undefined)` throws at module evaluation.

## Root cause #2: retryEmbeddingLoad never existed

The "Try again" button was wired to `resetConversation`, but the reducer explicitly preserves `embedding-failed` status under that action (so it doesn't lie about the model being ready). The button fired but nothing visibly transitioned. A real retry must dispose the failed worker host and create a fresh one.

## Layered fix

1. **next.config.ts**: alias `fs`, `path`, `url`, `sharp`, `onnxruntime-node` to a new browser stub `@/lib/empty-module` via `turbopack.resolveAlias`. Mirrors the package's `browser: false` mappings explicitly.
2. **lib/empty-module.ts**: tiny no-op stub exporting an empty default plus the surface our deps actually touch (`promises`, `sep`, `join`, etc.) so the static-import shape is preserved.
3. **embedding.implementation.ts**: switched the static `import {pipeline} from "@xenova/transformers"` to a dynamic `await import(...)` inside `ensureLoaded()`. Defense in depth: even if a future bundler change reintroduces the env-detection crash, it now happens at runtime where we can catch and surface it as `embedding-failed` state instead of a hard module-eval crash.
4. **Retry button wiring**: new `retryEmbeddingLoad` callback on the hook that:
   - Dispatches `retryEmbeddingLoad` action (resets status to `capability-check` so the loading UI redraws)
   - Calls `embedHost.dispose()` then `setEmbedHost(createEmbeddingHost())`
   - The lifecycle useEffect picks up the new host and runs the load cycle

Reducer adds `retryEmbeddingLoad` to the Action union; AssistantPanel passes the new callback to the alert button (with data-testid for E2E).

All 21 affected tests still pass (reducer 12, hook 2, panel 4, embedding impl 3).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
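The "defense in depth" step in this commit (moving the package import inside `ensureLoaded()` so a package-init failure becomes state rather than a crash) could look roughly like the sketch below. It is an illustration only: `createLazyModel`, `ModuleLoader` and `LoadState` are hypothetical names, and the loader is injected so the example stays self-contained. In the PR the loader would presumably be a dynamic `import()` of the Transformers.js package.

```typescript
// Hypothetical state shape; only the "embedding-failed" status name comes from the commit message.
type LoadState =
  | {status: "idle"}
  | {status: "ready"}
  | {status: "embedding-failed"; error: string};

// Injected loader standing in for a dynamic import of the model runtime (assumption).
type ModuleLoader<T> = () => Promise<T>;

function createLazyModel<T>(load: ModuleLoader<T>) {
  let loaded: T | null = null;
  let state: LoadState = {status: "idle"};

  return {
    getState: (): LoadState => state,

    // Idempotent: loads at most once on success. Any failure, including one thrown
    // while the imported package initializes, is caught and turned into state.
    async ensureLoaded(): Promise<LoadState> {
      if (loaded !== null) {
        return state;
      }
      try {
        loaded = await load();
        state = {status: "ready"};
      } catch (err) {
        state = {status: "embedding-failed", error: err instanceof Error ? err.message : String(err)};
      }
      return state;
    },
  };
}
```

The sketch does not model the host disposal described under "Retry button wiring"; it only shows why a runtime import is catchable where a static one is not.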
…uggingface/transformers v4
Per maintainer feedback: drops the next.config Turbopack alias hack +
the empty-module stub. The official @huggingface/transformers v4.2.0
package (the renamed successor of @xenova/transformers v2) doesn't
import Node builtins (fs/path/url) at module evaluation, so Turbopack's
worker bundler doesn't crash on it.
Changes:
- package.json: replace @xenova/transformers@2.17.2 with
@huggingface/transformers@4.2.0 (workspace = sites/arolariu.ro)
- next.config.ts: remove the turbopack.resolveAlias block for
fs/path/url/sharp/onnxruntime-node (no longer needed)
- src/lib/empty-module.ts: deleted (no longer referenced)
- embedding.implementation.ts: dynamic import target updated; the
pipeline() signature and the {data: Float32Array} output shape are
unchanged across the rename so no logic adjustments needed
- scripts/generate.embeddings.ts: import path updated; matrix
regenerated (300 embeddings, same Xenova/multilingual-e5-small model
hosted on HuggingFace Hub — the model name is unchanged across the
package rename)
- embedding.implementation.test.ts: vi.mock target string updated
Kept the dynamic import inside ensureLoaded() as defense in depth so
any future package-init failure surfaces as embedding-failed state
rather than a hard module-eval crash.
All 145 invoice-ai unit tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The /.clerk/ directory is created by @clerk/nextjs during local development and can contain secrets (publishable + secret keys). Auto-generated by clerk during npm install. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…igible CTA

User reported three real UX issues:

1. "I always get 'You have no receipts in last quarter. Try all time.'" The store is empty, so every classified intent legitimately returns the empty-result template. The pipeline is correct; the UX is bad. FIX: render an explicit empty-corpus alert when invoices.length === 0 so the user understands they need to upload receipts before the assistant can compute anything. Adds InvoiceAssistant.emptyCorpus i18n keys in en/ro/fr.
2. "I don't have any option to download a bigger model": the user is on hardware where checkHardwareEligibility returns ineligible (no WebGPU adapter, etc.). The Layer 2 CTA was hidden in favor of a tiny "i" badge in the corner, which the user couldn't find. FIX: render the Layer 2 button in disabled state with the unavailable tooltip + an inline ⓘ marker. Same affordance, just discoverable.
3. "Switching between Chat and Settings tabs resets the worker and I have to see the loading model dialog once again." Each tab switch unmounts AssistantPanel which unmounted the hook which disposed the worker hosts; remount triggered a fresh ~118 MB model load. Brutal UX. FIX: lift the worker hosts to MODULE-LEVEL SINGLETONS via lazy getters (getEmbedHost, getSlotHost). They survive React mount/unmount cycles and only get torn down on full page navigation. The reducer state remains per-hook (so conversation history clears on remount, matching the H1 architectural lock from the spec); only the expensive model load is preserved.

Also: when a fresh hook instance mounts and the slot singleton is already alive (user enabled Layer 2 earlier this session), the hook immediately dispatches layer2Loaded so the UI reflects the active state without re-running ensureLoaded.

All 145 invoice-ai unit tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
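The module-level singleton approach from the third fix can be sketched as follows. This is a hedged illustration, not the PR's code: `createLazySingleton`, `DisposableHost`, `isAlive` and the fake host factory are assumptions; only the getter name `getEmbedHost` comes from the commit message.

```typescript
// Anything that owns a worker and can release it (assumed interface).
interface DisposableHost {
  dispose(): void;
}

// Wraps a factory so the host is created on first use and then reused,
// independent of any React component's mount/unmount cycle.
function createLazySingleton<T extends DisposableHost>(factory: () => T) {
  let instance: T | null = null;
  return {
    get(): T {
      if (instance === null) {
        instance = factory();
      }
      return instance;
    },
    // Lets a freshly mounted hook check whether a model is already loaded.
    isAlive(): boolean {
      return instance !== null;
    },
    // Explicit teardown, e.g. for a retry that needs a fresh host.
    reset(): void {
      instance?.dispose();
      instance = null;
    },
  };
}

// Hypothetical host factory standing in for the real worker host constructor.
let hostsCreated = 0;
const embedHostSingleton = createLazySingleton(() => {
  hostsCreated += 1;
  return {id: hostsCreated, dispose(): void {}};
});

// Module-level lazy getter, analogous to the getEmbedHost named in the commit.
const getEmbedHost = () => embedHostSingleton.get();
```

Because the instance lives in module scope, two components (or two mounts of the same component) calling `getEmbedHost()` receive the same host, which is what avoids the repeated model load on tab switches.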
Summary
Adds a fully-local Invoice AI assistant: natural-language analytical queries ("top merchants last month?", "cum am cheltuit luna trecută?", "comparer mes dépenses ce mois avec le mois dernier") over invoices in IndexedDB. No network calls — all data stays in the browser.
Architecture (per design doc docs/superpowers/specs/2026-05-06-local-invoice-ai-assistant-design.md):
- `Xenova/multilingual-e5-small` on Transformers.js (~118 MB WASM). Cosine-ranks the user's question against 300 precomputed seed-phrase embeddings (10 intents × 10 phrasings × 3 locales). Returns top-3 candidates + score.
- `Qwen2.5-1.5B-Instruct-q4f16_1-MLC` on WebLLM `MLCEngine` (~1 GB WebGPU). JSON-mode + temperature=0 for deterministic slot extraction when the embedding signal is uncertain.
- `createWorkerHost<TApi>` from PR "feat(workers): introduce Web Worker foundation as first-class platform primitive" #699.
- `useInvoicesStore.entities`. The LLM never sees invoice data.
What's included
- hardwareEligibility.ts
- intents/catalog.ts
- intents/slotLexicon.ts
- intents/intentResolver.ts
- intents/seedPhrases.{en,ro,fr}.ts
- aggregators/__fixtures__/*.ts
- aggregators/shared.ts
- aggregators/{totalSpend,invoiceCount,...}.ts
- aggregators/index.ts
- messages/{en,ro,fr}.json
- renderer/answerRenderer.ts
- renderer/viz/*.tsx
- assistantReducer.ts
- scripts/generate.embeddings.ts + seedEmbeddings.json
- workers/embedding.{api,implementation,worker}.ts
- workers/slotExtractor.{api,implementation,worker}.ts
- hosts/{embeddingHost,slotExtractorHost}.ts
- useInvoiceAssistant.tsx
- AssistantMessage.tsx
- AssistantPanel.tsx
- view-invoices/_components/views/GenerativeView.tsx
- scripts/calibrate-assistant-embeddings.ts
- view-invoices/_components/views/generative-view.spec.ts
Total: 38 atomic commits, ~4628 LOC added, 166 new unit tests (all green), 4 Playwright E2E scenarios. Full `npx vitest run` suite: 1982/1982 passing, coverage 94.66% lines / 96.53% functions / 95.15% statements / 82.81% branches.
Architectural locks (from brainstorm)
Caveats / followups
- `CONFIDENCE_THRESHOLDS` (canonical 0.75 / uncertain 0.55) intentionally err generous; tighten only after a larger seed corpus per locale is added.
- `next.config.ts` unchanged per directive. WebLLM + Transformers.js will require `script-src 'wasm-unsafe-eval'` + `worker-src blob:`; manual update needed before production.
How to test locally
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
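The Layer 1 step described in the summary (cosine-ranking a question against precomputed seed embeddings, returning the top 3, then comparing the score with `CONFIDENCE_THRESHOLDS`) can be sketched as below. The threshold values are the ones quoted in the caveats. Everything else is an assumption for illustration: the `SeedEmbedding` shape, keeping only the best-scoring seed per intent, treating the thresholds as inclusive lower bounds, and mapping scores below the uncertain threshold to `out-of-scope`.

```typescript
// Values quoted in the PR description; the object shape is an assumption.
const CONFIDENCE_THRESHOLDS = {canonical: 0.75, uncertain: 0.55} as const;

type Route = "canonical" | "uncertain" | "out-of-scope";

interface SeedEmbedding {
  intentId: string;
  vector: ReadonlyArray<number>;
}
interface Candidate {
  intentId: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: ReadonlyArray<number>, b: ReadonlyArray<number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Keep the best seed score per intent (assumption), then return the top-K intents.
function rankIntents(query: ReadonlyArray<number>, seeds: ReadonlyArray<SeedEmbedding>, topK = 3): Candidate[] {
  const best = new Map<string, number>();
  for (const seed of seeds) {
    const score = cosineSimilarity(query, seed.vector);
    const previous = best.get(seed.intentId);
    if (previous === undefined || score > previous) {
      best.set(seed.intentId, score);
    }
  }
  return Array.from(best.entries())
    .map(([intentId, score]) => ({intentId, score}))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Route on the top score: confident answer, hand off to the slot LLM, or give up.
function routeByScore(score: number): Route {
  if (score >= CONFIDENCE_THRESHOLDS.canonical) return "canonical";
  if (score >= CONFIDENCE_THRESHOLDS.uncertain) return "uncertain";
  return "out-of-scope";
}
```

In this reading, only the `uncertain` route would involve Layer 2, which matches the summary's statement that slot extraction runs when the embedding signal is uncertain.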