Add thoughts timeline: capture, store, search & broadcast Claude thinking blocks by thedotmack · Pull Request #1083 · thedotmack/claude-mem

thedotmack · 2026-02-13T04:00:12Z

Summary

Adds complete backend pipeline for capturing Claude's internal thinking blocks from session transcripts
Stores thoughts in SQLite with FTS5 full-text search and Chroma vector embeddings for semantic search
Exposes REST API endpoints for thought storage, retrieval, and search (/api/thoughts, /api/sessions/thoughts)
Integrates thoughts into the unified search infrastructure (ChromaSearchStrategy, HybridSearchStrategy, SQLiteSearchStrategy)
Broadcasts thought_stored SSE events in real-time to connected viewer clients
Extracts thinking blocks via Stop hook from JSONL transcript files

Changes

Database (migration 21):

thoughts table with FTS5 virtual table and sync triggers
Indexes on session, project, and epoch

Storage layer (SessionStore):

storeThoughts(), getThoughts(), getThoughtsByIds(), searchThoughts()

Hook integration:

Transcript parser (extractThinkingBlocks) reads JSONL line-by-line
Stop hook Phase 1.5 extracts and POSTs thoughts to worker

API:

POST/GET /api/thoughts - store and retrieve
GET /api/thoughts/search - FTS5 search
POST /api/sessions/thoughts - session-scoped storage with auto-resolution

Vector search:

ChromaSync.syncThought() / syncThoughts() with backfill
Search results include ThoughtSearchResult type

SSE:

broadcastThoughtStored() on SessionEventBroadcaster

Test plan

Migration test: thoughts table creation, FTS5, indexes, triggers
Store test: CRUD operations, search, batch retrieval
Handler test: thinking block extraction from transcripts
Route tests: ThoughtsRoutes and SessionRoutes thought endpoints
SSE broadcaster test: thought_stored event formatting
Chroma integration test: vector sync and backfill
Manual: verify thoughts appear via API after a session with thinking blocks

🤖 Generated with Claude Code

Implements ThinkingBlock interface and extractThinkingBlocks() function that reads Claude Code JSONL transcript files and extracts thinking content blocks from assistant messages. Includes 11 unit tests covering edge cases (empty files, malformed JSON, missing timestamps, falsy thinking content). Validated against real transcript files (64 thinking blocks found across 5 files). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
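A minimal sketch of the extraction step described above, operating on JSONL text rather than a file so it stays self-contained. The transcript line shape (`type: 'assistant'`, `message.content[]` parts with `type: 'thinking'`), the `ThinkingBlock` fields, and the name `extractThinkingBlocksFromText` are assumptions for illustration, not the PR's actual code.

```typescript
// Sketch only: the transcript line shape below is an assumption, not the
// project's actual schema.
interface ThinkingBlock {
  thinking: string;
  timestamp: string | null;
  messageIndex: number; // assumed here to count assistant messages, starting at 0
}

// Parses JSONL text (one JSON object per line) and collects thinking blocks
// from assistant messages. Malformed lines are skipped instead of thrown.
function extractThinkingBlocksFromText(jsonl: string): ThinkingBlock[] {
  const blocks: ThinkingBlock[] = [];
  let messageIndex = 0;

  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // empty file or blank line

    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // malformed JSON: skip this line
    }

    if (!entry || entry.type !== 'assistant') continue;

    const content = entry.message ? entry.message.content : undefined;
    if (Array.isArray(content)) {
      for (const part of content) {
        // Skip non-thinking parts and falsy or non-string thinking content.
        if (part && part.type === 'thinking' && typeof part.thinking === 'string' && part.thinking) {
          blocks.push({
            thinking: part.thinking,
            timestamp: typeof entry.timestamp === 'string' ? entry.timestamp : null,
            messageIndex,
          });
        }
      }
    }
    messageIndex++;
  }
  return blocks;
}
```

Skipping bad lines keeps one corrupt entry from discarding the rest of a session's transcript, which matches the edge cases the commit message lists (empty files, malformed JSON, missing timestamps, falsy thinking content).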

Adds migration for the `thoughts` table (stores extracted thinking blocks from Claude Code session transcripts) with full-text search via FTS5 and sync triggers. Migration added to both legacy migrations.ts (v8) and active MigrationRunner (v21). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
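As a rough illustration of what such a migration might contain, the sketch below holds the DDL as a TypeScript string constant. Column names come from identifiers mentioned elsewhere in this PR (`thinking_text`, `thinking_summary`, `message_index`, `memory_session_id`, `project`, `created_at_epoch`); the column types, index names, trigger names, and the constant name are assumptions. The triggers follow the standard SQLite pattern for an FTS5 external-content table.

```typescript
// Sketch only: types, index names, and trigger names are assumptions.
const CREATE_THOUGHTS_SQL: string = `
CREATE TABLE IF NOT EXISTS thoughts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  memory_session_id TEXT NOT NULL,
  project TEXT,
  thinking_text TEXT NOT NULL,
  thinking_summary TEXT,
  message_index INTEGER,
  created_at_epoch INTEGER NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_thoughts_session ON thoughts(memory_session_id);
CREATE INDEX IF NOT EXISTS idx_thoughts_project ON thoughts(project);
CREATE INDEX IF NOT EXISTS idx_thoughts_epoch ON thoughts(created_at_epoch);

-- External-content FTS5 table: the index stores tokens, the text stays in thoughts.
CREATE VIRTUAL TABLE IF NOT EXISTS thoughts_fts USING fts5(
  thinking_text,
  content='thoughts',
  content_rowid='id'
);

-- Keep the index in sync on INSERT, DELETE, and UPDATE.
CREATE TRIGGER IF NOT EXISTS thoughts_ai AFTER INSERT ON thoughts BEGIN
  INSERT INTO thoughts_fts(rowid, thinking_text) VALUES (new.id, new.thinking_text);
END;

CREATE TRIGGER IF NOT EXISTS thoughts_ad AFTER DELETE ON thoughts BEGIN
  INSERT INTO thoughts_fts(thoughts_fts, rowid, thinking_text)
    VALUES ('delete', old.id, old.thinking_text);
END;

CREATE TRIGGER IF NOT EXISTS thoughts_au AFTER UPDATE ON thoughts BEGIN
  INSERT INTO thoughts_fts(thoughts_fts, rowid, thinking_text)
    VALUES ('delete', old.id, old.thinking_text);
  INSERT INTO thoughts_fts(rowid, thinking_text) VALUES (new.id, new.thinking_text);
END;
`;
```

The `IF NOT EXISTS` clauses make the script safe to re-run, which is one way to get the idempotency the reviews mention.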

Added ThoughtInput/Thought types and four SessionStore methods for thinking block persistence: storeThoughts, getThoughts, getThoughtsByIds, and searchThoughts (FTS5). Includes 23 passing tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Creates handleThoughtsExtraction() that extracts thinking blocks from transcripts and stores them via the worker API POST /api/sessions/thoughts. Includes comprehensive test suite with 5 tests covering edge cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wire thoughts extraction into the CLI hook dispatch chain:
- Create thoughts-extract EventHandler wrapping handleThoughtsExtraction
- Register as event type in handlers/index.ts (between summarize and session-complete)
- Add POST /api/sessions/thoughts endpoint in SessionRoutes for storing thinking blocks
- Endpoint resolves memorySessionId/project from contentSessionId when not provided
- Add 7 tests covering missing input, success, worker unavailable, and error cases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Wire the thoughts-extract handler into the Stop hook chain in hooks.json, positioned after summarize and before session-complete, to capture thinking blocks when a Claude Code session ends. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Create dedicated ThoughtsRoutes extending BaseRouteHandler with three endpoints: POST /api/thoughts for storing thinking blocks, GET /api/thoughts for project-scoped retrieval with time filtering, and GET /api/thoughts/search for FTS5 full-text search. Register in worker service background init after database initialization. Includes 18 integration tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…beddings Adds single-thought Chroma sync following the syncUserPrompt pattern. Creates documents with thought_id prefix and thinking_text as searchable content. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ht vector embeddings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… search integration After storing thoughts, the POST /api/thoughts endpoint now retrieves full thought records and fires async chromaSync.syncThoughts(). Chroma failures are caught and logged as warnings without blocking storage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… search completeness Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eploy full Chroma integration The createThoughtsTable migration existed only in MigrationRunner (version 21) but SessionStore—which the worker service actually uses—had no call to it. Additionally, version 21 was already used by addOnUpdateCascadeToForeignKeys in SessionStore, causing MigrationRunner's version check to skip table creation entirely. Fix: Added createThoughtsTable() to SessionStore with migration version 22 and table-existence check for robustness. Updated MigrationRunner to also use version 22. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
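The failure mode in this commit message can be reproduced with a small in-memory model. This is a sketch under assumptions: `FakeDb` and both function names are hypothetical stand-ins, not the project's MigrationRunner or SessionStore.

```typescript
// Sketch only: an in-memory stand-in for a schema-versions table and the
// database catalog, to show why a reused version number skips a migration.
class FakeDb {
  appliedVersions = new Set<number>();
  tables = new Set<string>();
}

// Version-only guard: skips whenever the version number is already recorded,
// even if a different migration recorded it.
function migrateByVersionOnly(db: FakeDb, version: number, table: string): boolean {
  if (db.appliedVersions.has(version)) return false;
  db.tables.add(table);
  db.appliedVersions.add(version);
  return true;
}

// Guard that checks table existence as well: the table is created whenever it
// is missing, and the version is recorded either way.
function migrateWithTableCheck(db: FakeDb, version: number, table: string): boolean {
  const alreadyThere = db.tables.has(table);
  if (!alreadyThere) db.tables.add(table);
  db.appliedVersions.add(version);
  return !alreadyThere;
}

// Version 21 was already taken by an unrelated migration.
const db = new FakeDb();
db.appliedVersions.add(21);

const createdWith21 = migrateByVersionOnly(db, 21, 'thoughts'); // skipped
const createdWith22 = migrateWithTableCheck(db, 22, 'thoughts'); // creates the table
```

With the version-only guard the `thoughts` table is never created because version 21 looks applied; moving to an unused version and checking for the table makes the migration both reachable and safe to repeat.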

… all strategies Adds thought document support to search types (ThoughtSearchResult, ChromaDocType), ResultFormatter (formatting, combining, counting), ChromaSearchStrategy (where filter, categorization, hydration), and all supporting strategies. MCP search automatically includes thoughts via Worker HTTP API proxy. 119 bun tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…notifications Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…T endpoint After storing thoughts and syncing to Chroma, the POST /api/thoughts endpoint now broadcasts a thought_stored SSE event for each stored thought, enabling real-time UI updates when new thinking blocks arrive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…egration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…endpoint The production stop hook calls /api/sessions/thoughts (SessionRoutes), not /api/thoughts (ThoughtsRoutes). SSE broadcasting and Chroma vector sync were only wired into ThoughtsRoutes but not SessionRoutes, creating a gap where thoughts stored via the actual production path would not trigger real-time UI updates or vector indexing. Adds broadcastThoughtStored and chromaSync.syncThoughts calls to SessionRoutes.handleStoreThoughts, matching the pattern in ThoughtsRoutes. Includes 5 new integration tests verifying the fix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
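One way to avoid this kind of gap is to route both endpoints through a single store-then-publish helper. The sketch below assumes an injected dependency object; `StoreDeps`, `storeAndPublish`, and the simplified `Thought` shape are hypothetical, while `storeThoughts`, `syncThoughts`, and `broadcastThoughtStored` mirror names used in this PR.

```typescript
// Sketch only: simplified shapes, not the project's actual types.
interface Thought {
  id: number;
  thinking_text: string;
}

interface StoreDeps {
  storeThoughts(texts: string[]): Thought[];        // synchronous SQLite write
  syncThoughts(thoughts: Thought[]): Promise<void>; // vector sync, may reject
  broadcastThoughtStored(thought: Thought): void;   // SSE fan-out
  warn(message: string): void;
}

// Stores thoughts, then triggers vector sync and SSE broadcast. Storage is the
// only step whose failure reaches the caller.
function storeAndPublish(deps: StoreDeps, texts: string[]): number[] {
  const stored = deps.storeThoughts(texts);

  // Fire-and-forget: not awaited. Wrapping in a promise chain also catches a
  // synchronous throw from syncThoughts, so it is logged instead of propagated.
  Promise.resolve()
    .then(() => deps.syncThoughts(stored))
    .catch((err: unknown) => {
      deps.warn('Chroma sync failed: ' + (err instanceof Error ? err.message : String(err)));
    });

  for (const thought of stored) {
    deps.broadcastThoughtStored(thought);
  }
  return stored.map((t) => t.id);
}
```

If both `ThoughtsRoutes` and `SessionRoutes` called a helper like this, a side effect added to one path could not be missed on the other.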

# Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs

claude · 2026-02-13T04:03:13Z

PR Review: Thoughts Timeline Feature

Great work on this comprehensive feature addition! The implementation is well-structured with excellent test coverage. Below are my findings across code quality, architecture, security, and potential improvements.

✅ Strengths

1. Excellent Test Coverage

9 comprehensive test files covering all layers: migrations, storage, hooks, routes, SSE, and Chroma integration
Tests include both unit and integration scenarios
Migration tests verify schema creation, FTS5 setup, and triggers

2. Clean Architecture

Proper separation of concerns: extraction → storage → API → broadcast
Fire-and-forget Chroma sync prevents blocking
FTS5 virtual table with auto-sync triggers is efficient and elegant
SSE broadcasting enables real-time UI updates

3. Robust Database Design

FTS5 full-text search with proper triggers (INSERT/UPDATE/DELETE)
Appropriate indexes on memory_session_id, project, and created_at_epoch
Proper use of prepared statements prevents SQL injection
Migration versioning is consistent

4. Error Handling

Graceful degradation when worker is unavailable (returns 0 thoughts stored)
Proper timeout handling with configurable THOUGHTS_TIMEOUT_MS
Malformed transcript lines are skipped with debug logging

🔍 Issues & Recommendations

Critical: Privacy Tag Stripping Missing

Issue: Thinking blocks bypass the privacy tag stripping system. While observations/summaries use stripMemoryTagsFromJson() to remove <private> tags at the hook layer (edge processing), the thinking extraction handler doesn't apply this filtering.

Location: src/hooks/handlers/thoughts.ts:27-40

Impact: Users marking content as <private> in their prompts may have that content captured in thinking blocks and stored/vectorized.

Recommendation:

// In src/hooks/handlers/thoughts.ts
import { stripMemoryTagsFromJson } from '../../utils/tag-stripping.js';

const thoughts: ThoughtInput[] = blocks.map(block => ({
  thinking_text: stripMemoryTagsFromJson(block.thinking), // Strip privacy tags
  thinking_summary: null,
  message_index: block.messageIndex,
}));

Alternative: Document that thinking blocks are captured before privacy filtering as they represent Claude's internal reasoning (pre-response). This would require updating the privacy documentation to clarify the scope.
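For readers unfamiliar with the tag mechanism, here is a hedged sketch of what stripping could look like on a plain string. `stripPrivateTags` is a hypothetical helper written for illustration; the project's `stripMemoryTagsFromJson` may differ in which tags it handles and how.

```typescript
// Hypothetical helper: removes every <private>...</private> span, including
// spans that contain newlines. Not the project's stripMemoryTagsFromJson.
function stripPrivateTags(text: string): string {
  // [\s\S] matches any character including newlines; *? keeps each match
  // limited to a single tag pair.
  return text.replace(/<private>[\s\S]*?<\/private>/g, '');
}
```

This only removes text that is still wrapped in the tags when it reaches the handler. Untagged text passes through unchanged.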

High: Error Handling Inconsistency

Issue: The transcript parser in thinking.ts:46 silently catches ALL errors when parsing JSON lines, which could hide schema changes or corruption issues.

Current Code:

} catch {
  logger.debug('THINKING', 'Skipping malformed transcript line', { lineIndex, transcriptPath });
}

Recommendation: Add error details to debug logging:

} catch (error) {
  logger.debug('THINKING', 'Skipping malformed transcript line', { 
    lineIndex, 
    transcriptPath, 
    error: error instanceof Error ? error.message : String(error) 
  });
}

Medium: Race Condition in SessionRoutes.handleStoreThoughts

Issue: The endpoint calls createSDKSession() which may create a new session OR return an existing one, but it doesn't validate if the session exists before querying dbSession. This could theoretically cause issues if getSessionById() returns undefined.

Location: SessionRoutes.ts:751-755

Current Flow:

const sessionDbId = store.createSDKSession(contentSessionId, '', '');
const dbSession = store.getSessionById(sessionDbId);
const memorySessionId = providedMemorySessionId || dbSession?.memory_session_id;

Recommendation: Add validation or document that createSDKSession guarantees a valid sessionDbId:

const sessionDbId = store.createSDKSession(contentSessionId, '', '');
const dbSession = store.getSessionById(sessionDbId);

if (!dbSession) {
  logger.error('SESSION', 'Session not found after creation', { sessionDbId, contentSessionId });
  return this.internalError(res, 'Session initialization failed');
}

Low: Migration Version Mismatch

Minor Inconsistency: The migration is labeled as version 22 in createThoughtsTable() but the comment says "migration 22" while the actual schema change is significant (new table with FTS5).

Location: SessionStore.ts:831-838

Observation: This appears to be migration #22 in the sequence. Just verify this aligns with migrations.ts and the migration runner. The comment in SessionStore.ts:652 mentions "migration 21" for a different change, so confirm version sequencing.

Low: Potential Performance Consideration

Observation: The searchThoughts() method uses SELECT t.* FROM thoughts t JOIN thoughts_fts f ON t.id = f.rowid WHERE thoughts_fts MATCH ? which is efficient. However, for large datasets, consider adding:

An EXPLAIN QUERY PLAN test to verify the query optimizer is using the FTS5 index correctly
A maximum limit cap (e.g., 500) to prevent unbounded result sets

Current: limit defaults to 50 but can be set arbitrarily high by API callers.
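A cap like the one suggested above can be a small pure function applied before the value reaches the query. The default of 50 comes from the review text and the cap of 500 from its example; the function and constant names are assumptions.

```typescript
const DEFAULT_LIMIT = 50; // default stated in the review
const MAX_LIMIT = 500;    // example cap from the review

// Normalizes a caller-supplied limit (query-string value or number) into the
// range 1..MAX_LIMIT, falling back to DEFAULT_LIMIT for missing or invalid input.
function clampLimit(raw: unknown): number {
  const parsed = typeof raw === 'string' ? parseInt(raw, 10) : Number(raw);
  if (!isFinite(parsed) || parsed < 1) return DEFAULT_LIMIT;
  return Math.min(Math.floor(parsed), MAX_LIMIT);
}
```

Passing `clampLimit(req.query.limit)` as the bound parameter keeps the SQL unchanged while bounding the result set.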

Code Quality: Minor Improvements

Type Safety: ThoughtInput in thoughts.ts:36 could benefit from runtime validation:
```
thoughts as ThoughtInput[]  // Consider using a schema validator like Zod
```
Magic Numbers: The timeout constant THOUGHTS_TIMEOUT_MS is good, but consider documenting why HOOK_TIMEOUTS.DEFAULT is appropriate vs. a custom timeout for potentially large transcripts.
Chroma Sync Error Logging: Both routes catch Chroma sync errors but don't distinguish between sync failures and initialization failures. Consider more granular error categorization.
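For the type-safety point, a dependency-free type guard is one alternative to a schema library. This is a sketch: the `ThoughtInput` fields are taken from the recommendation code earlier in this review, while `isThoughtInput`, `parseThoughts`, and the length limit are assumptions.

```typescript
interface ThoughtInput {
  thinking_text: string;
  thinking_summary: string | null;
  message_index: number;
}

// Hypothetical upper bound; the PR does not define one.
const MAX_THINKING_TEXT_LENGTH = 100000;

// Runtime check that an unknown value has the ThoughtInput shape.
function isThoughtInput(value: unknown): value is ThoughtInput {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  const text = record['thinking_text'];
  const summary = record['thinking_summary'];
  const index = record['message_index'];
  return (
    typeof text === 'string' &&
    text.length > 0 &&
    text.length <= MAX_THINKING_TEXT_LENGTH &&
    (summary === null || typeof summary === 'string') &&
    typeof index === 'number' &&
    Math.floor(index) === index &&
    index >= 0
  );
}

// Returns the validated array, or null if the body is not an array of valid
// ThoughtInput objects. Replaces an unchecked `as ThoughtInput[]` cast.
function parseThoughts(body: unknown): ThoughtInput[] | null {
  if (!Array.isArray(body)) return null;
  return body.every(isThoughtInput) ? (body as ThoughtInput[]) : null;
}
```

A route handler could return a 400 response when `parseThoughts` yields `null`. The same check also covers the content-length concern raised in the security assessment.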

📊 Test Coverage Assessment

Excellent coverage across:

✅ Database migrations and schema
✅ CRUD operations (store, get, search)
✅ Transcript parsing (malformed lines, multiple blocks)
✅ API routes (validation, error cases, success paths)
✅ SSE broadcasting
✅ Chroma vector sync and backfill

Missing coverage (nice-to-haves):

⚠️ Privacy tag stripping in thinking blocks (if implemented)
⚠️ Very large transcripts (100+ thinking blocks)
⚠️ Concurrent thought storage from multiple sessions

🔒 Security Assessment

Positive:

✅ Prepared statements used throughout (SQL injection protected)
✅ Input validation on all API endpoints
✅ Type checking on request bodies
✅ Localhost-only API (assuming middleware is applied)

Attention Needed:

⚠️ Privacy tag bypass (see Critical issue above)
⚠️ No rate limiting on /api/thoughts endpoints (could be abused to fill disk)
⚠️ No maximum content length validation on thinking_text (FTS5 has limits, but pre-validation would be cleaner)

🎯 Recommendations Summary

Must Fix Before Merge

Add privacy tag stripping to thinking block extraction OR document the intentional bypass

Should Fix Before Merge

Improve error logging in transcript parser (include error details)
Validate session existence in handleStoreThoughts OR document guarantee

Consider for Follow-up PR

Add rate limiting to thoughts endpoints
Add maximum content length validation
Add performance tests for FTS5 search with large datasets
Document migration version sequencing strategy

✨ Overall Assessment

This is a high-quality implementation with excellent architecture, comprehensive tests, and proper separation of concerns. The FTS5 integration is particularly well done. The primary concern is ensuring privacy controls are consistently applied across all data capture points.

Recommendation: Approve with request for privacy tag handling clarification/implementation.

Great work! 🚀

greptile-apps · 2026-02-13T04:03:52Z

Greptile Overview

Greptile Summary

This PR implements a complete pipeline for capturing, storing, and searching Claude's internal thinking blocks from session transcripts. The implementation follows established patterns from the codebase for observations and summaries.

Key Changes:

Database Layer: Migration 22 adds thoughts table with FTS5 full-text search, proper indexes, and sync triggers following the same pattern as observations_fts from migration 6
Hook Integration: New thoughts-extract handler runs in Stop hook Phase 1.5 (after summarize, before session-complete) to extract thinking blocks from JSONL transcripts
Storage Pipeline: SessionStore adds CRUD operations (storeThoughts, getThoughts, getThoughtsByIds, searchThoughts) with proper SQL parameterization
REST API: Three new endpoints - POST/GET /api/thoughts for direct access, GET /api/thoughts/search for FTS5 queries, and POST /api/sessions/thoughts for session-scoped storage with auto-resolution
Vector Search: ChromaSync integrates thoughts into semantic search with syncThought/syncThoughts methods and backfill support, following the observation sync pattern
Search Integration: All search strategies (Chroma, SQLite, Hybrid) properly handle ThoughtSearchResult type with correct doc_type filtering
Real-time Updates: SSE broadcaster emits thought_stored events with 200-char previews for live viewer updates
Test Coverage: Comprehensive tests covering migrations, CRUD operations, hook extraction, API routes, Chroma sync, and SSE broadcasting
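To make the real-time item concrete, here is a sketch of how a `thought_stored` frame with a 200-character preview might be formatted for Server-Sent Events. The payload fields and the names `StoredThought` and `formatThoughtStoredEvent` are assumptions; only the event name and the preview length come from the summary above.

```typescript
// Sketch only: the payload shape is assumed, not taken from the PR.
interface StoredThought {
  id: number;
  project: string;
  thinking_text: string;
  created_at_epoch: number;
}

const PREVIEW_LENGTH = 200; // preview size stated in the summary

// Builds one SSE frame. JSON.stringify escapes newlines in the text, so the
// payload always fits on a single "data:" line.
function formatThoughtStoredEvent(thought: StoredThought): string {
  const payload = {
    type: 'thought_stored',
    id: thought.id,
    project: thought.project,
    preview: thought.thinking_text.slice(0, PREVIEW_LENGTH),
    created_at_epoch: thought.created_at_epoch,
  };
  // An SSE message is one or more "data:" lines terminated by a blank line.
  return 'data: ' + JSON.stringify(payload) + '\n\n';
}
```

Sending a truncated preview keeps frames small; a viewer that needs the full text can fetch it through the REST endpoints.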

Implementation Quality:

The code demonstrates strong consistency with existing patterns - migration structure matches user_prompts table creation, FTS5 triggers follow the observations_fts pattern, and error handling aligns with hook philosophy (non-blocking, graceful degradation). The JSONL parser correctly handles malformed lines, and async operations (Chroma sync, SSE broadcast) are properly fire-and-forget to avoid blocking the response.

Confidence Score: 5/5

Safe to merge - well-structured feature addition with comprehensive tests and proper error handling
The implementation follows established codebase patterns consistently, includes thorough test coverage across all layers (migration, storage, API, search, SSE), handles errors gracefully without blocking core functionality, and demonstrates proper SQL parameterization and idempotent migrations. The code is production-ready.
No files require special attention - implementation is consistent and well-tested

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/services/sqlite/migrations/runner.ts | Added createThoughtsTable migration (v22) with proper idempotency checks |
| src/services/sqlite/SessionStore.ts | Added thoughts CRUD operations and FTS5 search, properly integrated into constructor |
| src/hooks/handlers/thinking.ts | JSONL parser extracts thinking blocks from transcript, handles malformed lines gracefully |
| src/hooks/handlers/thoughts.ts | Hook handler extracts thoughts and POSTs to worker with proper error handling |
| src/services/worker/http/routes/ThoughtsRoutes.ts | REST API for thought storage, retrieval, and FTS5 search with Chroma sync and SSE broadcast |
| src/services/worker/http/routes/SessionRoutes.ts | Added POST /api/sessions/thoughts endpoint with auto-resolution of session metadata |
| src/services/sync/ChromaSync.ts | Added syncThought/syncThoughts methods and backfill support for thoughts vector embeddings |
| src/services/worker/search/strategies/ChromaSearchStrategy.ts | Integrated thoughts into vector search with doc_type filtering and hydration |

Sequence Diagram

sequenceDiagram
    participant Hook as Stop Hook
    participant Extract as thoughts-extract Handler
    participant Parser as extractThinkingBlocks
    participant Transcript as JSONL Transcript
    participant Worker as Worker API
    participant Store as SessionStore
    participant FTS as SQLite FTS5
    participant Chroma as ChromaSync
    participant SSE as SSE Broadcaster

    Hook->>Extract: Execute (Phase 1.5)
    Extract->>Parser: extractThinkingBlocks(transcriptPath)
    Parser->>Transcript: Read JSONL line-by-line
    Transcript-->>Parser: Raw transcript lines
    Parser->>Parser: Parse JSON, filter type='thinking'
    Parser-->>Extract: ThinkingBlock[] with text & messageIndex
    
    Extract->>Worker: POST /api/sessions/thoughts
    Note over Extract,Worker: Body: contentSessionId, thoughts[]
    
    Worker->>Store: createSDKSession(contentSessionId)
    Store-->>Worker: Resolve memorySessionId & project
    Worker->>Store: storeThoughts(memorySessionId, thoughts)
    Store->>FTS: INSERT triggers sync thoughts_fts
    Store-->>Worker: thought IDs[]
    
    Worker->>Store: getThoughtsByIds(ids)
    Store-->>Worker: Full Thought records
    
    par Async Operations
        Worker->>Chroma: syncThoughts(thoughts)
        Chroma->>Chroma: Create ChromaDocument[]
        Note over Chroma: doc_type='thought', metadata
        Chroma-->>Worker: Success (fire-and-forget)
    and
        Worker->>SSE: broadcastThoughtStored(thought)
        SSE->>SSE: Broadcast 'thought_stored' event
        Note over SSE: Preview: first 200 chars
    end
    
    Worker-->>Extract: { ids: number[] }
    Extract-->>Hook: HookResult (continue=true)

Last reviewed commit: 67c7b63

thedotmack · 2026-02-16T05:47:16Z

The thoughts timeline feature looks comprehensive and well-structured. This PR has conflicts with main, particularly in worker-service.ts, ChromaSync.ts, and the search pipeline. Recommended merge order is: #1102 (bug fixes) first, then this PR. Could you rebase once #1102 lands?

Chriscross475

LGTM - Comprehensive thoughts timeline implementation with:

✅ Passing CI (Greptile + claude-review)
✅ Database migration (migration 21 with FTS5)
✅ Complete API layer (storage, retrieval, search)
✅ Vector integration (ChromaSync for semantic search)
✅ Real-time SSE (thought_stored broadcasts)
✅ Test coverage (9 new test files covering migration, storage, routes, SSE, Chroma)

Well-architected feature with clean separation of concerns. No TODO/FIXME markers found.

thedotmack and others added 18 commits February 7, 2026 20:56

MAESTRO: Add syncThoughts() batch method to ChromaSync for bulk thoug…

ce09d8b

…ht vector embeddings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MAESTRO: Add thought backfill to ensureBackfilled() for Chroma vector…

448d422

… search completeness Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MAESTRO: Add broadcastThoughtStored SSE method for real-time thought …

3edb2c8

…notifications Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MAESTRO: Rebuild and deploy plugin with thoughts SSE broadcasting int…

6b79d4f

…egration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into thoughts-feed

67c7b63

# Conflicts: # plugin/scripts/mcp-server.cjs # plugin/scripts/worker-service.cjs

Chriscross475 approved these changes Mar 1, 2026

thedotmack wants to merge 18 commits into main from thoughts-feed
