chleosl/claude-code-relevant-chat-search
# Claude Code Relevant Session Search

A pipeline for searching and analyzing previous Claude Code sessions to find relevant conversations using Haiku subagent delegation.

## Overview

When working on long-term projects with Claude Code, valuable context gets scattered across multiple sessions. This tool enables semantic search across your Claude Code history, leveraging Haiku as a lightweight analyzer to find and summarize relevant past conversations.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Main Model                              │
│                    (Sonnet/Opus)                                │
└─────────────────────┬───────────────────────────────────────────┘
                      │ Task tool invocation
                      ▼
┌─────────────────────────────────────────────────────────────────┐
│              relevant-search Subagent                           │
│                                                                 │
│  1. Collect session file list (Bash: find)                      │
│  2. For each session:                                           │
│     └─→ Bash: python session_analyzer.py                        │
│         └─→ jq: extract .message.content[].text/thinking        │
│         └─→ Haiku API: analyze relevance                        │
│         └─→ stdout: session report                              │
│  3. Accumulate results                                          │
│  4. Stop-point check (every N sessions)                         │
│  5. Return consolidated report to main model                    │
└─────────────────────────────────────────────────────────────────┘
```

## Why This Design?

**Problem:** Subagents cannot spawn other subagents in Claude Code.

**Solution:** Delegate per-session analysis to an external Python script that calls the Haiku API independently. Each script execution is a separate process with isolated context, preventing token accumulation across sessions.
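From the subagent's side, the isolation boundary amounts to one subprocess per session whose only shared state is the single JSON line it prints. A minimal sketch of that loop, assuming the analyzer path from the Installation section; `build_analyzer_command`, `parse_report`, and `analyze_sessions` are illustrative helper names, not part of this repo:

```python
import json
import subprocess

# Path assumed from the Installation section
ANALYZER = "~/.claude/scripts/session_analyzer.py"

def build_analyzer_command(session_path: str, query: str) -> list[str]:
    """One argv per session: a fresh process, so no token state leaks between runs."""
    return ["python", ANALYZER, "--session", session_path, "--query", query]

def parse_report(stdout: str) -> dict:
    """The analyzer prints exactly one JSON object on stdout."""
    return json.loads(stdout.strip())

def analyze_sessions(session_paths: list[str], query: str) -> list[dict]:
    """Run the analyzer once per session and accumulate the per-session reports."""
    reports = []
    for path in session_paths:
        proc = subprocess.run(
            build_analyzer_command(path, query),
            capture_output=True, text=True
        )
        if proc.returncode == 0 and proc.stdout.strip():
            reports.append(parse_report(proc.stdout))
    return reports
```

Because each session is a separate `subprocess.run`, a crash or oversized transcript in one session cannot poison the analysis of the next.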

## Components

### 1. Subagent Definition

`~/.claude/agents/relevant-search.md`:

````markdown
---
name: relevant-search
description: Search previous Claude Code sessions for relevant conversations. Use when the user mentions "previous session", "we discussed before", "find where we talked about", etc.
tools: Bash, Read, Glob, Grep
---

# Relevant Session Search Agent

## Role
Search and analyze previous Claude Code sessions to find conversations relevant to the user's query.

## Process

1. Receive search query from main model
2. Determine search scope:
   - Directory: current project or all projects
   - Date range: default last 30 days
   - Max sessions: default 20
3. Collect target session files:
   ```bash
   find ~/.claude/projects/[PROJECT_DIR] -name "*.jsonl" -mtime -30 -type f | sort -r
   ```
4. For each session, invoke the analyzer:
   ```bash
   python ~/.claude/scripts/session_analyzer.py \
     --session "[SESSION_PATH]" \
     --query "[SEARCH_QUERY]"
   ```
5. Accumulate results in memory
6. Stop-point check (every 5 sessions):
   - If sufficient relevant content found → prepare final report
   - If insufficient → continue iteration
   - If scope exhausted → suggest expanding scope or conclude
7. Return consolidated report to main model

## Output Format

For each relevant session found:
- Session ID and timestamp
- Session context (what the session was about overall)
- Relevant content summary
- Original quotes

## Scope Control

The user can specify:
- `--dir [PROJECT_PATH]`: specific project directory
- `--days [N]`: date range in days
- `--max [N]`: maximum sessions to analyze
- `--exhaustive`: search all available sessions
````
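The stop-point rule in steps 5–6 can be sketched as a pure function over the accumulated reports. This is a hypothetical formalization of the prose above, not code from the repo; the `checkpoint` and `enough` thresholds are illustrative defaults:

```python
def should_stop(reports: list[dict], scope_exhausted: bool,
                checkpoint: int = 5, enough: int = 3) -> str:
    """Decide at each checkpoint whether to report, continue, or expand scope.

    Returns "report", "continue", or "expand".
    """
    if scope_exhausted:
        # Out of sessions: report whatever was found, or suggest widening scope
        return "report" if any(r.get("relevant") for r in reports) else "expand"
    if len(reports) % checkpoint != 0 or not reports:
        return "continue"  # only decide at checkpoint boundaries
    relevant = sum(1 for r in reports if r.get("relevant"))
    return "report" if relevant >= enough else "continue"
```

Checking only every `checkpoint` sessions keeps the subagent from re-evaluating the whole result set after every single analyzer call.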

### 2. Session Analyzer Script

`~/.claude/scripts/session_analyzer.py`:

```python
#!/usr/bin/env python3
"""
Analyze a single Claude Code session for relevance to a search query.
Calls the Haiku API for semantic analysis.
"""

import argparse
import json
import subprocess
import os
from anthropic import Anthropic

def extract_session_content(session_path: str) -> str:
    """Extract text and thinking content from session JSONL using jq."""
    jq_filter = '''
    .message.content[]? |
    select(.type == "text" or .type == "thinking") |
    .text // .thinking
    '''

    result = subprocess.run(
        ['jq', '-r', jq_filter, session_path],
        capture_output=True,
        text=True
    )
    if result.returncode != 0:
        # jq failed (missing file, malformed JSONL); treat as an empty session
        return ""
    return result.stdout

def analyze_with_haiku(content: str, query: str) -> dict:
    """Send content to Haiku for relevance analysis."""

    client = Anthropic()

    # Truncate the transcript to fit Haiku's context window
    prompt = f"""Analyze this Claude Code session transcript and determine if it contains content relevant to the following search query.

<search_query>
{query}
</search_query>

<session_transcript>
{content[:150000]}
</session_transcript>

If relevant content exists, respond in this JSON format:
{{
    "relevant": true,
    "session_context": "Brief description of what this session was about overall",
    "relevant_summary": "Summary of content related to the search query",
    "quotes": ["Direct quote 1", "Direct quote 2"]
}}

If no relevant content exists:
{{
    "relevant": false,
    "session_context": "Brief description of what this session was about overall"
}}

Respond with JSON only, no other text."""

    response = client.messages.create(
        model="claude-haiku-4-5",  # alias; pin a dated snapshot for reproducibility
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}]
    )

    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return {"relevant": False, "error": "Failed to parse response"}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--session', required=True, help='Path to session JSONL file')
    parser.add_argument('--query', required=True, help='Search query')
    args = parser.parse_args()

    session_id = os.path.basename(args.session).replace('.jsonl', '')

    # Extract content
    content = extract_session_content(args.session)

    if not content.strip():
        print(json.dumps({"session_id": session_id, "relevant": False, "error": "Empty session"}))
        return

    # Analyze with Haiku
    result = analyze_with_haiku(content, args.query)
    result["session_id"] = session_id

    # Attach the file's mtime so callers can sort reports chronologically
    result["timestamp"] = os.stat(args.session).st_mtime

    print(json.dumps(result))

if __name__ == "__main__":
    main()
```

### 3. Installation

```bash
# Create directories
mkdir -p ~/.claude/agents
mkdir -p ~/.claude/scripts

# Copy agent definition
cp relevant-search.md ~/.claude/agents/

# Copy and make the script executable
cp session_analyzer.py ~/.claude/scripts/
chmod +x ~/.claude/scripts/session_analyzer.py

# Install dependencies
pip install anthropic

# Ensure jq is installed
# Ubuntu/Debian: sudo apt install jq
# macOS: brew install jq

# Set up authentication (choose one)
export ANTHROPIC_API_KEY="your-api-key"
# OR use an OAuth token from a Claude Code subscription:
# claude setup-token
# export CLAUDE_CODE_OAUTH_TOKEN="your-oauth-token"
```

## Usage

### Automatic Invocation

The subagent is automatically invoked when you mention past sessions:

```
You: "Find where we discussed the coordinate transformation methods"
Claude: [Invokes relevant-search subagent]
        [Returns consolidated report of matching sessions]
```

### Explicit Invocation

```
You: "Use the relevant-search agent to find discussions about Firebase authentication from the last 60 days"
```

### With Scope Control

```
You: "Search my trimfh project sessions from November for layout editor discussions"
```

## Session JSONL Structure

Claude Code stores sessions in `~/.claude/projects/[encoded-path]/[session-id].jsonl`.

Each line is a JSON object with:

```jsonc
{
  "type": "user" | "assistant",
  "sessionId": "uuid",
  "timestamp": "ISO-8601",
  "message": {
    "content": [
      {"type": "text", "text": "..."},
      {"type": "thinking", "thinking": "..."},  // CoT
      {"type": "tool_use", "name": "...", "input": {...}}
    ]
  }
}
```

The analyzer extracts the `text` and `thinking` fields for semantic search.
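The jq filter in the analyzer has a straightforward pure-Python equivalent, useful where jq is unavailable. A sketch, not the repo's code; `extract_line` is a hypothetical helper operating on a single JSONL record:

```python
import json

def extract_line(jsonl_line: str) -> list[str]:
    """Python equivalent of the jq filter: pull text/thinking blocks out of
    one session JSONL record, skipping tool_use blocks and malformed lines."""
    try:
        record = json.loads(jsonl_line)
    except json.JSONDecodeError:
        return []
    blocks = (record.get("message") or {}).get("content") or []
    if not isinstance(blocks, list):
        return []  # some records may store content as a plain string
    return [
        b.get("text") or b.get("thinking") or ""
        for b in blocks
        if isinstance(b, dict) and b.get("type") in ("text", "thinking")
    ]
```

The `b.get("text") or b.get("thinking")` fallback mirrors jq's `.text // .thinking` alternative operator.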

## Limitations

- **No nested subagents**: Claude Code subagents cannot spawn other subagents. This pipeline works around the restriction by using Bash to call external scripts.
- **Context window**: Each session's extracted content must fit in Haiku's context window (~200K tokens). Very long sessions are truncated.
- **API costs**: Each session analysis requires one Haiku API call.
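The 150,000-character cutoff in the analyzer relates to the context limit via the rough ~4 characters-per-token heuristic (150K chars ≈ 37K tokens, well inside a ~200K-token window). A hypothetical sketch of that budget check; the helper names are not from the repo:

```python
def fits_haiku_context(content: str, max_chars: int = 150_000) -> bool:
    """Rough check via the ~4 chars/token rule of thumb: 150K characters
    is roughly 37K tokens, comfortably inside a ~200K-token window."""
    return len(content) <= max_chars

def truncate_for_context(content: str, max_chars: int = 150_000) -> str:
    """Keep the head of the transcript, matching the analyzer's content[:150000]."""
    return content[:max_chars]
```

Truncating the head keeps early session context but silently drops the tail of very long sessions, which is the trade-off the limitation above describes.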

## Future Improvements

- Pre-indexing with embeddings for faster initial filtering
- Caching analyzed sessions to avoid re-processing
- Support for searching across multiple projects simultaneously
- Integration with `/resume` for one-click session continuation

## License

MIT
