Description
Problem
validateSemanticQuery() in src/store.ts rejects vec/hyde queries containing hyphens before words, treating them as negation operators. This causes false positives when queries contain CLI flags or hyphenated terms as part of natural language text.
Example query:
{"type": "vec", "query": "autopilot script failed because claude -p command did not work"}
Error:
Structured search (vec): Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.
The -p here is a CLI flag reference, not a negation operator. Semantic search (embeddings) doesn't use negation syntax, so there's no reason to reject these queries.
Current behavior
export function validateSemanticQuery(query: string): string | null {
  if (/-\w/.test(query) || /-"/.test(query)) {
    return 'Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.';
  }
  return null;
}
The regex /-\w/ matches any hyphen followed by a word character anywhere in the query, including mid-sentence CLI flags like -p and -v, compound words, etc.
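To make the false positives concrete, here is a small standalone sketch that runs the validator quoted above against a few inputs. The sample queries are illustrative assumptions, not taken from the project's test suite.

```typescript
// Copy of the validator quoted above, reproduced so the snippet runs on its own.
function validateSemanticQuery(query: string): string | null {
  if (/-\w/.test(query) || /-"/.test(query)) {
    return 'Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.';
  }
  return null;
}

// Illustrative queries (hypothetical examples).
const samples: string[] = [
  'claude -p command did not work', // CLI flag in natural language
  'state-of-the-art retrieval',     // hyphenated compound
  'plain query without hyphens',    // no hyphen at all
];

for (const q of samples) {
  const verdict = validateSemanticQuery(q) === null ? 'accepted' : 'rejected';
  console.log(`${verdict}: ${q}`);
}
// rejected: claude -p command did not work
// rejected: state-of-the-art retrieval
// accepted: plain query without hyphens
```

Only the query with no hyphen at all passes; neither rejected query contains anything a user would intend as an exclusion.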
Proposed fix
Replace validation with sanitization: strip negation-like prefixes instead of rejecting the query. A leading hyphen carries little meaning for an embedding model, so removing it should have a negligible effect on search quality:
export function sanitizeSemanticQuery(query: string): string {
  // Capture the start of the string or the preceding whitespace character and put it back,
  // so only the hyphen is removed (this also keeps tabs and newlines intact).
  return query.replace(/(^|\s)-(?=[\w"])/g, '$1');
}
And in the caller, replace the throw with:
} else if (search.type === 'vec' || search.type === 'hyde') {
  search.query = sanitizeSemanticQuery(search.query);
}
This way "claude -p command" becomes "claude p command": the search runs and no error is raised.
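The sanitization idea can be checked with a standalone sketch like the one below. It uses a capture-group form of the regex so that whatever whitespace precedes the hyphen is preserved as is; the sample inputs are illustrative assumptions, not cases from the project.

```typescript
// Sketch of the sanitizer: drop a hyphen that starts a token, keep everything else.
function sanitizeSemanticQuery(query: string): string {
  // (^|\s) captures the start of the string or the preceding whitespace character,
  // and '$1' puts it back, so only the hyphen itself is removed.
  return query.replace(/(^|\s)-(?=[\w"])/g, '$1');
}

// [input, expected output] pairs (hypothetical examples).
const cases: Array<[string, string]> = [
  ['claude -p command', 'claude p command'],       // mid-sentence CLI flag
  ['-v flag at the start', 'v flag at the start'], // hyphen at the start of the query
  ['tabs\t-x too', 'tabs\tx too'],                 // whitespace other than a space
  ['well-known term', 'well-known term'],          // hyphen inside a word is untouched
  ['-"quoted phrase"', '"quoted phrase"'],         // negated phrase syntax
];

for (const [input, expected] of cases) {
  const actual = sanitizeSemanticQuery(input);
  const status = actual === expected ? 'ok' : 'FAIL';
  console.log(status, JSON.stringify(input), '->', JSON.stringify(actual));
}
```

Hyphenated compounds such as "well-known" pass through unchanged, because the pattern only matches a hyphen at the start of the query or directly after whitespace.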
Impact
This affects any MCP client (e.g. Claude Code) that generates vec/hyde queries with hyphens. Since the queries are LLM-generated, instructing the model to avoid hyphens is unreliable — the fix needs to be in code.