vec/hyde queries reject valid text containing hyphens (false positive negation detection) #390

@lunas-ai-lab

Problem

validateSemanticQuery() in src/store.ts rejects vec/hyde queries containing hyphens before words, treating them as negation operators. This causes false positives when queries contain CLI flags or hyphenated terms as part of natural language text.

Example query:

{"type": "vec", "query": "autopilot script failed because claude -p command did not work"}

Error:

Structured search (vec): Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.

The -p here is a CLI flag reference, not a negation operator. Semantic search (embeddings) doesn't use negation syntax, so there's no reason to reject these queries.

Current behavior

export function validateSemanticQuery(query: string): string | null {
  if (/-\w/.test(query) || /-"/.test(query)) {
    return 'Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.';
  }
  return null;
}

The regex /-\w/ matches any hyphen followed by a word character anywhere in the query, including mid-sentence CLI flags such as -p or -v and hyphenated compound words.
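To make the false positives concrete, here is a minimal sketch that runs the validator quoted above against a few sample queries. The sample strings and the `samples` / `rejected` names are illustrative assumptions, not taken from the repository.

```typescript
// Copy of the validator quoted in this issue, repeated so the snippet runs on its own.
function validateSemanticQuery(query: string): string | null {
  if (/-\w/.test(query) || /-"/.test(query)) {
    return 'Negation (-term) is not supported in vec/hyde queries. Use lex for exclusions.';
  }
  return null;
}

// Illustrative queries (assumed examples, not from the project's test suite).
const samples: string[] = [
  "claude -p command did not work", // CLI flag in natural-language text
  "well-known race condition",      // hyphenated compound word
  "retry logic for timeouts",       // no hyphen at all
];

// Collect every query the current validator refuses.
const rejected = samples.filter((q) => validateSemanticQuery(q) !== null);

for (const q of rejected) {
  console.log("rejected:", q);
}
```

Only the query without a hyphen passes. Neither of the other two queries expresses an exclusion, yet both are rejected.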

Proposed fix

Replace validation with sanitization: strip negation-like prefixes instead of rejecting the query. Semantic search has no negation operator, so dropping the hyphen should have little or no effect on search quality:

export function sanitizeSemanticQuery(query: string): string {
  // Drop a "-" at the start of the query or after whitespace; the captured whitespace (space, tab, newline) is kept.
  return query.replace(/(^|\s)-(?=[\w"])/g, '$1');
}

And in the caller, replace the throw with:

} else if (search.type === 'vec' || search.type === 'hyde') {
  search.query = sanitizeSemanticQuery(search.query);
}

This way "claude -p command" → "claude p command": search works, no error.
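A quick way to check the intended behaviour is a small table of inputs and expected outputs. The sketch below uses a hypothetical helper named `stripNegationPrefix` (an illustrative name, not from the repository). It is a variant of the sanitizer that uses a capture group to keep whatever whitespace preceded the hyphen. The test cases are assumptions chosen to cover a flag after a space, a hyphen at the start of the query, a mid-word hyphen, and a tab before the hyphen.

```typescript
// Hypothetical variant of the proposed sanitizer: "$1" re-inserts the captured
// start-of-string (empty) or whitespace character, so only the hyphen is removed.
function stripNegationPrefix(query: string): string {
  return query.replace(/(^|\s)-(?=[\w"])/g, "$1");
}

// Assumed sample inputs paired with the output we expect.
const cases: Array<[string, string]> = [
  ["claude -p command", "claude p command"],     // flag after a space
  ["-v flag was ignored", "v flag was ignored"], // hyphen at the start of the query
  ["well-known issue", "well-known issue"],      // mid-word hyphen is left alone
  ["run\t-x now", "run\tx now"],                 // tab before the hyphen survives
];

for (const [input, expected] of cases) {
  const actual = stripNegationPrefix(input);
  const status = actual === expected ? "ok" : "MISMATCH";
  console.log(JSON.stringify(input), "->", JSON.stringify(actual), status);
}
```

Keeping the original whitespace via the capture group avoids merging two words when the hyphen follows a tab or newline rather than a plain space.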

Impact

This affects any MCP client (e.g. Claude Code) that generates vec/hyde queries with hyphens. Since the queries are LLM-generated, instructing the model to avoid hyphens is unreliable — the fix needs to be in code.
