fix(store): preserve underscores in BM25 search terms #404

Open
mvanhorn wants to merge 1 commit into tobi:main from mvanhorn:osc/305-bm25-underscore-search
@mvanhorn
Contributor

Fixes #305

Summary

sanitizeFTS5Term stripped underscores from search terms, causing BM25 searches for snake_case identifiers to silently fail. my_variable became myvariable, matching nothing.

Changes

  • Add _ to the preserved character set in sanitizeFTS5Term regex (store.ts)
  • Export the function for testability
  • Add 6 unit tests covering snake_case, contractions, punctuation, unicode

Before / After

sanitizeFTS5Term("my_variable")
  Before: "myvariable"  (no BM25 match)
  After:  "my_variable" (correct match)
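
For reviewers who want to reproduce the before/after behavior locally, here is a minimal sketch. It is not the actual code from store.ts: the names `sanitizeBefore` and `sanitizeAfter` are hypothetical, and the exact character classes are assumed from the description above (Unicode letters and numbers, plus the apostrophe, inferred from the contraction tests).

```typescript
// Hypothetical reconstruction of the change; not copied from store.ts.
// Assumption: the original pattern kept Unicode letters, numbers, and apostrophes.
// Built with the RegExp constructor so the sketch compiles regardless of the
// TypeScript target setting.
const STRIP_BEFORE = new RegExp("[^\\p{L}\\p{N}']", "gu");

// The fix described in this PR: add "_" to the preserved character set.
const STRIP_AFTER = new RegExp("[^\\p{L}\\p{N}'_]", "gu");

// Old behavior: underscores are removed along with punctuation.
function sanitizeBefore(term: string): string {
  return term.replace(STRIP_BEFORE, "");
}

// New behavior: underscores survive, so snake_case identifiers stay intact.
function sanitizeAfter(term: string): string {
  return term.replace(STRIP_AFTER, "");
}

console.log(sanitizeBefore("my_variable")); // "myvariable"
console.log(sanitizeAfter("my_variable"));  // "my_variable"
console.log(sanitizeAfter("don't"));        // "don't"
console.log(sanitizeAfter("hello!"));       // "hello"
```

Under these assumptions, the only behavioral difference between the two functions is whether `_` is kept. Other punctuation is still stripped.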

The CLI's copy of sanitizeFTS5Term (cli/qmd.ts:1695) already uses \w, which preserves underscores. This change aligns the store's version with it.
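
As a quick illustration of why a `\w`-based pattern already keeps underscores, the sketch below checks standard JavaScript regex semantics. It does not reproduce the CLI's actual pattern, which is not shown in this PR, and the helper name `isWordOnly` is hypothetical.

```typescript
// In JavaScript, \w is shorthand for [A-Za-z0-9_], so "_" counts as a word character.
const WORD_ONLY = /^\w+$/;

// Hypothetical helper: true when every character in the term matches \w.
function isWordOnly(term: string): boolean {
  return WORD_ONLY.test(term);
}

console.log(isWordOnly("my_variable")); // true: underscore is part of \w
console.log(isWordOnly("my-variable")); // false: hyphen is not part of \w
console.log(isWordOnly("caf\u00e9"));   // false: \w is ASCII-only, unlike \p{L}
```

The last case shows the trade-off: `\w` covers underscores but not non-ASCII letters, which is presumably why the store's version uses Unicode property classes and needs `_` listed explicitly.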

This contribution was developed with AI assistance (Claude Code).

sanitizeFTS5Term stripped all non-letter/non-number characters including
underscores, causing snake_case identifiers like `my_variable` to become
`myvariable` and silently fail BM25 matches.

Add underscore to the preserved character set in the Unicode regex.
Export the function and add unit tests covering snake_case, contractions,
punctuation stripping, and unicode.

Fixes tobi#305

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

BM25 search fails on snake_case identifiers (sanitizeFTS5Term strips underscores)
