[BUG] Chunk Overlap Calculation Uses Incorrect Magic Number Division #196

@Crimsonyx412

Project

vgrep

Description

The chunk overlap calculation in both Indexer and ServerIndexer divides chunk_overlap by a hard-coded 40, so the effective overlap is far smaller than the configured value suggests. With the default chunk_overlap = 64, integer division gives 64 / 40 = 1, which produces only 1 line of overlap between chunks.

Error Message

N/A - Logic error, no runtime error produced.

Debug Logs

N/A - Can be observed by indexing a file and examining chunk boundaries in the database:
sqlite3 ~/.vgrep/projects/*.db "SELECT start_line, end_line FROM chunks ORDER BY start_line;"

System Information

vgrep version: 0.1.0
Default chunk_overlap config: 64

Screenshots

No response

Steps to Reproduce

  1. Set chunk_overlap = 64 (default) in config
  2. Create a test file with 100+ lines of code
  3. Run vgrep index
  4. Query the database: SELECT start_line, end_line FROM chunks ORDER BY chunk_index;
  5. Observe that consecutive chunks only overlap by 1 line
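
Step 5 can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the start_line and end_line values returned by the query are inclusive line numbers and that the rows are ordered by chunk position. The function name `consecutive_overlaps` is hypothetical and not part of vgrep.

```rust
/// Hypothetical helper (not part of vgrep): given (start_line, end_line)
/// pairs in chunk order, return the number of lines each chunk shares
/// with the chunk that follows it. Line ranges are assumed inclusive.
fn consecutive_overlaps(chunks: &[(usize, usize)]) -> Vec<usize> {
    chunks
        .windows(2)
        .map(|pair| {
            let (_, prev_end) = pair[0];
            let (next_start, _) = pair[1];
            // Shared lines are next_start..=prev_end; zero if there is a gap.
            (prev_end + 1).saturating_sub(next_start)
        })
        .collect()
}
```

Feeding it the rows from the query should yield a list of 1s if the bug is present with the default configuration.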

Expected Behavior

Chunks should overlap by a meaningful amount based on the configured chunk_overlap value (64 characters by default), preserving context between chunk boundaries for better semantic search quality.

Actual Behavior

With the default configuration, chunks overlap by only 1 line, because integer division truncates chunk_overlap / 40 = 64 / 40 to 1. Any chunk_overlap from 40 to 79 yields 1 line, and any value below 40 yields no overlap at all.
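
To make the truncation concrete, here is a small sketch that mirrors only the `chunk_overlap / 40` expression from the affected code. The function name `overlap_lines_current` is made up for illustration.

```rust
/// Mirrors the expression used in the affected code: the number of lines
/// of overlap that `chunk_overlap / 40` produces under integer division.
/// The function name is illustrative only.
fn overlap_lines_current(chunk_overlap: usize) -> usize {
    chunk_overlap / 40
}
```

For example, `overlap_lines_current(64)` evaluates to 1 and `overlap_lines_current(39)` evaluates to 0. A user would have to set chunk_overlap to 80 just to get 2 lines.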

Additional Context

Files affected:

  • src/core/indexer.rs: chunk_content() (lines 354-358)
  • src/watcher.rs: chunk_content() (lines 418-422)

Problematic code:

let overlap_start = if line_idx > 0 {
    line_idx.saturating_sub(self.chunk_overlap / 40)  // 64 / 40 = 1
} else {
    0
};

Fix direction: Remove the arbitrary division or use a proper line-count calculation based on average line length.
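
One possible shape for the fix, sketched under the assumption that chunk_overlap is meant as a character budget (as the Expected Behavior section suggests): walk backwards from the current line and keep including lines until the budget is covered. This uses actual line lengths rather than an average. The function `overlap_start` and its parameters are hypothetical and not taken from the vgrep source.

```rust
/// Hypothetical helper (not part of vgrep): returns the index of the first
/// line to carry over as overlap before `line_idx`, walking backwards until
/// at least `chunk_overlap` characters have been covered.
fn overlap_start(lines: &[&str], line_idx: usize, chunk_overlap: usize) -> usize {
    // Clamp so an out-of-range index cannot cause a panic below.
    let mut start = line_idx.min(lines.len());
    let mut covered = 0usize;
    while start > 0 && covered < chunk_overlap {
        start -= 1;
        // +1 accounts for the newline that separated the lines.
        covered += lines[start].chars().count() + 1;
    }
    start
}
```

With 9-character lines and chunk_overlap = 64, this carries over 7 lines instead of 1. A caller would still need to make sure each chunk advances past the previous one, for instance by capping the overlap below the chunk size.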

Metadata

Labels: bug (Something isn't working), invalid (This doesn't seem right)
