[BUG] Magic number 40 in chunk overlap calculation produces incorrect overlap #110

@Cute0110

Project

vgrep

Description

The chunk overlap calculation in the indexer divides the configured chunk_overlap setting by a hardcoded magic number 40, so the setting is not properly respected. This degrades semantic search quality because adjacent chunks share far less context than the configuration implies.

Error Observation

When indexing files with chunk_overlap = 64 (the default), the actual overlap is only 1 line (64 / 40 = 1 under integer division). With chunk_overlap = 128, it is only 3 lines (128 / 40 = 3). The config name suggests character-based overlap, but dividing by the magic number yields a small line count with no documented relationship to the configured value.
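The arithmetic above can be checked with a minimal sketch. The function name `effective_overlap_lines` is hypothetical and does not exist in vgrep; it only reproduces the `chunk_overlap / 40` expression quoted in this issue.

```rust
/// Hypothetical helper (not part of vgrep): reproduces the expression
/// `self.chunk_overlap / 40` from the snippet quoted in this issue.
/// `usize` division truncates toward zero, so any remainder is discarded.
fn effective_overlap_lines(chunk_overlap: usize) -> usize {
    chunk_overlap / 40
}
```

Calling it with 64 gives 1 and with 128 gives 3, matching the numbers reported above. Any configured value below 40 gives 0, meaning no overlap at all.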

System Information

Version: 0.1.0

## Operating System
  OS: Ubuntu 24.04.3 LTS
  Kernel: 6.8.0-79-generic
  Arch: x86_64

## Hardware
  CPU: AMD Ryzen 9 5950X 16-Core Processor (4 cores)
  RAM: 11 GB

## Build Environment
  Rust: rustc 1.92.0 (ded5c06cf 2025-12-08)
  Target: x86_64

Steps to Reproduce

  1. Open src/core/indexer.rs and examine lines 354-358, 730-734
  2. Open src/watcher.rs and examine line 419
  3. Observe the code:
        let overlap_start = if line_idx > 0 {
            line_idx.saturating_sub(self.chunk_overlap / 40)  // Magic number 40!
        } else {
            0
        };

### Expected Behavior

The overlap calculation should either:
1. Use the configured value directly as line count
2. Calculate based on actual character positions
3. Document what the magic number 40 represents
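As a rough illustration of options 1 and 2, here is a minimal sketch. Both function names and the `lines: &[&str]` representation are assumptions made for this example; the actual indexer may store lines differently, and neither function is taken from the vgrep source.

```rust
/// Option 1 (hypothetical): treat the configured value as a line count.
/// `saturating_sub` already clamps at 0, so no `line_idx > 0` guard is needed.
fn overlap_start_by_lines(line_idx: usize, overlap_lines: usize) -> usize {
    line_idx.saturating_sub(overlap_lines)
}

/// Option 2 (hypothetical): treat the configured value as a character budget.
/// Walks backwards from `line_idx`, adding whole lines until at least
/// `chunk_overlap` characters are covered or the start of the file is reached.
fn overlap_start_by_chars(lines: &[&str], line_idx: usize, chunk_overlap: usize) -> usize {
    // Clamp so an out-of-range index cannot cause a panic below.
    let mut start = line_idx.min(lines.len());
    let mut covered = 0usize;
    while start > 0 && covered < chunk_overlap {
        start -= 1;
        // +1 accounts for the newline that terminated this line.
        covered += lines[start].chars().count() + 1;
    }
    start
}
```

With four 4-character lines and `line_idx = 3`, a budget of 8 characters walks back two lines (5 + 5 characters including newlines) and returns 1. The character-based version always includes whole lines, so the real overlap can slightly exceed the configured value; whether that rounding direction is desirable is a design choice for the maintainers.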

### Actual Behavior

Dividing chunk_overlap by the unexplained magic number 40 yields only 1 line of overlap at the default setting and 3 lines at 128, far less than the configured value suggests, which degrades semantic search quality.

### Additional Context

The magic number appears in 3 different locations with no documentation explaining its purpose.

Labels: bug, ide, invalid, vgrep