[BUG] Project Database Hash Collision Risk #152

@olddev94

Project

vgrep

Description

The Config::hash_path() function uses only the first 8 bytes (64 bits) of a SHA-256 hash to generate database filenames for different projects. Although 64 bits gives about 1.8 × 10^19 (~18 quintillion) possible values, the birthday paradox means the chance that any two of n projects collide grows roughly quadratically, as n^2/2^65 while that value is small, so collisions become increasingly likely as the number of indexed projects grows. Two different project paths whose hash prefixes collide would share the same database file, causing silent data corruption.
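To put the birthday bound in numbers, here is a small std-only Rust sketch that evaluates the standard approximation p ≈ 1 − exp(−n(n−1)/2^(b+1)) for n projects and a b-bit hash prefix. The function name collision_probability is hypothetical and is not part of vgrep.

```rust
/// Approximate probability that at least one pair among `n` projects shares
/// the same `bits`-bit hash prefix (birthday approximation):
/// p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
fn collision_probability(n: f64, bits: i32) -> f64 {
    // Number of distinct prefixes: 2^bits.
    let space = 2f64.powi(bits);
    let exponent = -n * (n - 1.0) / (2.0 * space);
    // -exp_m1(x) == 1 - exp(x), but stays accurate when x is tiny.
    -exponent.exp_m1()
}

fn main() {
    // A few project counts, from a heavy single user up to the birthday bound.
    for &n in &[1e4, 1e6, 2f64.powi(32), 5.06e9] {
        println!("{:>14.0} projects -> p ≈ {:.3e}", n, collision_probability(n, 64));
    }
}
```

For example, at 10,000 projects the approximation gives roughly 2.7 × 10^-12, while at 2^32 projects it gives about 0.39.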

Error Message

No error; the data is silently corrupted when a collision occurs.

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Find two paths that produce the same 8-byte SHA-256 prefix (requires a birthday search on the order of 2^32 hashes, or luck)
  2. Index project A at path X
  3. Index project B at path Y (where hash(X)[..8] == hash(Y)[..8])
  4. Project B's data overwrites Project A's data
  5. Search in Project A returns results from Project B
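Step 1 is the expensive part. The search it describes is a birthday attack: keep hashing distinct candidate paths and remember each truncated hash until one repeats, which takes about 2^(b/2) attempts for a b-bit prefix. This sketch assumes a stand-in hash (std's DefaultHasher truncated to a few bits) instead of SHA-256 so it needs no external crates and finishes instantly; truncated_hash and find_collision are hypothetical names, not vgrep functions, and a real reproduction would feed the candidates through Config::hash_path() instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for Config::hash_path(): hashes a path string with std's
/// DefaultHasher and keeps only the low `bits` bits (requires bits < 64).
/// This is NOT SHA-256; it is a cheap substitute for demonstration.
fn truncated_hash(path: &str, bits: u32) -> u64 {
    let mut h = DefaultHasher::new();
    path.hash(&mut h);
    h.finish() & ((1u64 << bits) - 1)
}

/// Birthday search: hash distinct candidate paths until two of them share
/// the same truncated hash. Expected work is about 2^(bits/2) hashes.
fn find_collision(bits: u32) -> (String, String) {
    let mut seen: HashMap<u64, String> = HashMap::new();
    let mut i: u64 = 0;
    loop {
        let candidate = format!("/home/user/projects/p{i}");
        let key = truncated_hash(&candidate, bits);
        // insert() returns the previous path stored under this key, if any.
        if let Some(previous) = seen.insert(key, candidate.clone()) {
            return (previous, candidate);
        }
        i += 1;
    }
}

fn main() {
    let (a, b) = find_collision(24);
    println!("{a} and {b} map to the same 24-bit hash");
}
```

The same loop against the real 64-bit SHA-256 prefix would need on the order of 2^32 hashes (and memory to match), so reproducing step 1 is costly but not infeasible.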

Expected Behavior

  1. Each project should have a guaranteed unique database file
  2. Hash collisions should be detected or made impossible
  3. If a collision occurs, warn the user or fall back to a different naming scheme
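One way to meet expected behaviors 2 and 3 is to record which project path owns each database name and check it whenever a database is opened. This sketch assumes a hypothetical DbRegistry type, with an in-memory HashMap standing in for wherever vgrep would actually persist the owner (for example a metadata row inside each database or a sidecar file); none of these names come from the vgrep codebase.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical error returned when two different projects map to one name.
#[derive(Debug, PartialEq)]
enum DbNameError {
    Collision { existing: PathBuf, requested: PathBuf },
}

/// Hypothetical registry recording which project path owns each database
/// file name. A HashMap stands in for persistent storage here.
struct DbRegistry {
    owners: HashMap<String, PathBuf>,
}

impl DbRegistry {
    fn new() -> Self {
        Self { owners: HashMap::new() }
    }

    /// Returns the database name for `project`, claiming it on first use.
    /// `name` is whatever the hashing scheme produced (e.g. 16 hex chars).
    fn claim(&mut self, name: &str, project: &Path) -> Result<String, DbNameError> {
        match self.owners.get(name) {
            // Same project re-opening its own database: fine.
            Some(owner) if owner.as_path() == project => Ok(name.to_string()),
            // A different project already owns this name: hash collision.
            Some(owner) => Err(DbNameError::Collision {
                existing: owner.clone(),
                requested: project.to_path_buf(),
            }),
            // Unused name: record the owner so later collisions are caught.
            None => {
                self.owners.insert(name.to_string(), project.to_path_buf());
                Ok(name.to_string())
            }
        }
    }
}

fn main() {
    let mut registry = DbRegistry::new();
    let name = "0123456789abcdef"; // pretend both paths hashed to this name
    println!("{:?}", registry.claim(name, Path::new("/work/project-a")));
    println!("{:?}", registry.claim(name, Path::new("/work/project-b")));
}
```

On a Collision error, the caller could warn the user or retry with a longer name, such as the full 64-character SHA-256 hex digest, which makes accidental collisions practically impossible.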

Actual Behavior

  1. With a 64-bit hash, the probability of at least one collision reaches about 39% at 2^32 (~4.3 billion) projects and about 50% at roughly 5.1 billion projects
  2. Collisions cause silent database sharing/corruption
  3. No detection mechanism exists

Additional Context

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), valid (Valid issue), vgrep

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests