[BUG] Token Truncation Is Silent, No Warning #162

@olddev94

Project

vgrep

Description

When input text exceeds the context size limit (n_ctx), the excess tokens are dropped without any warning to the user. This can lead to:

  • Incomplete embeddings for long files/chunks
  • Misleading search results
  • The user being unaware that content was lost

Error Message

None - silent truncation.

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Create a very long code file (more than 2048 tokens)
  2. Index the file: vgrep index
  3. Observe that tokens beyond n_ctx are dropped with no warning
  4. Run a search: results reflect only the truncated content

Expected Behavior

  1. Log a warning when truncation occurs
  2. Track truncation statistics during indexing
  3. Option to fail on truncation instead of silently dropping
  4. Consider chunking strategy to avoid truncation
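
Items 1 to 3 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not vgrep's actual code: `TruncationPolicy`, `TruncationStats`, `TruncationError`, and `fit_to_context` are hypothetical names, tokens are assumed to be plain `i32` IDs, and the warning goes to stderr via `eprintln!` rather than whatever logging facility the project uses.

```rust
use std::fmt;

/// What to do when an input has more tokens than the context allows.
/// Hypothetical type for illustration; not part of vgrep.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TruncationPolicy {
    /// Drop the extra tokens, but tell the user about it.
    Warn,
    /// Refuse to continue and return an error instead.
    Fail,
}

/// Counters accumulated over one indexing run (hypothetical).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct TruncationStats {
    pub inputs_seen: usize,
    pub inputs_over_limit: usize,
    /// Tokens beyond n_ctx: dropped under `Warn`, rejected under `Fail`.
    pub tokens_over_limit: usize,
}

/// Error returned under `TruncationPolicy::Fail` (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct TruncationError {
    pub token_count: usize,
    pub n_ctx: usize,
}

impl fmt::Display for TruncationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "input has {} tokens but n_ctx is {}", self.token_count, self.n_ctx)
    }
}

impl std::error::Error for TruncationError {}

/// Clamp `tokens` to at most `n_ctx` entries and record what happened.
pub fn fit_to_context(
    mut tokens: Vec<i32>,
    n_ctx: usize,
    policy: TruncationPolicy,
    stats: &mut TruncationStats,
) -> Result<Vec<i32>, TruncationError> {
    stats.inputs_seen += 1;
    if tokens.len() <= n_ctx {
        return Ok(tokens); // fits, nothing to report
    }

    let excess = tokens.len() - n_ctx;
    stats.inputs_over_limit += 1;
    stats.tokens_over_limit += excess;

    match policy {
        TruncationPolicy::Fail => Err(TruncationError {
            token_count: tokens.len(),
            n_ctx,
        }),
        TruncationPolicy::Warn => {
            eprintln!(
                "warning: truncating input from {} to {} tokens ({} dropped)",
                tokens.len(),
                n_ctx,
                excess
            );
            tokens.truncate(n_ctx);
            Ok(tokens)
        }
    }
}
```

At the end of an indexing run, the accumulated `TruncationStats` could be printed as a summary (for example, how many of the inputs seen were over the limit), which would cover the "truncation rate" metric. The policy could plausibly be exposed as a CLI flag or config option, though the name and shape of such an option is left open here.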

Actual Behavior

  1. Tokens silently dropped
  2. No indication to user
  3. No metrics on truncation rate
  4. Potentially misleading search results
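
For the chunking strategy mentioned under Expected Behavior, one possible approach is to split the token sequence into overlapping windows that each fit in the context, so nothing is dropped. The sketch below is an assumption about how that might look, not how vgrep chunks today: `chunk_tokens` and its `overlap` parameter are hypothetical, and tokens are again assumed to be `i32` IDs.

```rust
/// Split `tokens` into windows of at most `n_ctx` tokens, where consecutive
/// windows share `overlap` tokens. Hypothetical helper; not vgrep's API.
pub fn chunk_tokens(tokens: &[i32], n_ctx: usize, overlap: usize) -> Vec<Vec<i32>> {
    assert!(n_ctx > 0, "n_ctx must be positive");
    assert!(overlap < n_ctx, "overlap must be smaller than n_ctx");

    // Each new window starts `step` tokens after the previous one.
    let step = n_ctx - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < tokens.len() {
        let end = usize::min(start + n_ctx, tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break; // the last window reached the end of the input
        }
        start += step;
    }
    chunks
}
```

Each window would then be embedded separately, so a long file contributes several embeddings instead of one truncated one. The overlap keeps some shared context across window boundaries; a token-based split like this ignores code structure, so splitting on function or block boundaries may give better search results.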

Additional Context

No response

Metadata

    Labels

    bug (Something isn't working), valid (Valid issue), vgrep