[BUG] embed_batch API Endpoint Has No Size Limit #103

@olddev94

Project

vgrep

Description

The /embed_batch API endpoint in src/server/api.rs accepts an unlimited number of texts for embedding. A malicious or misconfigured client could send thousands of texts in a single request, causing memory exhaustion, extremely long processing times, or denial of service.

Error Message

No error - server becomes unresponsive or runs out of memory.

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Start the server: vgrep serve
  2. Send a large batch request:
    # Generate a request with 10000 texts
    python3 -c "
    import json
    texts = ['sample text ' * 100] * 10000
    print(json.dumps({'texts': texts}))
    " | curl -X POST http://127.0.0.1:7777/embed_batch \
        -H "Content-Type: application/json" \
        -d @-
  3. Observe that the server becomes unresponsive or crashes with an OOM error

Expected Behavior

The API should:

  1. Limit batch size (e.g., max 100 texts per request)
  2. Limit individual text length
  3. Return 400 Bad Request when limits exceeded
  4. Document the limits
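A minimal sketch of the kind of validation the handler could perform before doing any embedding work. The constants, the `validate_batch` function, and the specific limit values are hypothetical and not from vgrep's code; only the internal batch size of 50 comes from indexer.rs:

```rust
// Hypothetical limits; indexer.rs uses an internal batch size of 50,
// so 100 texts per request is a plausible ceiling.
const MAX_BATCH_SIZE: usize = 100;
const MAX_TEXT_LEN: usize = 8192; // bytes per text; illustrative value

/// Validates an /embed_batch payload before any embedding work starts.
/// Returns an error message suitable for a 400 Bad Request response.
fn validate_batch(texts: &[String]) -> Result<(), String> {
    if texts.is_empty() {
        return Err("texts must not be empty".to_string());
    }
    if texts.len() > MAX_BATCH_SIZE {
        return Err(format!(
            "batch size {} exceeds limit of {}",
            texts.len(),
            MAX_BATCH_SIZE
        ));
    }
    // Reject the whole request on the first oversized text.
    if let Some((i, t)) = texts
        .iter()
        .enumerate()
        .find(|(_, t)| t.len() > MAX_TEXT_LEN)
    {
        return Err(format!(
            "text at index {} is {} bytes, exceeds limit of {}",
            i,
            t.len(),
            MAX_TEXT_LEN
        ));
    }
    Ok(())
}
```

Rejecting the request up front keeps the cost of a malicious payload proportional to parsing the JSON rather than to running the embedding model; a server-level body-size limit would additionally cap the parsing cost itself.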

Actual Behavior

  1. No batch size limit
  2. No individual text length limit
  3. Server processes any size request
  4. Can be exploited for DoS

Additional Context

The internal batch size of 50 in indexer.rs suggests that 50-100 texts per request is a reasonable limit. The absence of any validation makes this endpoint a potential attack vector:

  • Memory exhaustion from large text arrays
  • CPU exhaustion from processing many embeddings
  • Server becomes unresponsive to other clients

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), ide (Issues related to IDE), invalid (This doesn't seem right), vgrep
