Conversation


@Fuzzwah Fuzzwah commented Nov 28, 2025

Summary

Adds persistent caching for embeddings to dramatically improve startup time, especially when using rate-limited embedding APIs.

Changes

New Files

  • src/utils/embedding_cache.js: Persistent cache stored in ./bots/.cache/

    • MD5 hash-based keys for content deduplication (see the key sketch after this list)
    • Model-aware invalidation (cache invalidates when embedding model changes)
    • Version tracking for future cache format changes
  • src/utils/rate_limiter.js: Exponential backoff retry for rate limits

    • Handles 429 errors with configurable retry delays (1s-60s); see the backoff sketch below
    • Parses retry_after headers when available
    • embedWithProgress() helper for batch embedding with caching support
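
A minimal sketch of the keying scheme described above, assuming the key combines the embedding model name with an MD5 digest of the content; `cacheKeyFor` is an illustrative name, not necessarily what embedding_cache.js exports:

```js
import crypto from 'crypto';

// Derive a cache key from the model name plus an MD5 digest of the text.
// Hashing the content deduplicates identical inputs, and including the
// model name means switching models naturally misses all old entries.
function cacheKeyFor(modelName, text) {
    const digest = crypto.createHash('md5').update(text).digest('hex');
    return `${modelName}:${digest}`;
}
```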

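The retry loop could look roughly like this; the wrapper name and the error fields (err.status, err.retryAfter) are assumptions about the error shape, not the PR's actual implementation:

```js
// Retry fn() on HTTP 429, doubling the delay each attempt within the
// 1s-60s window, and waiting at least the server's retry_after hint
// (in seconds) when one is provided.
async function withBackoff(fn, { baseMs = 1000, maxMs = 60000, maxRetries = 6 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (err.status !== 429 || attempt >= maxRetries) throw err;
            const hinted = err.retryAfter ? err.retryAfter * 1000 : 0;
            const delay = Math.min(maxMs, Math.max(hinted, baseMs * 2 ** attempt));
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}
```
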
Modified Files

  • src/utils/examples.js: Use embedWithProgress with caching

    • Adds optional cacheKey parameter to Examples constructor (usage sketch after this list)
    • Example embeddings persist across restarts
  • src/agent/library/skill_library.js: Use embedWithProgress with caching

    • Skill doc embeddings now cached to disk
    • Significantly faster startup after first run
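
A hypothetical call site illustrating the cacheKey parameter; the real embedWithProgress signature in this PR may differ:

```js
// Embed all example texts, reusing any embeddings already cached on
// disk under the 'examples' namespace in ./bots/.cache/.
const embeddings = await embedWithProgress(exampleTexts, embeddingModel, {
    cacheKey: 'examples',
});
```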

Benefits

  • Faster startup: Cached embeddings load instantly instead of being re-computed
  • Rate limit handling: Automatic retry with exponential backoff
  • Model-aware: Cache auto-invalidates if you switch embedding models
  • Backward compatible: Works with all existing embedding model implementations

Testing

Tested with Replicate's mark3labs/embeddings-gte-base model, which has strict rate limits. The first run embeds and caches; subsequent runs load from cache in milliseconds.


@Sweaterdog Sweaterdog left a comment


Cached embeddings won't work, since every prompt sent to the model is different; it is very unlikely you will get the same example chosen multiple times. Exponential backoff is a good idea, but /src/models/prompter.js should be the file to get exponential backoff, as that file handles the API requests.

@Sweaterdog

@Fuzzwah Can you explain why cached embeddings would be good in Mindcraft? You have until January to convince me; I don't want to leave the repo cluttered with useless pull requests.


Fuzzwah commented Dec 29, 2025

Mostly because it speeds up the bot startup, and I did not gather that the embeddings change from run to run (unless you change the model being used).

Honestly, though, I feel like my PRs have clearly explained what they do. Getting a deadline of two days to explain things again, especially at this time of year, is pretty off-putting.

If you don't appreciate my PRs, feel free to just nuke them. I'll work in a fork if/when my interest in this pops up again.

No hard feelings from my side. I totally understand that maintaining a project is a thankless task at times. I don't want to be making anyone's life more difficult than it needs to be.


Ninot1Quyi commented Dec 31, 2025

> Mostly because it speeds up the bot startup, and I did not gather that the embeddings change from run to run (unless you change the model being used).
>
> Honestly, though, I feel like my PRs have clearly explained what they do. Getting a deadline of two days to explain things again, especially at this time of year, is pretty off-putting.
>
> If you don't appreciate my PRs, feel free to just nuke them. I'll work in a fork if/when my interest in this pops up again.
>
> No hard feelings from my side. I totally understand that maintaining a project is a thankless task at times. I don't want to be making anyone's life more difficult than it needs to be.

@Fuzzwah @Sweaterdog
I believe this strategy is effective because, in the current code, the documentation extracted by src/agent/library/skill_library.js re-requests embeddings on every startup, even though this content remains unchanged throughout the program's execution and is used primarily for RAG-style retrieval of relevant documents.

One consideration is that we should set an expiration time for the cache. I’m currently unsure whether embedding models with the same name might be silently updated by the service provider, which could cause the locally cached embeddings to become outdated or incompatible.

Setting a cache expiration time of 24 hours could be a reasonable default. This duration should be configurable; users could extend it to 7 days or even longer if needed.
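
A sketch of what that expiry check might look like; the CACHE_TTL_HOURS setting and the createdAt field are hypothetical names, not part of the current PR:

```js
// Default time-to-live for cached embeddings; configurable, so users
// could raise it to 7 days (168 hours) or longer if desired.
const CACHE_TTL_HOURS = 24;

// An entry is fresh if it was written within the configured TTL.
function isFresh(entry, now = Date.now()) {
    return now - entry.createdAt < CACHE_TTL_HOURS * 60 * 60 * 1000;
}
```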

This approach ensures that non-developers won't experience performance discrepancies due to model updates: even if they haven't used the system for an extended period, their next run won't suffer from mismatches between locally cached embeddings and the updated remote model. Meanwhile, advanced users familiar with this mechanism can choose to extend the cache duration as appropriate.
