Conversation


@Fuzzwah Fuzzwah commented Nov 28, 2025

Summary

Adds persistent caching for embeddings to dramatically improve startup time, especially when using rate-limited embedding APIs.

Changes

New Files

  • src/utils/embedding_cache.js: Persistent cache stored in ./bots/.cache/

    • MD5 hash-based keys for content deduplication (see the key sketch after this list)
    • Model-aware invalidation (cache invalidates when embedding model changes)
    • Version tracking for future cache format changes
  • src/utils/rate_limiter.js: Exponential backoff retry for rate limits

    • Handles 429 errors with configurable retry delays (1s-60s); see the backoff sketch below
    • Parses retry_after headers when available
    • embedWithProgress() helper for batch embedding with caching support
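
A minimal sketch of the keying scheme described above, assuming the key combines the embedding model name with an MD5 digest of the content; `cacheKeyFor` is an illustrative name, not necessarily what embedding_cache.js exports:

```js
import crypto from 'crypto';

// Derive a cache key from the model name plus an MD5 digest of the text.
// Hashing the content deduplicates identical inputs, and including the
// model name means switching models naturally misses all old entries.
function cacheKeyFor(modelName, text) {
    const digest = crypto.createHash('md5').update(text).digest('hex');
    return `${modelName}:${digest}`;
}
```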

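The retry loop could look roughly like this; the wrapper name and the error fields (err.status, err.retryAfter) are assumptions about the error shape, not the PR's actual implementation:

```js
// Retry fn() on HTTP 429, doubling the delay each attempt within the
// 1s-60s window, and waiting at least the server's retry_after hint
// (in seconds) when one is provided.
async function withBackoff(fn, { baseMs = 1000, maxMs = 60000, maxRetries = 6 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (err.status !== 429 || attempt >= maxRetries) throw err;
            const hinted = err.retryAfter ? err.retryAfter * 1000 : 0;
            const delay = Math.min(maxMs, Math.max(hinted, baseMs * 2 ** attempt));
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}
```
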
Modified Files

  • src/utils/examples.js: Use embedWithProgress with caching

    • Adds optional cacheKey parameter to Examples constructor (usage sketch after this list)
    • Example embeddings persist across restarts
  • src/agent/library/skill_library.js: Use embedWithProgress with caching

    • Skill doc embeddings now cached to disk
    • Significantly faster startup after first run
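
A hypothetical call site illustrating the cacheKey parameter; the real embedWithProgress signature in this PR may differ:

```js
// Embed all example texts, reusing any embeddings already cached on
// disk under the 'examples' namespace in ./bots/.cache/.
const embeddings = await embedWithProgress(exampleTexts, embeddingModel, {
    cacheKey: 'examples',
});
```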

Benefits

  • Faster startup: Cached embeddings load instantly instead of being re-computed
  • Rate limit handling: Automatic retry with exponential backoff
  • Model-aware: Cache auto-invalidates if you switch embedding models
  • Backward compatible: Works with all existing embedding model implementations

Testing

Tested with Replicate's mark3labs/embeddings-gte-base model, which has strict rate limits. The first run embeds and caches; subsequent runs load from cache in milliseconds.


@Sweaterdog Sweaterdog left a comment


Cached embeddings won't work, since every prompt sent to the model is different; it is very unlikely you will get the same example chosen multiple times. Exponential backoff is a good idea, but /src/models/prompter.js should be the file to get exponential backoff, as that file handles the API requests.

@Sweaterdog

@Fuzzwah Can you explain why cached embeddings would be good in Mindcraft? You have until January to convince me; I don't want to leave the repo cluttered with useless pull requests.


Fuzzwah commented Dec 29, 2025

Mostly because it speeds up the bot startup, and I did not gather that the embeddings change from run to run (unless you change the model being used).

Honestly, though, I feel like my PRs have clearly explained what they do. Getting a deadline of two days to explain things again, especially at this time of year, is pretty off-putting.

If you don't appreciate my PRs, feel free to just nuke them. I'll work in a fork if/when my interest in this pops up again.

No hard feelings from my side. I totally understand that maintaining a project is a thankless task at times. I don't want to be making anyone's life more difficult than it needs to be.


Ninot1Quyi commented Dec 31, 2025

> Mostly because it speeds up the bot startup, and I did not gather that the embeddings change from run to run (unless you change the model being used).
>
> Honestly, though, I feel like my PRs have clearly explained what they do. Getting a deadline of two days to explain things again, especially at this time of year, is pretty off-putting.
>
> If you don't appreciate my PRs, feel free to just nuke them. I'll work in a fork if/when my interest in this pops up again.
>
> No hard feelings from my side. I totally understand that maintaining a project is a thankless task at times. I don't want to be making anyone's life more difficult than it needs to be.

@Fuzzwah @Sweaterdog
I believe this strategy is effective because, in the current code, the documentation extracted by src/agent/library/skill_library.js re-requests embeddings on every startup, even though this content remains unchanged throughout the program's execution and is used primarily for RAG-style retrieval of relevant documents.

One consideration is that we should set an expiration time for the cache. I’m currently unsure whether embedding models with the same name might be silently updated by the service provider, which could cause the locally cached embeddings to become outdated or incompatible.

Setting a cache expiration time of 24 hours could be a reasonable default. This duration should be configurable; users could extend it to 7 days or even longer if needed.
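
A sketch of what that expiry check might look like; the CACHE_TTL_HOURS setting and the createdAt field are hypothetical names, not part of the current PR:

```js
// Default time-to-live for cached embeddings; configurable, so users
// could raise it to 7 days (168 hours) or longer if desired.
const CACHE_TTL_HOURS = 24;

// An entry is fresh if it was written within the configured TTL.
function isFresh(entry, now = Date.now()) {
    return now - entry.createdAt < CACHE_TTL_HOURS * 60 * 60 * 1000;
}
```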

This approach ensures that non-developers won't experience performance discrepancies due to model updates: even if they haven't used the system for an extended period, their next run won't suffer from mismatches between locally cached embeddings and the updated remote model. Meanwhile, advanced users familiar with this mechanism can choose to extend the cache duration as appropriate.
