Add persistent embedding cache with rate limiting #679
base: develop
Conversation
- Add `embedding_cache.js`: persistent cache stored in `./bots/.cache/`
  - MD5 hash-based keys for content deduplication
  - Model-aware invalidation (cache invalidates when the model changes)
  - Version tracking for future cache format changes
- Add `rate_limiter.js`: exponential backoff retry for rate limits
  - Handles 429 errors with configurable retry delays
  - Parses `retry_after` headers when available
  - `embedWithProgress()` helper for batch embedding with caching
- Update `examples.js`: use `embedWithProgress` with caching
  - Adds a `cacheKey` parameter to the `Examples` constructor
  - Embeddings persist across restarts
- Update `skill_library.js`: use `embedWithProgress` with caching
  - Skill doc embeddings are now cached to disk
  - Significantly faster startup after the first run

This dramatically improves startup time when using embedding models, especially with rate-limited APIs like Replicate.
Sweaterdog
left a comment
Cached embeddings won't work, since the prompt sent to the model is different every time; it is very unlikely the same example will be chosen multiple times. Exponential backoff is a good idea, but `/src/models/prompter.js` should be the file to get exponential backoff, as that file handles the API requests.
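For reference, a retry wrapper of the kind suggested here could look roughly like the sketch below. This is an assumption about shape, not code from `prompter.js`; the error fields (`status`, `headers`) are hypothetical and depend on the API client in use:

```javascript
// Hypothetical exponential-backoff wrapper around an API call, as might
// live near the request logic in src/models/prompter.js. The err.status
// and err.headers fields are assumptions about the client's error shape.
async function withBackoff(fn, { retries = 5, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // only retry rate-limit errors, and only up to the retry budget
      if (err.status !== 429 || attempt >= retries) throw err;
      // a retry-after header (seconds) overrides the computed delay
      const retryAfter = err.headers?.['retry-after'];
      const delayMs = retryAfter ? Number(retryAfter) * 1000 : baseMs * 2 ** attempt;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}
```

Centralizing this in the file that makes the requests means every model call benefits, not just embeddings.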
@Fuzzwah Can you explain why cached embeddings would be good in Mindcraft? You have until January to convince me; I don't want to leave the repo cluttered with useless pull requests.
Mostly because it speeds up bot startup, and I did not gather that the embeddings change from run to run (unless you change the model being used). Honestly, though, I feel like my PRs have clearly explained what they do. Getting a deadline of two days to explain things again, especially at this time of year, is pretty off-putting. If you don't appreciate my PRs, feel free to just nuke them. I'll work in a fork if/when my interest in this pops up again. No hard feelings from my side; I totally understand that maintaining a project is a thankless task at times, and I don't want to make anyone's life more difficult than it needs to be.
@Fuzzwah @Sweaterdog One consideration is that we should set an expiration time for the cache. I'm currently unsure whether embedding models with the same name might be silently updated by the service provider, which could cause the locally cached embeddings to become outdated or incompatible.

Setting a cache expiration time of 24 hours could be a reasonable default. This duration should be configurable — users could extend it to 7 days or even longer if needed.

This approach ensures that non-developers won't experience performance discrepancies due to model updates: even if they haven't used the system for an extended period, their next use won't suffer from mismatches between locally cached embeddings and the updated remote embeddings. Meanwhile, advanced users familiar with this mechanism can choose to customize and extend the cache duration as appropriate.
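A TTL check of the kind proposed here is small; a sketch, assuming each cache entry stores an epoch-ms `timestamp` when it was written (the entry shape and names are illustrative):

```javascript
// Illustrative TTL check for cached embeddings; 24h default, configurable.
const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000;

// entry.timestamp is assumed to be the epoch-ms time the embedding was stored
function isFresh(entry, ttlMs = DEFAULT_TTL_MS) {
  return Date.now() - entry.timestamp < ttlMs;
}

const entry = { timestamp: Date.now() - 1000 }; // written 1s ago
console.log(isFresh(entry));      // true under the default 24h TTL
console.log(isFresh(entry, 500)); // false under a 0.5s TTL
```

Stale entries would simply be treated as cache misses and re-embedded on the next lookup.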
Summary
Adds persistent caching for embeddings to dramatically improve startup time, especially when using rate-limited embedding APIs.
Changes
New Files
- `src/utils/embedding_cache.js`: persistent cache stored in `./bots/.cache/`
- `src/utils/rate_limiter.js`: exponential backoff retry for rate limits; parses `retry_after` headers when available
- `embedWithProgress()`: helper for batch embedding with caching support

Modified Files
- `src/utils/examples.js`: use `embedWithProgress` with caching; adds a `cacheKey` parameter to the `Examples` constructor
- `src/agent/library/skill_library.js`: use `embedWithProgress` with caching

Benefits
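A cache-first batch helper like the `embedWithProgress()` described above might look roughly like this. The signature is inferred from the PR description, not the actual API; `embedFn` and the Map-like `cache` are stand-ins:

```javascript
// Sketch of batch embedding with a cache-first lookup. All names and the
// signature are inferred from the PR description, not the real helper.
async function embedWithProgress(texts, embedFn, cache) {
  const results = [];
  for (const text of texts) {
    if (cache.has(text)) {
      results.push(cache.get(text)); // cache hit: no API call
      continue;
    }
    const embedding = await embedFn(text); // one API call per cache miss
    cache.set(text, embedding);
    results.push(embedding);
  }
  return results;
}
```

With a disk-backed cache, the second and later runs skip the embedding API entirely for unchanged texts, which is where the startup-time win comes from.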
Testing
Tested with Replicate's `mark3labs/embeddings-gte-base` model, which has strict rate limits. The first run embeds and caches; subsequent runs load from cache in milliseconds.