-
One problem with the global resource cache is that it needs to be async. It can't just destroy resources in its own execution context, as this might cause races with the provider's context. So in order to free a resource it needs to notify the submitter. This also means that a requester for space can't just call a synchronous function; it needs an asynchronous one. And it also means that the destroyer of a resource needs to notify the cache once the resource has been freed. This complicates the integration a lot. Can we have something simpler?
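To make the complication concrete, here is a minimal sketch of the flow being described, with hypothetical names (CachedItem, requestFree, requestSpace are assumptions, not the actual cache API): the cache never destroys a resource itself; it asks the owner to free it, and the requester's callback only runs once the owner has reported back.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical names; not the actual cache API.
struct CachedItem {
    std::string key;
    std::size_t bytes;
    // Invoked by the cache. The owner frees the resource on its own execution
    // context and then calls `done` to tell the cache the memory is really gone.
    std::function<void(std::function<void()> done)> requestFree;
};

class AsyncCache {
public:
    void add(CachedItem item) { m_items.push_back(std::move(item)); }

    // A requester can't get space synchronously: it passes a callback which
    // only runs once the evicted item's owner has reported that it is freed.
    void requestSpace(std::size_t /*bytes*/, std::function<void()> onSpaceAvailable) {
        if (m_items.empty()) { onSpaceAvailable(); return; } // nothing to evict
        auto victim = std::move(m_items.front());
        m_items.erase(m_items.begin());
        victim.requestFree([cb = std::move(onSpaceAvailable)]() {
            // a real implementation would re-check how much was actually freed
            // and possibly evict more items before calling back
            cb();
        });
    }

private:
    std::vector<CachedItem> m_items;
};
```

Even in this stripped-down form every participant has to deal with callbacks, which is exactly the integration cost the comment is asking to avoid.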
-
Applications have multiple sessions with the same model which start and stop. We don't want to reload the same model for every session. Ideally we want to cache the model.
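As a baseline, a per-process cache along these lines (hypothetical types, not the SDK's API) is all it takes for one component in isolation; the rest of the post is about where such a cache should live and what it should free when space runs out.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical types; not the SDK's actual API.
struct Model { /* weights, backend state, ... */ };

class ModelCache {
public:
    // Returns the already-loaded instance if present, otherwise loads it once.
    std::shared_ptr<Model> acquire(const std::string& id) {
        auto& slot = m_models[id];
        if (!slot) slot = std::make_shared<Model>(); // loading happens only here
        return slot; // later sessions reuse the same instance
    }

private:
    std::map<std::string, std::shared_ptr<Model>> m_models;
};
```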
Plugin-level cache
This only makes sense if this is the only plugin on the system. If we have multiple plugins, they will have no way of communicating with each other: even if, say, ilib-whisper has a stale model that it can free, ilib-llama has no way of telling it to do so. We want to have a central SDK-level cache which can free items when needed.
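A rough sketch of what such a central cache's surface could look like, with hypothetical names (CacheEntryDesc, registerEntry, requestSpace are assumptions, not the SDK's actual interface): each plugin registers the items it holds together with a callback that frees them, so a request coming from ilib-llama can cause a stale ilib-whisper model to be released.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical names; not the SDK's actual interface.
struct CacheEntryDesc {
    std::string pluginId;         // e.g. "ilib-whisper"
    std::string key;              // e.g. model path or hash
    std::size_t bytes;            // how much memory the item occupies
    std::function<void()> free;   // provided by the owning plugin
};

class SdkCache {
public:
    virtual ~SdkCache() = default;
    // plugins register items they are willing to have evicted
    virtual void registerEntry(CacheEntryDesc desc) = 0;
    // plugins call this before loading; the cache may invoke `free` on entries
    // registered by other plugins to make room
    virtual void requestSpace(std::size_t bytes) = 0;
};
```

Note that requestSpace is synchronous in this sketch; the reply at the top of the page argues that in practice it would have to be asynchronous to avoid racing with the providers.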
Plain LRU
Plain LRU is too restrictive. If the system can load multiple models, it's a shame to only have a single slot in the cache, especially for apps which need two or more models. Such apps will end up thrashing the cache on every run.
Multi-element cache
Ok, so we will have this, but the problem then is: how do we know what to free when a plugin requests space? In the simplest case, if we have several CPU-RAM and several GPU-memory models and resource space is requested, how do we know which ones to free?

ilib-whisper-cuda-0 and ilib-llama-vulkan-1 are the same thing (this can be learned based on the fact that freeing space in ilib-whisper-cuda-0 led to enough space in ilib-llama-vulkan-1).

For now we will go with 2. and maybe do 3. in the future. 1. is for the distant future if ever.
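As an illustration of the multi-element direction, here is a hedged sketch under assumed conventions (the scope tags like "cpu" or "gpu:cuda:0" and all names are made up for the example): entries are tagged with the resource scope they occupy, and when a plugin requests space in a scope, the least recently used entries in that same scope are freed until enough bytes are released.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// All names and the scope-tag convention are assumptions for the example.
struct ScopedEntry {
    std::string scope;            // which resource pool the item lives in, e.g. "cpu", "gpu:cuda:0"
    std::string key;
    std::size_t bytes;
    std::uint64_t lastUsed = 0;   // monotonically increasing use counter
    std::function<void()> free;   // provided by the owning plugin
};

class MultiElementCache {
public:
    void add(ScopedEntry e) {
        e.lastUsed = ++m_clock;
        m_entries.push_back(std::move(e));
    }

    void touch(const std::string& key) {
        for (auto& e : m_entries) {
            if (e.key == key) e.lastUsed = ++m_clock;
        }
    }

    // Free least-recently-used entries in the requested scope until `bytes`
    // have been released or there is nothing left to evict in that scope.
    void requestSpace(const std::string& scope, std::size_t bytes) {
        while (bytes > 0) {
            auto victim = m_entries.end();
            for (auto it = m_entries.begin(); it != m_entries.end(); ++it) {
                if (it->scope == scope &&
                    (victim == m_entries.end() || it->lastUsed < victim->lastUsed)) {
                    victim = it;
                }
            }
            if (victim == m_entries.end()) break; // nothing more to free here
            victim->free();
            bytes -= std::min(bytes, victim->bytes);
            m_entries.erase(victim);
        }
    }

private:
    std::vector<ScopedEntry> m_entries;
    std::uint64_t m_clock = 0;
};
```

Learning that two differently named scopes are actually the same pool (as in the ilib-whisper-cuda-0 / ilib-llama-vulkan-1 example above) would then just mean mapping both to one scope tag.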