[RFC] Optimize caching policy for Request cache #16162

Open
sgup432 opened this issue Oct 1, 2024 · 2 comments
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Search:Performance

Comments

Contributor

sgup432 commented Oct 1, 2024

Is your feature request related to a problem? Please describe

Currently, the request cache in OpenSearch caches only search requests with size=0 by default, which are typically aggregation queries. This was initially done because aggregation queries are the most expensive ones, and we wanted to use the existing on-heap cache efficiently rather than fill it with cheaper queries, which could hamper performance.

But there may be other types of queries that are much more expensive and that cannot use the request cache because of its naive caching policy.
Also, with the introduction of tiered caching, we can now cache a much larger dataset and are no longer limited by the on-heap cache size available on the node. So overall, I think we should consider changing the default caching policy for the request cache to cache more query types, especially when the tiered cache is used.

Also note that users do have a way to cache any other query by passing ?request_cache=true in the search request, but many users might be unaware of this, or might not adopt it immediately because it requires an explicit change on their end. As a result, the request cache is considerably underutilized.
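
For illustration, here is a minimal sketch of what that explicit opt-in looks like from the client side. It only builds the request URL and body and does not contact a cluster. The host, the index name (my-index), and the helper name build_cached_search are assumptions made for this example, not part of the proposal.

```python
import json
from urllib.parse import urlencode


def build_cached_search(base_url, index, body, request_cache=True):
    """Return (url, payload) for a search that sets ?request_cache explicitly.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    params = urlencode({"request_cache": str(request_cache).lower()})
    url = f"{base_url.rstrip('/')}/{index}/_search?{params}"
    return url, json.dumps(body)


# A size > 0 query, which the default policy would not cache on its own.
url, payload = build_cached_search(
    "http://localhost:9200",  # assumed local node
    "my-index",               # assumed index name
    {"size": 10, "query": {"match": {"title": "cache"}}},
)
print(url)
```

Every caller has to add the parameter on each request, which is the kind of client-side change that slows adoption.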

Describe the solution you'd like

We can start with a caching policy based on query took time, with a configurable threshold of X ms: any query taking more than X ms will be cached in the request cache. We can use this policy when tiered caching is enabled, replacing the old default one. This way we can cache more expensive queries and have a clear way to identify them, instead of relying on naive decisions such as size=0.
There are also other conditions under which we don't cache queries, for example when a query uses now, since such queries are not deterministic, or when the search type is DFS. These will still be honored with the new caching policy.
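
To make the proposal concrete, the decision logic could look roughly like the sketch below. This is not OpenSearch code: QueryInfo, TookTimePolicy, and their fields are hypothetical names, and the 10 ms threshold is an arbitrary value chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInfo:
    """Hypothetical summary of a completed query-phase execution."""
    took_ms: float
    uses_now: bool = False                 # query references `now`
    search_type: str = "query_then_fetch"


@dataclass(frozen=True)
class TookTimePolicy:
    """Cache a result only if the query took longer than threshold_ms."""
    threshold_ms: float

    def should_cache(self, query: QueryInfo) -> bool:
        # Existing exclusions are still honored before the time check.
        if query.uses_now:
            return False  # non-deterministic result
        if query.search_type == "dfs_query_then_fetch":
            return False  # DFS search types are not cached
        return query.took_ms > self.threshold_ms


policy = TookTimePolicy(threshold_ms=10)  # assumed value for X
print(policy.should_cache(QueryInfo(took_ms=25)))                 # True
print(policy.should_cache(QueryInfo(took_ms=5)))                  # False
print(policy.should_cache(QueryInfo(took_ms=25, uses_now=True)))  # False
```

Note that size does not appear anywhere in the check, so an expensive size > 0 query qualifies in the same way as an aggregation.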

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@sgup432 sgup432 added enhancement Enhancement or improvement to existing feature or request untriaged and removed untriaged labels Oct 1, 2024
Member

kkhatua commented Oct 1, 2024

I like the idea, but it might be dicey because the volume of data in a cache entry for a size > 0 query is probably much larger than for a size=0 query. It might be better to also have a threshold on a lower bound of the actual bytes to justify putting an entry in the disk tier (or else skip it). If the impact is negligible, this threshold can be eliminated by setting it to 0.

Contributor Author

sgup432 commented Oct 1, 2024

@kkhatua Thanks for the suggestion. The request cache only works with the query phase, where we cache docIds and aggregation results, so I don't think the volume of data will be very large. But yes, having a size-based policy alongside the time-based policy also makes sense, so we don't end up caching a very large entry.
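
One possible way to combine the two checks is sketched below. It assumes the size check is an upper bound on entry size, and that setting it to 0 disables it as suggested above. CombinedPolicy and its field names are hypothetical, and the numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombinedPolicy:
    """Hypothetical policy: slow enough to be worth caching, small enough to store."""
    took_threshold_ms: float
    max_entry_bytes: int = 0  # 0 disables the size check (assumption)

    def should_cache(self, took_ms: float, entry_bytes: int) -> bool:
        if took_ms <= self.took_threshold_ms:
            return False  # too cheap to be worth caching
        if self.max_entry_bytes > 0 and entry_bytes > self.max_entry_bytes:
            return False  # entry too large to cache
        return True


policy = CombinedPolicy(took_threshold_ms=10, max_entry_bytes=1_000_000)
print(policy.should_cache(took_ms=50, entry_bytes=4_096))      # True
print(policy.should_cache(took_ms=50, entry_bytes=5_000_000))  # False
```

Whether the byte threshold should apply to the whole cache or only to the disk tier is left open here.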

@sgup432 sgup432 changed the title [Feature Request] Optimize caching policy for Request cache [RFC] Optimize caching policy for Request cache Oct 1, 2024
@kkhatua kkhatua added the RFC Issues requesting major changes label Oct 7, 2024