HybridCache - tags and invalidation #55308
@jodydonetti you might be interested in this; am I on entirely the wrong path here? would any of this be useful to you? is any of it "that's so close to being useful, but not quite"?
Hi @mgravell, I'll look into this later today, can't wait to see the details of this 😬
@sebastienros this might also be of interest to you
I understand this proposal is to receive invalidations from the distributed cache layer (aka L2), and would not support invalidations through Kafka or "SQL polling". That's an available feature on Redis, but is that a common feature in other NoSQL database systems? With the HybridCache proposal (#53255), what exactly would be the blocker to implementing a Redis channel-based invalidation mechanism?
@verdie-g the entire point is to provide an abstraction upon which any L2 could provide invalidations, if it is so capable. The same thing would be possible with SQL polling by adding a couple of tables - one for the tag last-known expiration state, and one for some backlog of invalidations (the latter could be fed via a trigger or explicit code) - and having a periodic timer that simply checks for new data (using an incrementing counter); that could be every N seconds, configurable. There may also be some SQL mechanism involving SQL Service Broker queues, but I don't think (from memory) that this has the correct multi-receiver semantics.

I'm not a Kafka expert (I've simply never had cause to dabble), but assuming that Kafka queues support a mode that allows multiple recipients to receive the same message, such a queue could absolutely be used as the backbone for invalidation for an L2 backend that wanted to do so. For comparison, the same intent could also probably be described by Redis "streams", which are, IIRC, broadly similar to Kafka queues.
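To make that concrete, here is a rough sketch of the polling half of that idea; the table and column names (`dbo.CacheInvalidations`, `Id`, `Key`) are purely hypothetical, chosen to illustrate the incrementing-counter approach, not part of any proposal:

```csharp
using Microsoft.Data.SqlClient;

// Sketch: poll a hypothetical invalidations backlog table for rows newer
// than the last counter value we have seen, on a configurable interval.
static async Task PollInvalidationsAsync(
    string connectionString, TimeSpan interval, CancellationToken token)
{
    long lastSeenId = 0;
    using var timer = new PeriodicTimer(interval);

    while (await timer.WaitForNextTickAsync(token))
    {
        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync(token);

        await using var cmd = new SqlCommand(
            "SELECT Id, [Key] FROM dbo.CacheInvalidations WHERE Id > @lastSeen ORDER BY Id",
            conn);
        cmd.Parameters.AddWithValue("@lastSeen", lastSeenId);

        await using var reader = await cmd.ExecuteReaderAsync(token);
        while (await reader.ReadAsync(token))
        {
            lastSeenId = reader.GetInt64(0);
            // raise the local invalidation event for reader.GetString(1) here
        }
    }
}
```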
Not fully sure I understand the question, but the biggest blocker is the lack of an abstraction API that allows the L2 backend (currently described by `IDistributedCache`) to broadcast invalidations back to consumers.
Hi all, I've finally had time to think about this, so here are some notes.
This seems in general all good; fwiw it's what I'm already doing in FusionCache (in my case with the backplane) and it is working great, so no notes here. In case it can help: in my case, by decoupling it from the distributed cache into a separate component, I've been able to have more control over some features and support scenarios where users want to use the backplane without an L2 (but I think it's a very niche scenario, so probably not worth the effort of supporting it). It also allowed me to develop and evolve it without asking each distributed cache maintainer to do what I needed. NOTE: even with 2 separate components it's still possible to reuse the same underlying connection.
FusionCache does not have a separate header to pass around, but it uses a source identifier on each backplane message to achieve a similar result (ignoring notifications for writes made by the same node).
Here the "collection of tag evictions" and "cold-start loading" are the things that bother me the most, for 3 reasons:
Mind you, I'm not saying it will not work at all, but it has some subtle behaviours and imho it doesn't scale very well: I tried a similar approach in the past, and although it seemingly kinda sorta worked, it was not totally reliable, and that led to surprises down the road. Also, the fact that the list of invalidated tags will realistically grow a lot runs the risk of it being hard to work with. I'm sharing this experience so that you won't be bitten by the same thing, and maybe can find a better solution (I'm still working on a different approach, but nothing concrete yet).

A practical example: in a real world project of a reasonable size from a couple of years ago (with varied object types, a ton of object instances, projections being cached, millions of page views per day, etc) I started tracking the overall tags usage to get a grasp of what I would be going towards. That tracked usage was for tags used "realistically", meaning not using 1 or 2 of them, but for actual tag-invalidation related stuff, and the amount of them was non-trivial (and, so, the amount of hypothetical tag invalidations stored and loaded).

The usage scenario was something like this: caching entities alone (like caching "user 123" or "question 456") + caching lists of the same entities (or projections of them), with the idea being that when there was an update to an entity I would not only evict the cache entry for that entity itself, but also all the cache entries that contained the same entity or its projections, to avoid having the same entity turning up with multiple different versions in different cache entries. So basically:
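Something like the following, expressed in terms of the proposed `HybridCache` API (keys, tags, and the `Get*FromDbAsync` fetch methods here are illustrative stand-ins; `cache` is a `HybridCache`):

```csharp
// cache the entity itself, tagged with its identity
var user = await cache.GetOrCreateAsync(
    $"user/{id}",
    ct => GetUserFromDbAsync(id, ct),
    tags: new[] { $"user:{id}" });

// cache a list/projection, tagged with every entity it contains
var topUsers = await cache.GetOrCreateAsync(
    "users/top-10",
    ct => GetTopUsersFromDbAsync(ct),
    tags: new[] { "user:123", "user:456" /* , ... */ });

// on update of user 123: evict everything tagged with it
await cache.RemoveByTagAsync("user:123");
```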
Now, when updating user 123, I would not only update the cache on the current node + send an eviction notification for it on the backplane, but also "evict by tag" with tag = "user:123": this would have invalidated all the lists and projections and output caches etc. As you can imagine, if this is not used that much it does not generate a lot of tags (but then it's not that useful), whereas if it is used for real it can generate a lot of tags. Combine this with the aforementioned cold-start load of tag invalidations, and there are problems waiting to happen.

NOTE: of course I may have misunderstood the intended usage, or there's a specific design detail I'm missing, and in that case I'd like to be pointed at it.

For completeness: based on the way I've been experimenting with this in FusionCache since last year, it seems (to me) that it's not possible to get the desired result in a solid way by "emulating" an evict by tag; the only way is to do it "for real", and this most probably requires native support from both levels (L1 and L2) which, in turn, would require a new design/impl of both L1 and L2 abstractions. Sooo yeah, that is where I'm currently at: the experiment I've been working on for some time does not have a clear shape yet 🤷

Btw: any news on that new RCache? I've not seen any news for some time now...
What is the plan for when an `IDistributedCache` implementation does not support these new features?

For the key-invalidation part, it can arguably be treated as an optimization and degrade gracefully. But for something like tags, which is not an optimization but a full blown feature, should it throw something like `NotSupportedException`?

Probably a good approach (imho) is something on the abstraction that lets consumers discover which features are supported. On top of that, maybe we need some opt-in option that suppresses errors about some features being unsupported, so that we can still work with it (think shared stuff, like output cache); opting in to ignoring tag invalidation means accepting that we will not have active invalidation by tag, and basically having the "old" way. In this case users would still be able to work with the existing implementations.

Opinions?
Cascading delete would be too much, but what about delete by tag? Do you find a native approach not reliably implementable? If so, why?
It would be interesting to have someone from the team that implemented it do a deep-dive video explaining the approach, going over its pros/cons and maybe looking for a unified design.
Will the tag support be ready for GA? I am testing with preview 7 and the `RemoveByTagAsync` method doesn't look like it is implemented yet. This seems like a pretty useful feature. Thanks.

(See aspnetcore/src/Caching/Hybrid/src/Internal/DefaultHybridCache.cs, lines 153 to 154 at commit 280c613.)
Background and Motivation
This is part of Epic: `IDistributedCache` updates in .NET 9.

The first wave (preview 4) delivers the basic infrastructure for multi-tier caching based on `IMemoryCache` (L1) and `IDistributedCache` (L2), but it is intended to add 3 additional features that require L2 support (described below). These features must be optional and implemented in a way that does not place fundamentally new demands on `IDistributedCache` - for example, the existing "set" API is simply an opaque string key and BLOB value. In particular, because we're talking about out-of-process data, it is not possible to use `IChangeToken` for this purpose (although a consuming layer could choose to use `IChangeToken` locally as part of responding to the themes of this proposal).

Active key invalidation
Right now, invalidation occurs passively via time/expiration, or as write-thru side-effects from operations on the same node; in a multi-node scenario this is insufficient and does not account for subsequent update/delete by other nodes, which can lead to inconsistent L1 cache state.
To remedy this, the following event is proposed on a new interface:
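A minimal sketch of the shape being described, reconstructed from the prose below (`DistributedCacheKeyInvalidation`, `KeyInvalidated`, and `HeaderBytes` are named in the text; the interface name is an assumption):

```csharp
using Microsoft.Extensions.Caching.Distributed;

// Delegate name per the proposal; ReadOnlySpan<byte> is intentional (see below).
public delegate void DistributedCacheKeyInvalidation(string key, ReadOnlySpan<byte> header);

// Hypothetical interface name; an L2 backend opts in by implementing this
// alongside IDistributedCache.
public interface IDistributedCacheInvalidation : IDistributedCache
{
    // Set by the consumer (HybridCache, etc): "if you're publishing,
    // publish this much of the payload"; may be zero, or ignored entirely.
    int HeaderBytes { get; set; }

    // Invoked at arbitrary times for out-of-band invalidation notifications.
    event DistributedCacheKeyInvalidation KeyInvalidated;
}
```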
The idea behind this API is that L2 backends may, through some mechanism (queuing, pub/sub, etc), broadcast key updates. When a node performs writes (`Set[Async](...)` / `Remove[Async](...)`), it should (via that implementation-specific mechanism) also publish a global invalidation entry. In the case of `Set[Async]`, the first `HeaderBytes` bytes of the payload may optionally also be published. This invalidation mechanism will be used to invoke `KeyInvalidated` at arbitrary times, for out-of-band invalidation notification. Emphasis: `HeaderBytes` is set by the consumer (`HybridCache`, etc), and indicates "if you're publishing: publish this much of the payload". The value may be zero, and/or the implementation may choose to ignore this and not include any payload metadata, just publishing the keys.

Note that the use of `ReadOnlySpan<byte>` precludes the use of `Action<string, ...>`; this was intentional, making the lifetime semantics of `header` very clear - if it is needed beyond right now, the consumer must copy the value to somewhere they control. The name `DistributedCacheKeyInvalidation` is perhaps questionable.

The intent here is that `HybridCache` (or other consumers) can subscribe to `KeyInvalidated`, and respond accordingly. The `key` here matches the same `string` as described by `string key` in `Set[Async]` etc, noting that if the L2 has some configured namespace prefix, the L2 implementation is responsible for removing that prefix again, such that the `key` in `KeyInvalidated` is the original key transmitted.

The purpose of the `header` is to help avoid trivial removals. If the `header` received is empty, the invocation should be treated as a blind "delete", causing L1 removal. This is a fair default, but it is not assumed that implementations can automatically detect and avoid same-connection notifications, which means we must anticipate and avoid a sequence like:

- node X writes {ZZZ, ABC} to L1 and L2, and the write is published
- the `KeyInvalidated` event on node X gets invoked with {ZZZ, ABC} for the same thing we just caused
- node X removes its own perfectly fresh L1 entry for ZZZ

The `header` allows us to avoid these last two steps; the implementation of this is consumer-dependent, but in the case of `HybridCache`, the payload sent to L2 will include a payload header that includes the creation timestamp and a disambiguation qualifier (which, along with the creation timestamp, essentially work like an ETag); by parsing these (which will be in the first few bytes):

- if the `header` matches the local L1 entry's own header: this was our own write, so ignore it
- if the `header` differs from the local entry's header: remove the entry from L1
- empty `header`, or any unexpected `header` size: blind delete from L1

This provides a mechanism to communicate L2 invalidation to L1, and respond without causing self-invalidation, within the constraints of the data available to `IDistributedCache`.
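Putting that together, a minimal consumer-side sketch of a `KeyInvalidated` handler; the bookkeeping map and the 16-byte header size are illustrative assumptions, not part of the proposal:

```csharp
using System.Collections.Concurrent;

// Illustrative consumer-side handling of KeyInvalidated.
internal sealed class L1InvalidationHandler
{
    private const int HeaderSize = 16; // assumed header length, for illustration

    // Hypothetical bookkeeping: the header bytes we last stored per L1 key.
    private readonly ConcurrentDictionary<string, byte[]> _l1Headers = new();

    public void OnKeyInvalidated(string key, ReadOnlySpan<byte> header)
    {
        if (!_l1Headers.TryGetValue(key, out var localHeader))
        {
            return; // nothing in L1 for this key; nothing to do
        }
        if (header.Length == HeaderSize && header.SequenceEqual(localHeader))
        {
            return; // same timestamp + qualifier: our own write, keep the fresh entry
        }
        // empty/unexpected header, or a different timestamp/qualifier: drop from L1
        _l1Headers.TryRemove(key, out _);
    }
}
```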
Tagging
Tagging is a new concept being introduced into `HybridCache` that does not historically exist in `IDistributedCache`. To achieve this, the L2 tag metadata will also be stored as part of the header (although not typically in the bytes published for `KeyInvalidated`).

It is assumed that `IDistributedCache` cannot reliably implement cascading delete at the backend - this is simply not a feature in many key/value stores, and while it can be hacked in, it is usually unsatisfying and requires significant additional overhead. We want to avoid this complexity in the backend.

Consequently, `HybridCache` must implement this internally, by maintaining a lookup from each tag to its last known invalidation date; for example (using numbers instead of dates here for simplicity) we might have `foo` → 4, `bar` → 7. When loading an entry from L1 or L2, if that entry has tags we must compare the creation date of the cache entry (again, from the payload header) to the dates in each of the tags; if any tag has an invalidation date greater than the cache entry's creation date, it is considered logically expired (and it can also be removed from L1/L2 accordingly).
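For example, a minimal sketch of that check (the lookup and helper names here are hypothetical; with the lookup above, a `foo`-tagged entry created at time 3 would be logically expired):

```csharp
using System.Collections.Concurrent;

// Hypothetical bookkeeping: tag -> last known invalidation time
// (numbers standing in for dates, as above).
static class TagExpiry
{
    public static readonly ConcurrentDictionary<string, long> InvalidationTimes = new();

    public static bool IsLogicallyExpired(long entryCreationTime, string[] entryTags)
    {
        foreach (var tag in entryTags)
        {
            if (InvalidationTimes.TryGetValue(tag, out var invalidatedAt)
                && invalidatedAt > entryCreationTime)
            {
                return true; // a tag was invalidated after this entry was created
            }
        }
        return false; // no tag invalidation postdates the entry: still valid
    }
}
```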
To support this, we still need some additional backend capabilities: a way to fetch the current tag invalidation data (for cold-start), and a way to be notified of new tag invalidations as they happen (comparable to `KeyInvalidated`). For this, we propose:
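A sketch of the shape this might take; `GetTagsAsync`, `TagInvalidated`, and `RemoveByTagAsync` are the members discussed below, while the interface name and exact signatures are assumptions:

```csharp
// Hypothetical interface name and signatures, inferred from the prose.
public interface IDistributedCacheTagInvalidation
{
    // Raised when a tag is invalidated anywhere, with its invalidation time.
    event Action<string, DateTimeOffset> TagInvalidated;

    // Cold-start snapshot of tag -> last invalidation time, bounded by `since`.
    Task<KeyValuePair<string, DateTimeOffset>[]> GetTagsAsync(
        DateTimeOffset since, CancellationToken cancellationToken);

    // Mark everything associated with `tag` as invalid from now on.
    Task RemoveByTagAsync(string tag, CancellationToken cancellationToken);
}
```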
At cold-start, the library can use `GetTagsAsync` to pre-populate the tag lookup with some reasonable time bound, and can respond to `TagInvalidated` to update this data (forwards-only) as needed. The choice of array here is intentional, as it is assumed the caller will be constructing their own lookup by iterating the data, hence "simple" is reasonable. This could arguably be `IAsyncEnumerable<>`, etc, but: it is only used for cold-start population of the tag metadata, so array overhead is not burdensome.

To invalidate a specific tag, we call `RemoveByTagAsync`, which would update the data used by `GetTagsAsync` and also indirectly cause `TagInvalidated` to be invoked by all clients. Note that unlike write-thru invalidation, tag invalidation doesn't have the problem of invalidating our own data, as it is not entry-specific.

Combining these two halves, we get the API proposal:
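Combining the two sketches above into one surface might look like this (again, names beyond those quoted in the text are assumptions):

```csharp
using Microsoft.Extensions.Caching.Distributed;

public interface IDistributedCacheInvalidation : IDistributedCache // hypothetical name
{
    // key invalidation half
    int HeaderBytes { get; set; }
    event DistributedCacheKeyInvalidation KeyInvalidated;

    // tag invalidation half
    event Action<string, DateTimeOffset> TagInvalidated;
    Task<KeyValuePair<string, DateTimeOffset>[]> GetTagsAsync(
        DateTimeOffset since, CancellationToken cancellationToken);
    Task RemoveByTagAsync(string tag, CancellationToken cancellationToken);
}
```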
Example implementation
Redis (without any special server features):

- pub/sub channels `__MSFT_DC__KeyInvalidation` and `__MSFT_DC__TagInvalidation`
- `__MSFT_DC_Tags` as a sorted-set of tag → last invalidation time
- delete: `UNLINK {key}` plus `PUBLISH __MSFT_DC__KeyInvalidation {key}`
- set: `SET {key} {value} EX {ttl}` plus `PUBLISH __MSFT_DC__KeyInvalidation {key+header}`
- invalidate tag: `ZADD __MSFT_DC_Tags GT {time} {tag}` plus `PUBLISH __MSFT_DC__TagInvalidation {tag+time}` (on servers before 6.2, do not use `GT`)
- fetch tags at cold-start: `ZRANGE __MSFT_DC_Tags {since} +inf BYSCORE WITHSCORES`
- fetch a single tag: `ZSCORE __MSFT_DC_Tags {tag}`
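For illustration, the tag-invalidation step above might look like this with StackExchange.Redis; the wrapper method is hypothetical, while the key and channel names come from the list above:

```csharp
using StackExchange.Redis;

// ZADD __MSFT_DC_Tags GT {time} {tag} plus PUBLISH __MSFT_DC__TagInvalidation {tag+time}
public static async Task RemoveByTagAsync(IDatabase db, ISubscriber pubsub, string tag)
{
    double now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

    // GT: only ever move the tag's invalidation time forwards (Redis 6.2+)
    await db.SortedSetAddAsync("__MSFT_DC_Tags", tag, now, SortedSetWhen.GreaterThan);

    // broadcast so that all connected nodes can update their tag lookup
    await pubsub.PublishAsync(
        RedisChannel.Literal("__MSFT_DC__TagInvalidation"), $"{tag}:{now}");
}
```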
The implementation may also choose to run `ZREMRANGEBYSCORE __MSFT_DC_Tags -inf {culltime}` periodically, for some `culltime` that represents the largest possible expiration; this allows long-dead tags to be forgotten.

The channel and tag names should include any configured namespace partition, just like keys. The published tags/keys do not need to include the partition.
It is also possible to use server-assisted client-side caching or keyspace notifications, which may be considered in due course; but initially, active invalidation (i.e. where our code explicitly causes the pub/sub events) is described for simplicity, since this does not require server-side feature configuration (which is required for keyspace notifications) or an up-level server (server-assisted client-side caching requires server version 6 and client library support).
Alternative designs
The "output caching" feature is comparable in terms of supporting L2 tagging (without notification); because the concept of tags was baked into the original API, it is implemented in the backend - in SQL via relational DB semantics, and in Redis by using a
SADD {tag} {key}
such thattag
is the redis "set" consisting of all keys associated with that tag; deleting a tag means enumerating the "set" and callingUNLINK
per key, and also requires complicated periodic garbage collection to remove expired keys from each set - and to do that, we need an additional set which is the set of all known tags. The solution proposed here is much simpler to implement, and fits within the current API. So much so that whenHybridCache
has full tag support, I wonder if it is worth exploring a mechanism to implementIOutputCacheBufferStore
on top ofHybridCache
.The text was updated successfully, but these errors were encountered: