HybridCache : implement the tag expiration feature #5785

Open
wants to merge 33 commits into
base: main

mgravell
Member

@mgravell mgravell commented Jan 9, 2025

(note: this is a re-do of #5748 / #5781 which went "a bit wrong" when rebasing)

Implement tag expiration in HybridCache

The HybridCache system has a "tags" feature; rather than building a bespoke per-backend secondary index (as the Redis OutputCache backend does), here we move everything into the core library:

  • in the DefaultHybridCache, we maintain a map of tags (strings; note that * means "everything") to expirations, and we track the creation time against each CacheItem
  • when fetching items from L1 (in-proc) cache, we check the item's creation date against the cache's wildcard and per-tag expirations; if the creation date is earlier than any of these, the item is treated as disallowed
  • to allow invalidations to persist between sessions when an L2 (backend cache) is in play, we additionally store the timeout data as simple extra values in the cache
  • we also pre-fetch any missing tag data from L2 as needed, and enforce the previous expirations
  • to facilitate this, and to give confidence that the data we are parsing is the data that was requested, the data stored in L2 is now embedded inside a wrapper payload that includes the creation timestamp, duration, key, tags, etc., as well as the original payload
  • in the rare event that the L2 data contains tags beyond those advertised by the caller, these additional tags are also fetched and enforced

Because the tag fetches are async, the lookup stores Task<(timestamp)> rather than (timestamp).
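
The L1 check described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the PR's code: `TagInvalidationMap`, `Invalidate`, and `IsValid` are hypothetical names, timestamps are plain `long` values, and the map holds the timestamps directly rather than tasks that yield them. Treating an item created at exactly the invalidation timestamp as expired is also an assumption of this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch: tracks the latest invalidation timestamp per tag.
public sealed class TagInvalidationMap
{
    private const string Wildcard = "*"; // "*" means "everything"
    private readonly ConcurrentDictionary<string, long> _invalidatedAt = new();

    // Record that anything carrying this tag (or everything, for "*")
    // created at or before 'timestamp' must be treated as expired.
    public void Invalidate(string tag, long timestamp)
        => _invalidatedAt.AddOrUpdate(tag, timestamp, (_, old) => Math.Max(old, timestamp));

    // An item is valid only if it was created after the wildcard invalidation
    // and after the invalidation of every one of its tags.
    public bool IsValid(long creationTimestamp, IReadOnlyCollection<string> tags)
    {
        if (IsExpiredBy(Wildcard, creationTimestamp)) return false;
        foreach (var tag in tags)
        {
            if (IsExpiredBy(tag, creationTimestamp)) return false;
        }
        return true;
    }

    // Assumption: equal timestamps count as expired.
    private bool IsExpiredBy(string tag, long creationTimestamp)
        => _invalidatedAt.TryGetValue(tag, out var invalidatedAt)
           && creationTimestamp <= invalidatedAt;
}
```

For example, after `Invalidate("b", 150)`, an item tagged `a` and `b` that was created at 100 is rejected, while one created at 200 is still served.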

Outstanding:

  • additional logging
    • add log when L2 data is rejected for (reason)
    • add metric of the number of tag expirations being tracked in L1
  • resolve netfx test failure (actually: niche timing problem in tag invalidation)
  • address option member inheritance (or not - per @jodydonetti comments here)

@jodydonetti
Contributor

jodydonetti commented Jan 9, 2025

Note

Reposting my comment here from there for ease of discussion.

Hi Marc, not strictly related to tagging, but I'm looking at some of the code changes in the PR and I spotted this XML comment regarding HybridCacheEntryOptions:

If options are specified at the individual call level, the non-null values are merged
(with the per-call options being used in preference to the global options). If no value is
specified for a given option (globally or per-call), the implementation can choose a reasonable default.

When we touched on this subject some time ago, I remember we agreed that inheritance (from per-call up to the DefaultEntryOptions) would have to be all-or-nothing, and not per single option.
What I mean is that if null is passed as the HybridCacheEntryOptions param, then it would fall back to the DefaultEntryOptions, but single options inside that object (like Expiration, LocalCacheExpiration) should not inherit values.

All of this was because of the impossibility of expressing "undefined" or, to put it better, the impossibility of differentiating between null meaning "I'm saying null because I want a local fallback" (for example from LocalCacheExpiration to Expiration) and null meaning "I'm saying null because I want a global fallback" (for example to the same option in the DefaultEntryOptions).

A practical example of a problem this may create: in the DefaultEntryOptions I set Expiration to 5min and LocalCacheExpiration to 10s to refresh L1 more frequently. Then in a certain call I pass a specific HybridCacheEntryOptions instance where I set Expiration to 1h and leave LocalCacheExpiration as null, thinking it will be the same as Expiration, when in reality it will become 10s. Because of this, I will have no way to keep the 2 options in sync, unless I keep specifying the same value for both Expiration and LocalCacheExpiration at every call.
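
To make the difference concrete, here is a minimal sketch of the two inheritance strategies. `EntryOptions` is a hypothetical stand-in for HybridCacheEntryOptions, and `MergePerOption`, `AllOrNothing`, and `EffectiveLocalExpiration` are illustrative names, not the library's API.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for HybridCacheEntryOptions; not the real type.
public sealed record EntryOptions(TimeSpan? Expiration, TimeSpan? LocalCacheExpiration);

public static class OptionsResolution
{
    // Per-option merge: each null field falls back to the global default.
    public static EntryOptions MergePerOption(EntryOptions? perCall, EntryOptions defaults)
        => perCall is null
            ? defaults
            : new EntryOptions(
                perCall.Expiration ?? defaults.Expiration,
                perCall.LocalCacheExpiration ?? defaults.LocalCacheExpiration);

    // All-or-nothing: the defaults apply only when no per-call options are passed.
    public static EntryOptions AllOrNothing(EntryOptions? perCall, EntryOptions defaults)
        => perCall ?? defaults;

    // Local fallback: a null LocalCacheExpiration means "same as Expiration".
    public static TimeSpan? EffectiveLocalExpiration(EntryOptions resolved)
        => resolved.LocalCacheExpiration ?? resolved.Expiration;
}
```

With defaults of (5min, 10s) and per-call options of (1h, null), the per-option merge resolves the local expiration to 10s, whereas all-or-nothing resolves it to 1h.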

Can you clarify what the current rationale is here?

I'm asking for 2 reasons: first I'm interested in general, and second because as you know I'm creating a 3rd party implementation, and I'd like the observable external behaviour to be the same for both implementations.

Thanks!

ps: if you like I can open a different issue for this, just let me know.
pps: will post more about the tagging implementation as soon as I finish reading the massive code changes.

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 84.14 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 84.49 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=912260&view=codecoverage-tab

@jodydonetti
Contributor

An extra point to think about, again related to the entry options inheritance thing, is how this method (marked internal, and maybe it will not be used in the end) would be affected.

@BrennanConroy
Member

to allow persistence of invalidation between sessions when a L2 (backend cache) is in play, we additionally store the timeout data via additional simple values in the cache

Shouldn't we be concerned about machine clocks not being synchronized? i.e. Machine 1 says it's 10:05:03 and adds a timeout for 10:10:03, but Machine 2 says it's 10:12:05 and when it reads the cache item it will determine it to be invalid.
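
The scenario can be written out as a small check. This is only an illustration of the arithmetic in the example above; `ClockSkewExample` and `IsExpired` are hypothetical names and the comparison is an assumed stand-in for whatever the cache actually does.

```csharp
using System;

public static class ClockSkewExample
{
    // True when the reader's own clock says the stored expiry has passed.
    // The writer's clock produced 'expiresAt'; the reader's clock produces 'readerNow'.
    public static bool IsExpired(DateTimeOffset expiresAt, DateTimeOffset readerNow)
        => readerNow >= expiresAt;
}
```

If Machine 1 writes at 10:05:03 with an expiry of 10:10:03 and Machine 2 reads immediately afterwards while believing it is 10:12:05, `IsExpired` returns true even though almost no real time has elapsed.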

@jodydonetti
Contributor

jodydonetti commented Jan 10, 2025

Shouldn't we be concerned about machine clocks not being synchronized? i.e. Machine 1 says it's 10:05:03 and adds a timeout for 10:10:03, but Machine 2 says it's 10:12:05 and when it reads the cache item it will determine it to be invalid.

Eh, welcome to the world of clock shifting/drifting/skewing & friends: it's not something that's solvable via software, at least not at this level (to the best of my knowledge).

Just to make an example: typically Microsoft/Meta/Google/AWS have their own private atomic clocks around the world to avoid this problem (... as much as possible), usually dedicated to specific services (like Google Spanner with TrueTime).

Some interesting reads:

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 78.59 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 82.65 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=913450&view=codecoverage-tab

@mgravell
Member Author

mgravell commented Jan 10, 2025 via email

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 78.59 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 82.65 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI 88 89
Microsoft.Extensions.AI.OpenAI 77 78

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=931777&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 84.35 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 84 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=931922&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.AI.Ollama Line 80 77.96 🔻
Microsoft.Extensions.Caching.Hybrid Line 86 78.19 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 82.83 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=933691&view=codecoverage-tab

@jodydonetti
Contributor

jodydonetti commented Jan 29, 2025

Hi Marc, I'm looking at this and wow, those are a lot of tests 💪

Just to doublecheck so we are on the same page: the idea is still this?

ps: the amount of work you are doing on HC is spectacular, kudos.

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻
Microsoft.Extensions.Caching.Hybrid Line 86 79.42 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 83.84 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI 88 89
Microsoft.Extensions.AI.Abstractions 83 84

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=934881&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 84.72 🔻
Microsoft.Extensions.Caching.Hybrid Branch 86 84.34 🔻
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI.Abstractions 83 84
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=934939&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻
Microsoft.Extensions.Caching.Hybrid Line 86 85.48 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI 88 89
Microsoft.Extensions.AI.Abstractions 83 84

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=935018&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.Abstractions 83 84
Microsoft.Extensions.AI 88 89
Microsoft.Extensions.AI.OpenAI 77 78

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=935225&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.Caching.Hybrid Line 86 82 🔻
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI 88 89
Microsoft.Extensions.AI.Abstractions 83 84

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=938365&view=codecoverage-tab

@dotnet-comment-bot
Collaborator

‼️ Found issues ‼️

Project Coverage Type Expected Actual
Microsoft.Extensions.AI.Ollama Line 80 78.25 🔻
Microsoft.Gen.MetadataExtractor Line 98 57.35 🔻
Microsoft.Gen.MetadataExtractor Branch 98 62.5 🔻

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

Project Expected Actual
Microsoft.Extensions.AI.Abstractions 83 84
Microsoft.Extensions.Caching.Hybrid 86 87
Microsoft.Extensions.AI.OpenAI 77 78
Microsoft.Extensions.AI 88 89

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=938418&view=codecoverage-tab

@DeagleGross DeagleGross left a comment

Approved, but I would be happy if someone else takes a look (the PR is long-lived). I have asked a couple of specific code questions below; the overall purpose of the PR is fairly understandable, but I don't have any background with HybridCache.

@@ -58,7 +62,7 @@ public byte[] ToArray()
}

var copy = new byte[length];
- Buffer.BlockCopy(Array!, 0, copy, 0, length);
+ Buffer.BlockCopy(OversizedArray!, Offset, copy, 0, length);

not sure if it is important, but Span.CopyTo is faster than Buffer.BlockCopy. See proof

Member Author

nice; I will integrate that next cycle
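
For reference, a span-based version of the copy might look roughly like the sketch below. It is not the change that was eventually made in the PR; `BufferCopy.ToArray` and its parameters are hypothetical names chosen to mirror the snippet under review. On .NET Framework and netstandard2.0 targets, `AsSpan` requires the System.Memory package.

```csharp
using System;

public static class BufferCopy
{
    // Copies 'length' bytes starting at 'offset' into a new right-sized array,
    // using span slicing instead of Buffer.BlockCopy.
    public static byte[] ToArray(byte[] oversized, int offset, int length)
    {
        if (length == 0) return Array.Empty<byte>();

        var copy = new byte[length];
        // AsSpan(offset, length) bounds-checks the slice; CopyTo throws if
        // the destination is too small (it is exactly 'length' here).
        oversized.AsSpan(offset, length).CopyTo(copy);
        return copy;
    }
}
```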

var length = HybridCachePayload.Write(oversized, key, cacheItem.CreationTimestamp, GetL2AbsoluteExpirationRelativeToNow(options),
HybridCachePayload.PayloadFlags.None, cacheItem.Tags, payload.AsSequence());

await SetDirectL2Async(key, new(oversized, 0, length, true), GetL2DistributedCacheOptions(options), token).ConfigureAwait(false);

is ConfigureAwait(false) needed for lower FXs?

Member Author

for environments that have a sync-context, which usually means lower FXs; however, as library code: we don't know about the sync-context state, so: best to be explicit

I believe the plan to allow a global setting for this is still being discussed; I will check whether that finally shipped!

Member Author

if this were "aspnetcore code, only targeting netcurrent", then yes: we could probably get away without specifying this; however, HC a: can be used in non-aspnetcore contexts (WinForms, etc), and b: supports netstandard, netfx, etc


byte[] oversized = ArrayPool<byte>.Shared.Rent(sizeof(long));
BinaryPrimitives.WriteInt64LittleEndian(oversized, timestamp);
var pending = SetDirectL2Async(TagKeyPrefix + tag, new BufferChunk(oversized, 0, sizeof(long), false), _tagInvalidationEntryOptions, token);

is there any overhead in calling await SetDirectL2Async right here? If it has already finished, isn't await doing nothing more than just getting the result?

(i guess the async-state-machine will still be generated, yep?)

Member Author

yes, this is to avoid the state machine overhead in the happy path; that isn't terribad, but: it is sometimes desirable to hand-crank it
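
The general shape of this hand-cranked pattern, as a sketch rather than the PR's actual code: a non-async wrapper checks whether the pending operation already completed, and only falls back to an `async` helper (and its state machine) when it did not. `FastPath`, `WriteAndCleanupAsync`, and the `cleanup` callback (standing in for something like returning a rented buffer to the pool) are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class FastPath
{
    // Non-async wrapper: no async state machine is involved unless we must await.
    public static ValueTask WriteAndCleanupAsync(ValueTask pending, Action cleanup)
    {
        if (pending.IsCompletedSuccessfully)
        {
            // Happy path: consume the completed operation and clean up synchronously.
            pending.GetAwaiter().GetResult();
            cleanup();
            return default;
        }

        // Incomplete or faulted: defer to the async helper.
        return Awaited(pending, cleanup);

        static async ValueTask Awaited(ValueTask pending, Action cleanup)
        {
            try
            {
                await pending.ConfigureAwait(false);
            }
            finally
            {
                cleanup(); // runs whether the operation succeeded or threw
            }
        }
    }
}
```

The trade-off is extra code to maintain in exchange for skipping the state machine when the backend completes synchronously.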

@@ -20,8 +21,11 @@ namespace Microsoft.Extensions.Caching.Hybrid.Internal;
/// <summary>
/// The inbuilt implementation of <see cref="HybridCache"/>, as registered via <see cref="HybridCacheServiceExtensions.AddHybridCache(IServiceCollection)"/>.
/// </summary>
[SkipLocalsInit]

why does SkipLocalsInit exist on specific types only, and not on the module?

Member Author

I'd be fine with either, but I don't really need it globally; least surprises, etc


7 participants