Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate eviction process; use same connection #1495

Merged
merged 2 commits into from
Dec 20, 2023

Conversation

JakeSiFive
Copy link
Contributor

This change eliminates the eviction process which is now quite apparently an over optimization IMO. With this new change there will be small pauses in wake here and there while the eviction tasks are occurring but on the plus side we now have a single DB connection, guarded by a file system lock, per cache.

Additionally I spotted a couple cases where I was improperly using the database on two separate threads which is a big no no for SQLite and may have been causing some of our issues. I've removed those at additional cost (but it should be a small cost).

With these changes things should be both easier to debug, and possibly more stable.

And remember kids, friends don't let friends write multi-threaded C++ code when Rust is an option ;)

Copy link
Contributor

@colinschmidt colinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me.
Did we see how long the pauses are when cleaning a large cache? Is it in seconds, ms, us?

@JakeSiFive
Copy link
Contributor Author

JakeSiFive commented Dec 20, 2023

Did we see how long the pauses are when cleaning a large cache? Is it in seconds, ms, us?

I didn't check but it should be in the low ms with TTL. For TTL it does a query over the jobs to find everything that has expired but there's an index on that time so it should be fast. It then goes and unlinks just the inodes associated with those jobs. The P95 might be hundreds of milliseconds and the P99 might be several seconds (try rm -rf ing a cache and see how long it takes!) The current LRU implementation which is not tested or confirmed to have been made to work by this change would be longer more abrupt pauses.

These pauses can be mostly eliminated by having a queue for things to free that's processed by a separate thread but I've learned my lesson about premature optimization on this project so I'll only do that if its required 😄

Copy link
Contributor

@colinschmidt colinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming the cache doesn't frequently get stuck for long periods during evictions. Where frequently and long is something like 99% are less than 10 seconds.

@JakeSiFive
Copy link
Contributor Author

I'll measure to confirm but I'll go ahead and land this for testing purposes

@JakeSiFive JakeSiFive merged commit 33032fe into master Dec 20, 2023
12 checks passed
@JakeSiFive JakeSiFive deleted the move_eviction_into_same_process branch December 20, 2023 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants