Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

files cache with size #7846

Merged
merged 26 commits into from
Jul 18, 2024

Conversation

ThomasWaldmann
Copy link
Member

No description provided.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Sep 25, 2023

This is trying to prepare a bigger change in how borg works: giving up with precise refcounting and removing chunk size information from the chunks index/cache (so borg could just work with the set of existing chunks with no further information about them in the chunks index/cache).

A first step into that direction is adding the chunk sizes to the files cache (it currently only has the chunk ids).

If borg create detects a file is unchanged, we need to have that information at hand to create the item's chunk list without reading/chunking the file.

AdHocWithFilesCache tries to be similar to AdHocCache, but additionally has the files cache, so it is super fast for unchanged files. AdHocCache is DEPRECATED now.

@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 3 times, most recently from 527b612 to aef7d7a Compare September 25, 2023 21:40
@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2023

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 80.97686% with 74 lines in your changes missing coverage. Please review.

Project coverage is 81.61%. Comparing base (126a346) to head (619a06a).
Report is 3 commits behind head on master.

Files Patch % Lines
src/borg/cache.py 80.69% 51 Missing and 16 partials ⚠️
src/borg/archive.py 83.87% 4 Missing and 1 partial ⚠️
src/borg/archiver/config_cmd.py 0.00% 1 Missing ⚠️
src/borg/archiver/rinfo_cmd.py 66.66% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7846      +/-   ##
==========================================
- Coverage   83.76%   81.61%   -2.16%     
==========================================
  Files          67       67              
  Lines       12074    12156      +82     
  Branches     2193     2192       -1     
==========================================
- Hits        10114     9921     -193     
- Misses       1367     1649     +282     
+ Partials      593      586       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 5 times, most recently from cdf705b to f404810 Compare September 30, 2023 20:14
@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 2 times, most recently from dc25787 to 1389bd1 Compare April 4, 2024 11:33
@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 2 times, most recently from 4fd25be to 2b27872 Compare June 1, 2024 15:29
@ThomasWaldmann ThomasWaldmann marked this pull request as ready for review July 12, 2024 22:00
@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 2 times, most recently from 2f78282 to cce4fcc Compare July 14, 2024 15:39
the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (has size==0 for all not new chunks) and blocked using the files cache there.
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.

decref: caller does not always have the chunk size (e.g. for
metadata chunks),
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
when given, force using the AdHocCache.
if a chunk is missing in repo, it will also be missing in a ad-hoc
built chunks index.
thus:

- no cache.path
- skip on-disk cache corruption tests for AdHocCache
Also:
- move common code to ChunksMixin
- always use ._txn_active (not .txn_active)

Some tests are still failing.
if we use AdHocCache or NewCache, we do not have precise refcounting.
thus, we do not delete repo objects as their refcount does not go to zero.

check --repair will just remove the orphans.
Only LocalCache implements these.
NewCache does not do precise refcounting, thus chunks won't be deleted
from the repo at "borg delete" time.

"borg check --repair" would remove such chunks IF they are orphans.
NewCache does not do precise refcounting, thus it does not know about
unique chunks.

For now, let's just test num_chunks, but not unique_chunks.
NewCache and AdHocCache do not have a persistent chunks index,
so both check_cache and test_check_cache are pointless.
We can not use unique_chunks counter with NewCache,
thus we use a simpler (and weaker) assertion.
removed some code borg had for backwards compatibility with
old borg versions (that had previous_location only in the
cache).

now the repo location is only checked against the location
file in the security dir, simplifying the code and also
fixing a related test failure with NewCache.

also improved test_repository_move to test for aborting in
case the repo location changed unexpectedly.
removed some code borg had for backwards compatibility with
old borg versions (that had key_type only in the cache).

now the repo key_type is only checked against the key-type
file in the security dir, simplifying the code.
removed some code borg had for backwards compatibility with
old borg versions (that had timestamp only in the cache).

now the manifest timestamp is only checked against the manifest-timestamp
file in the security dir, simplifying the code.
Add new borg create option '--prefer-adhoc-cache' to prefer the
AdHocCache over the NewCache implementation.

Adjust a test to match the previous default behaviour (== use the
AdHocCache) with --no-cache-sync.
- skip test_cache_chunks if there is no persistent chunks cache file
- init self.chunks for AdHocCache
- remove warning output from AdHocCache.__init__, it gets mixed with JSON output and fails the JSON decoder.
BORG_CACHE_IMPL allows users to choose the client-side cache implementation from 'local', 'newcache' and 'adhoc'.
Also: support a "cli" env var value, that does not determine
the implementation from the env var, but rather from cli options (similar to as it was before adding BORG_CACHE_IMPL).
@ThomasWaldmann ThomasWaldmann merged commit 66b62c6 into borgbackup:master Jul 18, 2024
13 checks passed
@ThomasWaldmann ThomasWaldmann deleted the files-cache-with-size branch July 18, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants