Physical rewrite #17565
@amotin, thank you for this! On a first pass it looks good to me.
During regular block writes ZFS sets both logical and physical birth times equal to the current TXG. During dedup and block cloning the logical birth time is still set to the current TXG, but the physical one may be copied from the original block that was used. This represents the fact that logically the user data has changed, but physically it is the same old block.

But block rewrite introduces a new situation, when a block is not changed logically, but is stored in a different place in the pool. From the ARC, scrub and some other perspectives this is a new block, but for user applications or incremental replication, for example, it is not. A somewhat similar thing happens during the remap phase of device removal, but in that case the space of those blocks is still accounted as allocated at their logical birth times.

This patch introduces a new "rewrite" flag in the block pointer structure, allowing to differentiate physical rewrite (when the block is actually reallocated at the physical birth time) from the device removal case (when the logical birth time is used).

The new functionality is not used at this point, and the only expected change is that the error log is now kept in terms of physical birth times, rather than logical, since if a block with a logged error was somehow rewritten, then the previous error does not matter any more.

This change also introduces a new TRAVERSE_LOGICAL flag to the traverse code, allowing zfs send, redact and diff to work in the context of logical birth times, ignoring physical-only rewrites. It also changes nothing at this point due to the lack of those writes, but they will come in a following patch.

Signed-off-by: Alexander Motin <[email protected]>
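The birth-time rules in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions: `BlockPointer`, its field names and the helper functions are hypothetical and are not the real `blkptr_t` layout or any OpenZFS API. It only shows which birth time would drive space accounting in each case described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPointer:
    """Toy stand-in for a block pointer (hypothetical, not blkptr_t)."""
    logical_birth: int      # TXG in which the user-visible data last changed
    physical_birth: int     # TXG in which the on-disk copy was allocated
    rewrite: bool = False   # models the new "rewrite" flag


def regular_write(txg):
    # Regular write: both birth times equal the current TXG.
    return BlockPointer(txg, txg)


def clone_or_dedup(source, txg):
    # Logically new data, physically the same old block.
    return BlockPointer(txg, source.physical_birth)


def device_removal_remap(bp, txg):
    # Block moved by device removal: the flag stays clear.
    return BlockPointer(bp.logical_birth, txg, rewrite=False)


def physical_rewrite(bp, txg):
    # Block relocated by a physical rewrite: logical birth is kept,
    # and the flag marks that accounting follows the physical birth.
    return BlockPointer(bp.logical_birth, txg, rewrite=True)


def accounting_birth(bp):
    # Birth time used for space accounting in this model.
    return bp.physical_birth if bp.rewrite else bp.logical_birth
```

In this model a remapped block and a physically rewritten block carry the same pair of birth times, and only the flag tells them apart, which is the distinction the commit message describes.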
Force-pushed from 3b724aa to c9382ac.
Just a rebase and conflict resolution.
Based on the previous commit, this implements the `zfs rewrite -P` flag, making ZFS keep blocks' logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. At the same time, snapshot space usage will reflect the additional space used by the newly allocated blocks.

Since this begins to use the new "rewrite" flag in the block pointers, this commit introduces a new read-compatible per-dataset feature physical_rewrite. It must be enabled for the command to not fail; it is activated on first use and deactivated on deletion of the last affected dataset.

Signed-off-by: Alexander Motin <[email protected]>
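The effect on incremental sends can be sketched with a toy traversal filter. This is a simplified illustration, not the OpenZFS traverse code: `Block`, `changed_since` and the `logical` parameter are hypothetical names, and the only assumption is that a traversal selects blocks whose chosen birth time is newer than the starting TXG.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Toy block record (hypothetical, for illustration only)."""
    name: str
    logical_birth: int
    physical_birth: int


def changed_since(blocks, from_txg, logical=True):
    """Return names of blocks born after from_txg.

    logical=True compares logical birth times, in the spirit of a
    TRAVERSE_LOGICAL-style walk (send, diff); logical=False compares
    physical birth times (scrub-like walk).
    """
    if logical:
        return [b.name for b in blocks if b.logical_birth > from_txg]
    return [b.name for b in blocks if b.physical_birth > from_txg]


# Snapshot taken at TXG 100.
blocks = [
    Block("untouched", 50, 50),
    Block("modified", 120, 120),             # regular write after the snapshot
    Block("physically_rewritten", 50, 130),  # relocated, data unchanged
]
incremental = changed_since(blocks, 100)                  # logical view
physical = changed_since(blocks, 100, logical=False)      # physical view
```

Here the logical view selects only the modified block, while the physical view also sees the relocated one.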
robn left a comment
This all looks pretty straightforward to me, good job. That one comment describing the macros made the whole thing very understandable.
Question: it seems like the only time you wouldn't want the physical rewrite option is if you don't want the pool feature? If true, could we make it the default somehow? I know it's tricky if you don't have the feature enabled, and I don't think
@robn As I mentioned, somebody was actually happy to use logical rewrite to force replication, but that is a minor case. The requirement of the feature was indeed the main factor for me, though. It also did not feel right to me to change the behavior based on feature status. Plus, logical rewrite was implemented first, and we've already merged it into our branch of ZFS 2.3 in the upcoming TrueNAS 25.10, so it would be a change in behavior, while we can't merge the physical rewrite there too due to the feature. I don't object to it being the default, but I am open to hearing how.
Is either of the rewrite options mature enough to consider for 2.3.4?
@stuartthebruce The logical rewrite is quite simple, that's why we've merged it into our TrueNAS ZFS sources. I could include it into #17595, if people desire. Physical rewrite, though, extends the on-disk format and so requires a new pool feature; that's why it will not go below 2.4.
+1 |
As a user, I would like to have it, but is there a precedent for extending the on-disk format in a patch release? It is not so easy to deduce this from the changelog.
I believe getting logical rewrite out sooner rather than later is a good thing, especially if it keeps the different OpenZFS forks mostly feature-aligned. It's just strictly superior to the way things have to be done without it. On-disk format changes obviously are a different thing, so they can wait, but it's not like running a zsh script checking mtime for consistency is a robust solution.
Based on the previous commit, this implements the `zfs rewrite -P` flag, making ZFS keep blocks' logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. At the same time, snapshot space usage will reflect the additional space used by the newly allocated blocks.

Since this begins to use the new "rewrite" flag in the block pointers, this commit introduces a new read-compatible per-dataset feature physical_rewrite. It must be enabled for the command to not fail; it is activated on first use and deactivated on deletion of the last affected dataset.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes #17565
Not that I can recall, at least since 2.0. It would be bad practice to make patch releases incompatible. It should be a decision of distributions, and through them users, when to switch to the next stable branch, accepting the risks of possible incompatibilities. Patch release updates should remain a safe routine.
The physical rewrite patch changed the meaning of BP_GET_BIRTH(), but I missed updating one of its occurrences, ending up asserting equal logical birth times instead of equal physical birth times.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Fixes #17565
Closes #17631
Motivation and Context
The earlier implemented `zfs rewrite` functionality, for simplicity, updated the logical birth times of all rewritten blocks. This makes them look modified from the perspective of replication, snapshot diffs, etc., even though the actual user data remains the same. While some people found it useful for recovering corrupted remote backups, for the majority, replicating large extra amounts of logically unchanged blocks can be a huge waste of time and resources.

Description
This PR implements a new variation of rewrite, called "physical rewrite", controlled by the new `-P` argument to the `zfs rewrite` subcommand. When possible, it tries to keep logical birth times unchanged. This makes it possible to distinguish blocks that were just relocated within a pool from blocks that were actually modified by users. While the former may occupy additional disk space due to snapshots, block cloning, etc., which should be accounted as such, they should be ignored by replication, etc.

Previously we've had block pointers with physical birth times bigger than logical birth times only as a result of the device removal remap process. But in that case space usage accounting was still based on the block's logical birth time. Since physical rewrites reallocate space, which must be accounted based on the physical birth times, this PR introduces a new "R"/"rewrite" flag in the block pointer structure to differentiate those two cases. When set, it means the block's space accounting should use the physical birth time instead of the traditional logical birth time. Since read-only pool imports do not really care about space accounting, the new per-dataset pool feature "physical_rewrite" gating this is declared as read-compatible. The feature will be activated on first use and deactivated when the last of the affected datasets is deleted.
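The feature lifecycle described above (must be enabled, activated on first use, deactivated when the last affected dataset is deleted) can be sketched as a simple per-dataset tracker. This is a rough model only: the class `PerDatasetFeature` and its methods are hypothetical and do not correspond to the OpenZFS feature-flag code.

```python
class PerDatasetFeature:
    """Toy model of a per-dataset pool feature (hypothetical names)."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self._datasets = set()   # datasets that have used the feature

    @property
    def active(self):
        # Active while at least one dataset depends on the feature.
        return bool(self._datasets)

    def use(self, dataset):
        # The operation fails unless the feature is enabled.
        if not self.enabled:
            raise RuntimeError("feature is not enabled on this pool")
        self._datasets.add(dataset)

    def destroy_dataset(self, dataset):
        # Deleting the last affected dataset deactivates the feature.
        self._datasets.discard(dataset)
```

For example, after `use("tank/a")` and `use("tank/b")` the feature stays active until both datasets have been destroyed.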
There are two exceptions when logical birth time might still be modified around physical rewrite:
Now that we have different birth times in block pointers, the traversal code got a new `TRAVERSE_LOGICAL` flag, allowing to choose between traversing only logical changes (replication, diff, etc.) or physical changes (scrub/resilver, dataset destroy, etc.).

How Has This Been Tested?
Several successful CI runs. Manual testing with `zfs rewrite` and `zfs rewrite -P` vs `zfs send -i`.

Types of changes
Checklist:
Signed-off-by.