-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add util for finding trie root hash #6094
Conversation
ledger/complete/wal/encoding.go
Outdated
@@ -61,6 +61,7 @@ func Decode(data []byte) (operation WALOperation, rootHash ledger.RootHash, upda | |||
switch operation { | |||
case WALUpdate: | |||
update, err = ledger.DecodeTrieUpdate(data[1:]) | |||
rootHash = update.RootHash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a bug, that the root hash is not returned when decoding a record in the wal file.
flagOutputDir string | ||
) | ||
|
||
var Cmd = &cobra.Command{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some high level docs about what this command does and how it should be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, and also documented some detailed use case here
ledger/complete/wal/encoding_test.go
Outdated
@@ -44,7 +43,7 @@ func TestUpdate(t *testing.T) { | |||
operation, stateCommitment, up, err := realWAL.Decode(data) | |||
require.NoError(t, err) | |||
assert.Equal(t, realWAL.WALUpdate, operation) | |||
assert.Equal(t, stateCommitment, ledger.RootHash(hash.DummyHash)) | |||
assert.Equal(t, stateCommitment, ledger.RootHash(rootHash)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @fxamacker , could you review my change here?
I'm not sure why the test case originally expected the decoded root hash to be zero. Would like to double check in case I missed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangchiqing I think currently wal.Decode()
returns un-initialized rootHash
(empty rootHash
) if operation is WALUpdate
.
rootHash
is only populated in wal.Decode()
for WALDelete
operation.
The existing test expecting empty rootHash fails because you changed wal.Decode()
in this PR. This can be a breaking change in other use cases as well.
Given this, instead of modifying wal.Decode()
function, you can use returned update.RootHash
from wal.Decode()
for your use case in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see why it was decided this way.
There are two types of operation: WALUpdate and WALDelete.
WALDelete only has the root hash. WALUpdate can be decoded into update, which contains the root hash.
So the operation is WALDelete, than the root hash is returned, and the update is nil; if the operation is WALUpdate, then the decoded update will be returned. So the root hash is only for the case that the operation is WALDelete.
But I still think in the case the operation is WALUpdate, it should return the root hash. There isn't any code reading the root hash when the operation is WALUpdate. So fixing it would not cause any problem.
This fix would also make my logic simplier, because I don't need to check the returned update
object, which might be nil
if the operation is WALDelete. I could just check the root hash and the operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took your suggestion and reverted my changes, and also added some commends on the Decode
method to explain the behavior.
from, to, err := prometheusWAL.Segments(dir) | ||
if err != nil { | ||
return 0, 0, fmt.Errorf("cannot get segments: %w", err) | ||
} |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
cmd/util/cmd/find-trie-root/cmd.go
Outdated
w, err := prometheusWAL.NewSize(lg, prometheus.DefaultRegisterer, dir, wal.SegmentSize, false) | ||
if err != nil { | ||
return 0, 0, fmt.Errorf("cannot create WAL: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to create w
here? We can use dir
to replace w.Dir()
as parameter NewSegmentsRangeReader()
.
ledger/complete/wal/encoding_test.go
Outdated
@@ -44,7 +43,7 @@ func TestUpdate(t *testing.T) { | |||
operation, stateCommitment, up, err := realWAL.Decode(data) | |||
require.NoError(t, err) | |||
assert.Equal(t, realWAL.WALUpdate, operation) | |||
assert.Equal(t, stateCommitment, ledger.RootHash(hash.DummyHash)) | |||
assert.Equal(t, stateCommitment, ledger.RootHash(rootHash)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangchiqing I think currently wal.Decode()
returns un-initialized rootHash
(empty rootHash
) if operation is WALUpdate
.
rootHash
is only populated in wal.Decode()
for WALDelete
operation.
The existing test expecting empty rootHash fails because you changed wal.Decode()
in this PR. This can be a breaking change in other use cases as well.
Given this, instead of modifying wal.Decode()
function, you can use returned update.RootHash
from wal.Decode()
for your use case in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some more comments. Maybe add some docs for functions like copyWAL()
to describe high level purpose of functions.
cmd/util/cmd/find-trie-root/cmd.go
Outdated
return 0, 0, fmt.Errorf("finish reading all segment files from %d to %d, but not found", from, to) | ||
} | ||
|
||
func copyWAL(dir, outputDir string, segment int, expectedRoot ledger.RootHash) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help me understand:
-
Is
copyWAL()
intended to create a new segment file with all records up to and including first update record with specifiedrootHash
? -
If
outputDir
has existing wal files, the new segment file would be created with higher index as filename suffix. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copyWAL() intended to create a new segment file with all records up to and including first update record with specified rootHash?
Yes
If outputDir has existing wal files, the new segment file would be created with higher index as filename suffix. Is this expected?
yeah, good point. the outputDir is supposed to be different from the dir, (I can add a check to prevent that), so the generated wal is always 00000000
in a separate output folder.
cmd/util/cmd/find-trie-root/cmd.go
Outdated
bytes := wal.EncodeUpdate(update) | ||
_, err = writer.Log(bytes) | ||
if err != nil { | ||
return fmt.Errorf("cannot write LedgerWAL record: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
can be nil
for delete operation record. Maybe check operation before encoding record.
d840dcf
to
fa6f02d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for updating. I focused on reviewing code related to wal
files.
// Decode decodes the given data into a WAL operation, root hash and trie update. | ||
// It returns (WALDelete, rootHash, nil, nil) if the operation is WALDelete. | ||
// It returns (WALUpdate, hash.DummyHash, update, nil) if the operation is WALUpdate. | ||
// To read the root hash of the trie update, use update.RootHash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
cmd/util/cmd/find-trie-root/cmd.go
Outdated
w, err := prometheusWAL.NewSize(log.Logger, nil, dir, wal.SegmentSize, false) | ||
if err != nil { | ||
return fmt.Errorf("cannot create WAL: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems w
isn't used except for w.Dir()
, so we can remove w
and use dir
directly.
w, err := prometheusWAL.NewSize(log.Logger, nil, dir, wal.SegmentSize, false) | |
if err != nil { | |
return fmt.Errorf("cannot create WAL: %w", err) | |
} |
I added some logic to replace the WAL file once it found the matching wal file with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for automating this aspect as suggested! 👍 I left some more comments and suggestions.
cmd/util/cmd/find-trie-root/cmd.go
Outdated
// genereate a segment file to the temporary folder with the root hash as its last record | ||
newSegmentFile, err := findRootHashAndCreateTrimmed(flagExecutionStateDir, segment, rootHash) | ||
if err != nil { | ||
log.Fatal().Err(err).Msg("cannot copy WAL") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can create temp directory in this function and pass it to findRootHashAndCreateTrimmed()
so that temp directory can be removed in defer
.
backupFile := prometheusWAL.SegmentName(backupDir, i) | ||
|
||
log.Info().Msgf("backup segment file %s to %s, %v/%v", segmentFile, backupFile, i, last) | ||
err := os.Rename(segmentFile, backupFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use copyFile()
here in case backup directory is in a different device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about this. If I use copy file, it will take some time, I think we most likely would move within the same device. If a different device is used, then it fails, they can retry again with a different path on the same device.
I could also check if they are different device, and then use copy and remove the old files, but that would be too much code. I decided to keep it simple.
cmd/util/cmd/find-trie-root/cmd.go
Outdated
return nil | ||
} | ||
|
||
func copyFile(src, dst string) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of copying file, do we want to move file instead?
cmd/util/cmd/find-trie-root/cmd.go
Outdated
if os.IsNotExist(err) { | ||
return true, nil | ||
} | ||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return error here?
return false, nil | |
return false, err |
f9e9e0b
to
3bf99df
Compare
This PR adds a util command to find trie root hash in the wal file. It allows us to roll back executed height to any height.
Context: the current util allows us to roll back executed height to any height, which remove the execution results in the database. However, the EN is not able to start up, because the checkpoint and wal file has not been rollback to the corresponding height.
Rolling back the checkpoint and wal has been quite difficult, we've made tool to find the block for the block, however, it only allow us to roll back executed height to the block of the checkpoint. In order to be able to roll back to any height, we need to find the trie root hash in the wal files, and this is what this find trie root hash util is made for.