-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NonderterministicFastCommit to speed up migrations when ordering isn't required #403
Add NonderterministicFastCommit to speed up migrations when ordering isn't required #403
Conversation
NonderterministicFastCommit commits changes in nondeterministic order. It can be used by migration program when ordering isn't required. │ before.txt │ after.txt │ │ sec/op │ sec/op vs base │ StorageFastCommit/10-12 89.72µ ± 4% 57.50µ ± 3% -35.92% (p=0.000 n=10) StorageFastCommit/100-12 118.9µ ± 1% 116.0µ ± 4% ~ (p=0.436 n=10) StorageFastCommit/1000-12 4.086m ± 5% 2.397m ± 25% -41.35% (p=0.000 n=10) StorageFastCommit/10000-12 12.629m ± 4% 9.857m ± 3% -21.95% (p=0.000 n=10) StorageFastCommit/100000-12 102.73m ± 0% 72.26m ± 1% -29.66% (p=0.000 n=10) StorageFastCommit/1000000-12 1.544 ± 2% 1.141 ± 2% -26.09% (p=0.000 n=10) geomean 6.661m 4.848m -27.21% │ before.txt │ after.txt │ │ B/op │ B/op vs base │ StorageFastCommit/10-12 28.92Ki ± 0% 28.05Ki ± 0% -3.00% (p=0.000 n=10) StorageFastCommit/100-12 286.4Ki ± 0% 278.6Ki ± 0% -2.71% (p=0.000 n=10) StorageFastCommit/1000-12 3.009Mi ± 0% 2.901Mi ± 0% -3.58% (p=0.000 n=10) StorageFastCommit/10000-12 28.65Mi ± 0% 27.79Mi ± 0% -2.98% (p=0.000 n=10) StorageFastCommit/100000-12 278.8Mi ± 0% 271.1Mi ± 0% -2.75% (p=0.000 n=10) StorageFastCommit/1000000-12 2.923Gi ± 0% 2.821Gi ± 0% -3.49% (p=0.000 n=10) geomean 9.101Mi 8.820Mi -3.09% │ before.txt │ after.txt │ │ allocs/op │ allocs/op vs base │ StorageFastCommit/10-12 219.0 ± 0% 205.0 ± 0% -6.39% (p=0.000 n=10) StorageFastCommit/100-12 1.980k ± 0% 1.875k ± 0% -5.30% (p=0.000 n=10) StorageFastCommit/1000-12 19.23k ± 0% 18.23k ± 0% -5.22% (p=0.000 n=10) StorageFastCommit/10000-12 191.1k ± 0% 181.1k ± 0% -5.24% (p=0.000 n=10) StorageFastCommit/100000-12 1.918M ± 0% 1.816M ± 0% -5.30% (p=0.000 n=10) StorageFastCommit/1000000-12 19.15M ± 0% 18.15M ± 0% -5.22% (p=0.000 n=10) geomean 62.31k 58.91k -5.45%
cfb364c
to
7162eab
Compare
Looks like validation and storage health checks passed tonight with this PR added to atree migration program at:
Need to take another look at logs to confirm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea and great work!
storage.go
Outdated
@@ -964,6 +969,204 @@ func (s *PersistentSlabStorage) FastCommit(numWorkers int) error { | |||
return nil | |||
} | |||
|
|||
// NonderterministicFastCommit commits changes in nondeterministic order. | |||
// This is used by migration program when ordering isn't required. | |||
func (s *PersistentSlabStorage) NonderterministicFastCommit(numWorkers int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to get a diff of this new function and the existing deterministic version of it? It would make it easier to review, by seeing how they differ.
If the functions only differ in a small way, maybe it's possible to extract the shared code and only make the parts that differ configurable?
@turbolent Good suggestions! However, in this case there are more changes than unchanged code so a diff, etc. isn't as helpful as it normally would be. The changes are mostly about eliminating overhead and parallelizing as much as possible. I outlined steps in both functions below so we can see the differences at higher level.
In order to commit in deterministic order, the commit step has to wait until all slabs are encoded, and there are also overheads from sorting slab IDs and collecting encoded results in temporary map.
In nondeterministic approach, the commit steps are in parallel with encoding and there isn't any temp map to collect data. Even though goroutine encoding step is similar, |
@fxamacker Thank you for the description! 👌 I'll take another look today |
storage.go
Outdated
deletedSlabCount := 0 | ||
for k, v := range s.deltas { | ||
// Ignore slabs not owned by accounts | ||
if k.address == AddressUndefined { | ||
continue | ||
} | ||
if v == nil { | ||
deletedSlabCount++ | ||
} else { | ||
modifiedSlabCount++ | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a raw idea, anyway we can capture this data as part of updates, so we can skip iteration here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a raw idea, anyway we can capture this data as part of updates, so we can skip iteration here.
@ramtinms I like the idea of skipping iterations if possible!
By "updates", do you mean in Store()
when modified or deleted slabs are stored in deltas
? If so, it may be less memory efficient for non-migration use cases.
To reduce iterations here in NonderterministicFastCommit()
, I pushed commit a2a4f5c to iterate only once to get modified and deleted slabs to process later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, looks good to me, thanks for adding the tests to compare encoded final results.
This change reduces number of lines in the function but is not expected to yield significant speed improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
I think I understand what the new function does, but it's quite hard to follow given the many channels that are involved. Maybe https://pkg.go.dev/golang.org/x/sync/errgroup can help here?
@@ -64,12 +32,12 @@ jobs: | |||
contents: read | |||
steps: | |||
- name: Checkout source | |||
uses: actions/checkout@v3 | |||
uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these dependency updates necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These updates to pin dependencies improves OpenSSF score. However, I reverted the commit that updated golangci-lint to 1.53.3 (out of scope of this PR).
Example when all dependencies are pinned (score 0-10 with 10 being perfect):
For more, replace "example/repo" in this URL to see complete report (not all repos are scored yet).
4e57b9d
to
88fa22f
Compare
@fxamacker Does this also have to get ported to the 1.0 branch? |
@turbolent I will port it for completeness, but it is only needed if we use the optimization for Cadence 1.0 migration without atree inlining. |
Updates #394
This PR adds
NonderterministicFastCommit
to improve speed, bytes/op, and allocs/op compared toFastCommit
.NonderterministicFastCommit
commits changed slabs in nondeterministic order. It can be used by migration programs that don't require commit sequence of slabs to be deterministic while still preserving deterministic encoding of slab data (e.g. iteration of arrays and maps remain deterministic).FastCommit vs NonderterministicFastCommit
Casual micro benchstat output (from my busy desktop):
main
branchFiles changed
in the Github PR explorer