Add BatchPreload to decode slabs in parallel and cache #404

fxamacker · 2024-05-09T14:16:27Z

Updates #394

This PR adds BatchPreload which decodes slabs in parallel and stores decoded slabs in cache for later retrieval.

Migration benchmark (not available yet) will use https://github.com/onflow/flow-go/tree/feature/atree-inlining-cadence-v0.42

Casual microbenchmark (on my busy desktop):

                           │  before.txt  │              after.txt               │
                           │    sec/op    │    sec/op     vs base                │
StorageRetrieve/10-12         36.23µ ± 3%   35.33µ ±  4%        ~ (p=0.075 n=10)
StorageRetrieve/100-12        469.6µ ± 8%   124.3µ ±  0%  -73.52% (p=0.000 n=10)
StorageRetrieve/1000-12       6.678m ± 7%   2.303m ± 20%  -65.51% (p=0.000 n=10)
StorageRetrieve/10000-12      29.81m ± 2%   12.26m ±  5%  -58.86% (p=0.000 n=10)
StorageRetrieve/100000-12    303.33m ± 1%   88.40m ±  1%  -70.86% (p=0.000 n=10)
StorageRetrieve/1000000-12     3.442 ± 1%    1.137 ±  3%  -66.96% (p=0.000 n=10)
geomean                       12.34m        4.816m        -60.98%

                           │  before.txt  │              after.txt              │
                           │     B/op     │     B/op      vs base               │
StorageRetrieve/10-12        21.59Ki ± 0%   21.59Ki ± 0%       ~ (p=1.000 n=10)
StorageRetrieve/100-12       219.8Ki ± 0%   224.7Ki ± 0%  +2.24% (p=0.000 n=10)
StorageRetrieve/1000-12      2.266Mi ± 0%   2.272Mi ± 0%  +0.27% (p=0.000 n=10)
StorageRetrieve/10000-12     21.94Mi ± 0%   22.14Mi ± 0%  +0.91% (p=0.000 n=10)
StorageRetrieve/100000-12    215.3Mi ± 0%   218.5Mi ± 0%  +1.50% (p=0.000 n=10)
StorageRetrieve/1000000-12   2.211Gi ± 0%   2.212Gi ± 0%  +0.05% (p=0.000 n=10)
geomean                      6.919Mi        6.976Mi       +0.82%

                           │ before.txt  │              after.txt               │
                           │  allocs/op  │  allocs/op   vs base                 │
StorageRetrieve/10-12         76.00 ± 0%    76.00 ± 0%       ~ (p=1.000 n=10) ¹
StorageRetrieve/100-12        745.0 ± 0%    759.0 ± 0%  +1.88% (p=0.000 n=10)
StorageRetrieve/1000-12      7.161k ± 0%   7.153k ± 0%  -0.11% (p=0.000 n=10)
StorageRetrieve/10000-12     70.77k ± 0%   70.58k ± 0%  -0.27% (p=0.000 n=10)
StorageRetrieve/100000-12    711.9k ± 0%   709.7k ± 0%  -0.31% (p=0.000 n=10)
StorageRetrieve/1000000-12   7.115M ± 0%   7.077M ± 0%  -0.54% (p=0.000 n=10)
geomean                      22.93k        22.95k       +0.11%

- Targeted PR against main branch
- Linked to GitHub issue with discussion and accepted design OR link to spec that describes this work
- Code follows the standards mentioned here
- Updated relevant documentation
- Re-reviewed Files changed in the GitHub PR explorer
- Added appropriate labels

The intended use for BatchPreload is to speed up migrations. BatchPreload decodes slabs in parallel and stores the decoded slabs in the cache for later retrieval. This is useful for migration programs when most or all slabs are expected to be migrated.
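
The decode-in-parallel, then cache idea can be sketched in Go roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `SlabID`, `Slab`, `decodeSlab`, and `batchPreload` are hypothetical stand-ins, and the real implementation in storage.go works with atree's own storage, decoder, and cache types.

```go
package main

import (
	"fmt"
	"sync"
)

// SlabID and Slab are hypothetical stand-ins, not atree's real types.
type SlabID uint64

type Slab struct {
	ID   SlabID
	Data string
}

// decodeSlab is a placeholder for the CPU-bound decoding step.
func decodeSlab(id SlabID, raw []byte) (Slab, error) {
	if len(raw) == 0 {
		return Slab{}, fmt.Errorf("slab %d: empty data", id)
	}
	return Slab{ID: id, Data: string(raw)}, nil
}

// batchPreload decodes every encoded slab using numWorkers goroutines and
// returns a cache of the decoded slabs plus the first decode error, if any.
func batchPreload(encoded map[SlabID][]byte, numWorkers int) (map[SlabID]Slab, error) {
	if numWorkers < 1 {
		numWorkers = 1
	}

	type result struct {
		slab Slab
		err  error
	}

	// Both channels are buffered to len(encoded), so no send ever blocks.
	jobs := make(chan SlabID, len(encoded))
	results := make(chan result, len(encoded))

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				// Concurrent reads of the map are safe because nothing writes to it.
				slab, err := decodeSlab(id, encoded[id])
				results <- result{slab: slab, err: err}
			}
		}()
	}

	for id := range encoded {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	close(results)

	// The cache is filled by a single goroutine, so it needs no lock here.
	cache := make(map[SlabID]Slab, len(encoded))
	var firstErr error
	for r := range results {
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		cache[r.slab.ID] = r.slab
	}
	return cache, firstErr
}
```

In this sketch only the decoding runs concurrently; the cache is populated afterwards by one goroutine. That keeps the cache free of locking, at the cost of buffering all decoded slabs before they are stored. Preloading pays off mainly when most of the preloaded slabs are read later, which matches the migration use case described above.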

fxamacker · 2024-05-10T03:17:42Z

Looks like validation and storage health checks passed tonight with this PR added to atree migration program at:

(locally modified) https://github.com/onflow/flow-go/tree/feature/atree-inlining-cadence-v0.42

Need to take another look at logs to confirm.

fxamacker · 2024-05-13T13:43:53Z

@ramtinms @turbolent PTAL 🙏

storage.go

ramtinms

Looks good to me. My only consideration: maybe add some comments or a warning about when to use this method.

turbolent

Great idea and nice work! 👏

turbolent · 2024-05-14T15:03:50Z

@fxamacker Does this also have to get ported to the 1.0 branch?

fxamacker · 2024-05-14T15:12:28Z

> @fxamacker Does this also have to get ported to the 1.0 branch?

@turbolent I will port it for completeness, but it is only needed if we use the optimization for Cadence 1.0 migration without atree inlining.

fxamacker added the enhancement and performance labels May 9, 2024

fxamacker self-assigned this May 9, 2024

fxamacker requested review from ramtinms and turbolent as code owners May 9, 2024 14:16

fxamacker force-pushed the fxamacker/add-nondeterministic-fast-commit branch from cfb364c to 7162eab Compare May 9, 2024 15:01

fxamacker force-pushed the fxamacker/add-batch-preload branch 2 times, most recently from 9b5be88 to d6baae9 Compare May 9, 2024 17:33

fxamacker force-pushed the fxamacker/add-batch-preload branch from d6baae9 to a459d96 Compare May 9, 2024 18:11

ramtinms reviewed May 13, 2024


storage.go (outdated, resolved)

ramtinms approved these changes May 13, 2024


Add more comments in BatchPreload

aa2ee90

fxamacker force-pushed the fxamacker/add-batch-preload branch from f521009 to aa2ee90 Compare May 13, 2024 20:37

turbolent approved these changes May 13, 2024


fxamacker changed the base branch from fxamacker/add-nondeterministic-fast-commit to feature/array-map-inlining May 14, 2024 14:17

fxamacker merged commit 73e00ec into feature/array-map-inlining May 14, 2024
3 checks passed

This was referenced May 14, 2024

Add BatchPreload to decode slabs in parallel and cache (for branch without atree inlining) #407

Merged

Create optimized commit and preload for migrations #394

Closed

fxamacker mentioned this pull request Sep 27, 2024

Add Storage.NondeterministicCommit for faster migrations onflow/cadence#3348

Merged

6 tasks
