-
Notifications
You must be signed in to change notification settings - Fork 720
feat: improve block selection in stacks-inspect
#6735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
feat: improve block selection in stacks-inspect
#6735
Conversation
Improves the UX by merging `validate-naka-block` and `validate-block,` improves the lookup time by optimizing the queries, and improves the output for a better experience. Adds `--early-exit` flag to exit on first error instead of completing all blocks.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #6735 +/- ##
===========================================
+ Coverage 70.05% 71.95% +1.89%
===========================================
Files 578 579 +1
Lines 358373 360160 +1787
===========================================
+ Hits 251064 259140 +8076
+ Misses 107309 101020 -6289
... and 390 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
| fn collect_block_entries_for_selection( | ||
| db_path: &str, | ||
| selection: &BlockSelection, | ||
| chainstate: &StacksChainState, | ||
| ) -> Vec<BlockScanEntry> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanity check: isn't this method currently returning 2n blocks when called with BlockSelection::Last and BlockSelection::First? seems like it will get the first N and last N blocks from both tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I decided to just remove first, since that doesn't really seem like a useful scenario any way, and then fixed last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to just remove
first, since that doesn't really seem like a useful scenario any way
And if someone needs it after all, they can use a height range starting at 0.
contrib/stacks-inspect/src/lib.rs
Outdated
| panic!("Failed to open staging blocks DB at {staging_blocks_db_path}: {e}"); | ||
| }); | ||
| let sql = format!( | ||
| "SELECT index_block_hash, consensus_hash, anchored_block_hash, height FROM staging_blocks {clause}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since we are only consuming index_block_hash, maybe we can simplity the query as SELECT index_block_hash FROM staging_blocks {clause}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had that there originally because I needed other fields when I was also reusing the new structure in other places, but then ended up deleting that code. I'll delete these now.
contrib/stacks-inspect/src/lib.rs
Outdated
| for (i, index_block_hash) in index_block_hashes.iter().enumerate() { | ||
| if i % 100 == 0 { | ||
| println!("Checked {i}..."); | ||
| let sql = format!("SELECT index_block_hash, height FROM nakamoto_staging_blocks {clause}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since we are only consuming index_block_hash, maybe we can simplity the query as SELECT index_block_hash FROM nakamoto_staging_blocks {clause}
contrib/stacks-inspect/src/lib.rs
Outdated
| let mut seen = HashSet::new(); | ||
| let mut entries = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suuuuper nit: These two variables hold over 5 million index_block_hash values each when the function is called with BlockSelection::All, stored as 64 bytes hex strings. ~320 MB for each variable (plus additional overhead from the data structures). It's not a concern right now, but we could reduce memory usage by about half if we move the
let index_block_hash = StacksBlockId::from_hex(&index_block_hash_hex).unwrap();processing inside this function and avoid keeping the hex strings around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as always, feel free to skip the changes for the nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Also, a good reason to just remove the seen tracking altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Remove the `seen` tracking since it doesn't really gain us much and uses a lot of space when processing many blocks.
| if start > end { | ||
| return Err("<start-block> must be <= <end-block>".into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep end exclusive (see other comment), start == end wouldn't make much sense (it would always be zero blocks), so it would indicate a likely mistake. So in that case maybe we should do
| if start > end { | |
| return Err("<start-block> must be <= <end-block>".into()); | |
| if start >= end { | |
| return Err("<start-block> must be < <end-block>".into()); |
instead?
| let blocks = end.saturating_sub(*start); | ||
| format!("WHERE orphaned = 0 ORDER BY index_block_hash ASC LIMIT {start}, {blocks}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes end exclusive. We probably want it inclusive for consistency with HeightRange?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually even worse, I don't think this even works?
Say the user wants blocks 10 through 20. If there are 10 nakamoto blocks with a hash that is smaller than any eopch2 block, then the first block in the overall result set is the first epoch2 block. But because our OFFSET was 10, we never retrieved that block.
| next_staging_block.commit_burn, | ||
| next_staging_block.sortition_burn, | ||
| ); | ||
| Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant to change replay_block to return a Result<,> and then return that result here?
Returning an unconditional Ok(()) regardless of what happens in replay_block (if it even ever returns -- it has some process::exits in there) doesn't seem like it's the intent.
Improves the UX by merging
validate-naka-blockandvalidate-block,improves the lookup time by optimizing the queries, and improves the output for a better experience. Adds--early-exitflag to exit on first error instead of completing all blocks.