Detect and merge tail blocks, preserve global-scope symbols #10
Open
freeqaz wants to merge 2 commits into rjkiv:main from
ad084e7 to a02abfb
Add tail block detection to XEX function analysis. When pdata reports a function end but disassembly reveals out-of-line code after it (small blocks with backward branches + blr), these are merged back into the preceding function rather than treated as separate symbols. Additionally, skip merging when the candidate has a global-scope symbol (from user symbols.txt, PDB, or map file), since these represent intentionally defined functions that should not be absorbed. This prevents symbols.txt regeneration from dropping user-defined functions that happen to look like tail blocks.
- Extract MAX_TAIL_BLOCK_BYTES constant and helper functions (is_unconditional_blr, branch_into_range)
- Split check_tail_block into three methods: dispatcher, check_tail_block_backward_branch (case 1), and check_tail_block_scan_block (case 2)
- Optimize merge_tail_blocks to collect only (addr, end) tuples instead of cloning full FunctionInfo with slices
- Replace unwrap() with expect() for better panic messages
- Add 18 unit tests in separate cfa_tests.rs covering: helper functions, check_tail_block cases, merge_tail_blocks merging/skipping, apply() symbol deletion and size extension, global-scope preservation
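The detection described above can be sketched roughly as follows. This is not the PR's code: the helper names is_unconditional_blr and branch_into_range come from the commit message, but their signatures, the value of MAX_TAIL_BLOCK_BYTES, and the branch_target and looks_like_tail_block functions are assumptions made for illustration. Only the PowerPC encodings (blr is 0x4E800020; b is the I-form branch with primary opcode 18 and LK = 0) are standard.

```rust
/// Assumed limit: the PR names MAX_TAIL_BLOCK_BYTES but its value is not shown here.
const MAX_TAIL_BLOCK_BYTES: u32 = 64;

/// PowerPC encoding of an unconditional `blr`.
const BLR: u32 = 0x4E80_0020;

/// True if `insn` is exactly an unconditional `blr`.
fn is_unconditional_blr(insn: u32) -> bool {
    insn == BLR
}

/// Hypothetical helper: target of an I-form `b` (opcode 18, LK = 0) located at `addr`.
/// Returns None for anything else, including `bl`.
fn branch_target(addr: u32, insn: u32) -> Option<u32> {
    if insn >> 26 != 18 || insn & 1 != 0 {
        return None;
    }
    // The displacement is the 26-bit field in the low bits (bottom two bits zero);
    // shifting left then arithmetic-shifting right sign-extends it.
    let disp = ((insn & 0x03FF_FFFC) << 6) as i32 >> 6;
    if insn & 2 != 0 {
        Some(disp as u32) // AA = 1: absolute address
    } else {
        Some(addr.wrapping_add(disp as u32)) // AA = 0: relative to this instruction
    }
}

/// True if `insn` at `addr` is a `b` whose target lies in [start, end).
fn branch_into_range(addr: u32, insn: u32, start: u32, end: u32) -> bool {
    matches!(branch_target(addr, insn), Some(t) if t >= start && t < end)
}

/// Simplified stand-in for the PR's check_tail_block: a block is treated as a
/// tail block of the function [func_start, func_end) if it is small, ends in
/// `blr` or `b`, and contains a `b` back into that function.
fn looks_like_tail_block(code: &[u32], block_addr: u32, func_start: u32, func_end: u32) -> bool {
    let size = code.len() as u32 * 4;
    let last = match code.last() {
        Some(&insn) => insn,
        None => return false,
    };
    if size > MAX_TAIL_BLOCK_BYTES {
        return false;
    }
    let last_addr = block_addr + size - 4;
    let terminated = is_unconditional_blr(last) || branch_target(last_addr, last).is_some();
    let branches_back = code
        .iter()
        .enumerate()
        .any(|(i, &insn)| branch_into_range(block_addr + i as u32 * 4, insn, func_start, func_end));
    terminated && branches_back
}
```

The real implementation splits this into a backward-branch case and a scan-block case; the sketch folds both into one predicate to keep the idea visible.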
rjkiv
requested changes
Feb 13, 2026
pub known_sections: BTreeMap<SectionIndex, String>,
/// Functions that were merged as tail blocks into their predecessors.
/// These need to be removed from obj.symbols during apply().
pub merged_tail_blocks: Vec<SectionAddress>,
Owner
Instead of tracking functions with false tail calls, you should augment the fn check_tail_call in src/analysis/slices.rs.
Owner
I took a look at DC3 after pushing my CFA fixes, and the "check tail call" logic turned out to be correct, so Claude was right in the sense that there needs to be a way to glue together the two halves of a function that was falsely split at a tail call. Update this PR with the latest changes from main and I'll give this another look.
Summary
- Add tail block detection to XEX function analysis: when pdata reports a function end but disassembly reveals out-of-line code after it (small blocks with backward branches + blr), these are merged back into the preceding function rather than treated as separate symbols
- Change the assert_eq on function end addresses to allow detected ends beyond pdata-reported ends (since tail blocks extend past pdata boundaries)
- Skip merging when the candidate has a global-scope symbol (from user symbols.txt, PDB, or map file), preserving intentionally defined functions that happen to look like tail blocks
- Without the global-scope check, write_symbols() would drop user-defined symbols on every re-split because merged tail blocks get NoWrite flags

Example: Curl_resolv_timeout
The MSVC compiler placed Curl_resolv_timeout (7 instructions, 28 bytes) immediately after Curl_resolv with a tail call (b, not bl) back into it. Without this change, dtk merges the two and Curl_resolv_timeout disappears from symbols.txt on every split, preventing independent comparison in objdiff.

With this change and scope:global in symbols.txt:
- Curl_resolv: 94.5% → 99.9% (remaining 2 diffs are relocation encoding, unfixable)
- Curl_resolv_timeout: 100% match

Test plan
- cargo build --release: compiles clean
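The global-scope rule from the Summary can be illustrated with a small model of the merge step. This is a sketch under assumptions, not dtk's code: the Func and Scope types, the tail_candidate flag, and the u32 addresses stand in for the project's FunctionInfo, symbol scope, and SectionAddress types, and this merge_tail_blocks signature is hypothetical. It does follow the PR's stated approach of collecting only (addr, end) tuples before mutating.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a symbol's scope.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scope {
    Local,
    Global,
}

/// Simplified stand-in for per-function analysis info, keyed by start address.
#[derive(Debug)]
struct Func {
    end: u32,
    scope: Scope,
    /// Set when the block was classified as a possible tail block.
    tail_candidate: bool,
}

/// Merge tail-block candidates into the function that ends where they begin,
/// skipping any candidate with a global-scope symbol.
/// Returns the start addresses of the blocks that were merged away.
fn merge_tail_blocks(funcs: &mut BTreeMap<u32, Func>) -> Vec<u32> {
    // Collect only (addr, end) pairs so the map is not borrowed during mutation.
    let candidates: Vec<(u32, u32)> = funcs
        .iter()
        .filter(|(_, f)| f.tail_candidate && f.scope != Scope::Global)
        .map(|(&addr, f)| (addr, f.end))
        .collect();

    let mut merged = Vec::new();
    for (addr, end) in candidates {
        // Predecessor: the function with the greatest start address below `addr`.
        let prev = match funcs.range(..addr).next_back() {
            Some((&k, _)) => k,
            None => continue,
        };
        // Only merge when the candidate starts exactly where the predecessor ends.
        if funcs[&prev].end != addr {
            continue;
        }
        funcs.get_mut(&prev).expect("predecessor was just looked up").end = end;
        funcs.remove(&addr);
        merged.push(addr);
    }
    merged
}
```

In the Curl_resolv_timeout case, marking the 28-byte function as scope:global keeps it out of the candidate list, so it survives as its own entry; with local scope it would be folded into Curl_resolv and its address returned for removal.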