Detect and merge tail blocks, preserve global-scope symbols #10

Open
freeqaz wants to merge 2 commits into rjkiv:main from freeqaz:fix/preserve-global-tail-blocks



freeqaz commented Feb 3, 2026

Summary

  • Adds tail block detection to XEX function analysis. When pdata reports a function end but disassembly reveals out-of-line code after it (small blocks of at most 64 bytes containing backward branches and a blr), that code is merged back into the preceding function rather than treated as a separate symbol.
  • Relaxes the strict assert_eq on function end addresses so that a detected end may lie beyond the pdata-reported end, since tail blocks extend past pdata boundaries.
  • Skips merging when the candidate function has a global-scope symbol (from the user's symbols.txt, a PDB, or a map file), preserving intentionally defined functions that happen to look like tail blocks.
  • Without the global-scope guard, write_symbols() would drop user-defined symbols on every re-split, because merged tail blocks get NoWrite flags.
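
The heuristic described in the bullets above can be sketched roughly as follows. This is not the code from the PR: the `Insn` and `Candidate` types, the `has_global_symbol` field, and `end_is_consistent` are hypothetical names chosen for illustration, and instructions are assumed to be already decoded. Only the 64-byte limit, the backward-branch-plus-blr shape, the global-scope guard, and the relaxed end check come from the description.

```rust
/// Upper bound on a tail block's size, per the PR summary (at most 64 bytes).
const MAX_TAIL_BLOCK_BYTES: u32 = 64;

/// Hypothetical pre-decoded instruction; a real analysis would decode raw words.
#[derive(Clone, Copy)]
enum Insn {
    /// Unconditional branch without link to `target`.
    Branch { target: u32 },
    /// Unconditional return through the link register.
    Blr,
    /// Anything else.
    Other,
}

/// Hypothetical candidate: the code found right after a pdata-reported function end.
struct Candidate {
    insns: Vec<Insn>,
    /// True when symbols.txt, a PDB, or a map file gives this address a global-scope symbol.
    has_global_symbol: bool,
}

/// Returns true when `c` looks like out-of-line code belonging to the
/// preceding function, which spans `prev_start..prev_end`.
fn is_tail_block(prev_start: u32, prev_end: u32, c: &Candidate) -> bool {
    // PowerPC instructions are 4 bytes each.
    let size = 4 * c.insns.len() as u32;
    if size == 0 || size > MAX_TAIL_BLOCK_BYTES || c.has_global_symbol {
        return false;
    }
    // The candidate sits after the preceding function, so any branch that
    // lands inside that function is a backward branch.
    let branches_back = c.insns.iter().any(|i| {
        matches!(*i, Insn::Branch { target } if (prev_start..prev_end).contains(&target))
    });
    let returns = c.insns.iter().any(|i| matches!(i, Insn::Blr));
    branches_back && returns
}

/// Relaxed form of the end check: the detected end may extend past the
/// pdata-reported end, but must never fall short of it.
fn end_is_consistent(pdata_end: u32, detected_end: u32) -> bool {
    detected_end >= pdata_end
}
```

In this sketch the global-scope guard is folded into the predicate itself, so a block that has the right shape but carries a user-defined symbol is never reported as a merge candidate.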

Example: Curl_resolv_timeout

The MSVC compiler placed Curl_resolv_timeout (7 instructions, 28 bytes) immediately after Curl_resolv with a tail call (b, not bl) back into it. Without this change, dtk merges the two and Curl_resolv_timeout disappears from symbols.txt on every split, preventing independent comparison in objdiff.
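
To make the b versus bl distinction concrete, here is a minimal sketch of decoding a PowerPC I-form branch word. It relies only on the standard instruction encoding (primary opcode 18, a 24-bit word offset, the AA bit, and the LK bit); the `Branch` enum and `decode_branch` function are illustrative names, not part of dtk.

```rust
/// Illustrative classification of an unconditional PowerPC branch.
#[derive(Debug, PartialEq)]
enum Branch {
    /// `b` / `ba`: LK = 0, no return address is saved (a jump or tail call).
    Jump { target: u32 },
    /// `bl` / `bla`: LK = 1, the return address is saved in the link register.
    Call { target: u32 },
}

/// Decodes the instruction word `insn` located at `addr`.
/// Returns None when the word is not an I-form branch (primary opcode 18).
fn decode_branch(addr: u32, insn: u32) -> Option<Branch> {
    if insn >> 26 != 18 {
        return None;
    }
    // Bits 2..=25 of the word hold the 24-bit LI field already shifted left by 2.
    let field = insn & 0x03FF_FFFC;
    // Sign-extend the 26-bit byte offset.
    let offset = ((field << 6) as i32) >> 6;
    // AA = 1 means the offset is an absolute address rather than relative to `addr`.
    let target = if insn & 0b10 != 0 {
        offset as u32
    } else {
        addr.wrapping_add(offset as u32)
    };
    Some(if insn & 1 != 0 {
        Branch::Call { target }
    } else {
        Branch::Jump { target }
    })
}
```

For example, the word 0x4BFFFFF0 at address 0x1050 decodes as a jump to 0x1040, while 0x4BFFFFF1 at the same address is a call to the same target. A small function ending in a jump back into its neighbor is exactly the shape that a tail-block heuristic cannot tell apart from out-of-line code without extra information such as a global-scope symbol.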

With this change and scope:global in symbols.txt:

  • Curl_resolv: 94.5% → 99.9% (the remaining 2 diffs are relocation-encoding differences and cannot be fixed)
  • Curl_resolv_timeout: 100% match

Test plan

  • Build with cargo build --release — compiles clean
  • Re-split a project with known tail blocks — verify merged tail blocks still get merged
  • Add a global-scope symbol at a tail block address — verify it survives re-split
  • Run full report and diff against previous — no regressions

freeqaz force-pushed the fix/preserve-global-tail-blocks branch from ad084e7 to a02abfb on February 3, 2026 07:03
Add tail block detection to XEX function analysis. When pdata reports
a function end but disassembly reveals out-of-line code after it
(small blocks with backward branches + blr), these are merged back
into the preceding function rather than treated as separate symbols.

Additionally, skip merging when the candidate has a global-scope
symbol (from user symbols.txt, PDB, or map file), since these
represent intentionally defined functions that should not be absorbed.
This prevents symbols.txt regeneration from dropping user-defined
functions that happen to look like tail blocks.

- Extract MAX_TAIL_BLOCK_BYTES constant and helper functions
  (is_unconditional_blr, branch_into_range)
- Split check_tail_block into three methods: dispatcher,
  check_tail_block_backward_branch (case 1), and
  check_tail_block_scan_block (case 2)
- Optimize merge_tail_blocks to collect only (addr, end) tuples
  instead of cloning full FunctionInfo with slices
- Replace unwrap() with expect() for better panic messages
- Add 18 unit tests in separate cfa_tests.rs covering: helper
  functions, check_tail_block cases, merge_tail_blocks
  merging/skipping, apply() symbol deletion and size extension,
  global-scope preservation
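
The "collect only (addr, end) tuples" refactor mentioned in the list above might look roughly like the sketch below. The `FunctionInfo` fields here are simplified stand-ins, `looks_like_tail_block` is a hypothetical flag standing in for the result of check_tail_block, and the requirement that a tail block start exactly where its predecessor ends is an assumption of this sketch, not something the PR states.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the analyzer's per-function record.
struct FunctionInfo {
    end: u32,
    /// Hypothetical: precomputed result of the tail-block shape check.
    looks_like_tail_block: bool,
    /// Hypothetical: true when a global-scope symbol names this function.
    global_scope: bool,
}

/// Merges tail blocks into their predecessors and returns the start
/// addresses of the functions that were absorbed.
fn merge_tail_blocks(functions: &mut BTreeMap<u32, FunctionInfo>) -> Vec<u32> {
    // Pass 1: borrow immutably and copy out only the two integers needed,
    // instead of cloning whole FunctionInfo values.
    let candidates: Vec<(u32, u32)> = functions
        .iter()
        .filter(|(_, f)| f.looks_like_tail_block && !f.global_scope)
        .map(|(&addr, f)| (addr, f.end))
        .collect();

    // Pass 2: mutate the map now that the immutable borrow has ended.
    let mut merged = Vec::new();
    for (addr, end) in candidates {
        // Predecessor: the function with the greatest start address below `addr`.
        let Some((&prev_addr, prev)) = functions.range(..addr).next_back() else {
            continue;
        };
        // Assumption of this sketch: only merge blocks that directly follow
        // the predecessor's current end.
        if prev.end != addr {
            continue;
        }
        functions
            .get_mut(&prev_addr)
            .expect("predecessor was found in the map just above")
            .end = end;
        functions
            .remove(&addr)
            .expect("candidate address was collected from this map");
        merged.push(addr);
    }
    merged
}
```

Splitting the work into a read-only collection pass and a mutation pass sidesteps the borrow checker without cloning, and the `expect` messages document why each lookup is believed to be infallible.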

pub known_sections: BTreeMap<SectionIndex, String>,
/// Functions that were merged as tail blocks into their predecessors.
/// These need to be removed from obj.symbols during apply().
pub merged_tail_blocks: Vec<SectionAddress>,

Owner

Instead of tracking functions with false tail calls, you should augment the fn check_tail_call in src/analysis/slices.rs.

Owner

rjkiv commented Feb 22, 2026

I took a look at DC3 after pushing my CFA fixes, and the "check tail call" logic turned out to be correct, so Claude was right in the sense that there needs to be a way to glue together the two halves of a function that was falsely split at a tail call. Update this PR with the latest changes from main and I'll give this another look.
