Get list of bb start/end eas for loops in extract.py#1253
Conversation
Signed-off-by: kunalsz <kunalavengers@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request enhances the loop feature extraction by capturing basic block ranges and includes minor formatting adjustments to textwrap.dedent calls. The review feedback suggests optimizing the implementation by lazily initializing the basic block lookup dictionary only when a loop is identified, which prevents unnecessary computation in functions without loops.
| edges = [] | ||
| bb_by_va = {bb.va: bb for bb in f.basic_blocks} |
There was a problem hiding this comment.
The dictionary bb_by_va is created for every function, even those without loops. Since most functions do not contain loops (SCCs of size >= 2), this is an unnecessary overhead. It is more efficient to initialize this dictionary lazily only when a loop is detected.
| edges = [] | |
| bb_by_va = {bb.va: bb for bb in f.basic_blocks} | |
| edges = [] |
| for comp in comps: | ||
| if len(comp) >= 2: | ||
| # TODO get list of bb start/end eas | ||
| yield Loop(comp) | ||
| loop_bb_ranges = [] | ||
| for bb_va in sorted(comp): | ||
| bb = bb_by_va.get(bb_va) | ||
| if bb is None: | ||
| continue | ||
|
|
||
| loop_bb_ranges.append((bb.va, bb.va + bb.size)) | ||
|
|
||
| yield Loop(comp, bb_ranges=loop_bb_ranges) |
There was a problem hiding this comment.
Implementing lazy initialization for bb_by_va here ensures that the dictionary is only constructed when at least one loop is identified in the function, avoiding unnecessary computation for the majority of functions.
bb_by_va = None
for comp in comps:
if len(comp) >= 2:
if bb_by_va is None:
bb_by_va = {bb.va: bb for bb in f.basic_blocks}
loop_bb_ranges = []
for bb_va in sorted(comp):
bb = bb_by_va.get(bb_va)
if bb is None:
continue
loop_bb_ranges.append((bb.va, bb.va + bb.size))
yield Loop(comp, bb_ranges=loop_bb_ranges)There was a problem hiding this comment.
revert this please
This is not the right way to go about this. Features describe what people are looking for, so putting the addresses of found loops there doesn't make any sense. When features are extracted, they're associated with a list of addresses where the feature was found. This is probably the right place to yield this information. Although to be honest, I'm not sure if it's worth the overhead of tracking this information because I can't really imagine many scenarios where people will care about the loop locations. That's why we haven't yet addressed this comment in the source code |
|
I consider addressing this to-do issue as fairly low value. So I am not willing to go back and forth many times refining a solution. I'm willing to review perhaps one or maybe two more revisions of this PR. Otherwise we'll close it out and address the TODO sometime in the future if it becomes important. |
@williballenthin So should I close the PR ? As it wont be a meaningful contribution if we revert the |
This isn't how I think about the project. I'm not looking for "meaningful contributions" but whether or not the project and its code is improved. |
I noticed quite a few other TODOs in the codebase. Since this one was of a low value, do you have any higher value TODOs you’d prefer I work on instead? I’d be happy to pick one that is more useful for the project |
Understood ! What I meant was that I only added that change to support the loop location feature, so if that feature is not worth keeping, the changes in |
In reference to the TODO in
extract.py:(len(comp) >= 2), it now builds sorted(start_ea, end_ea)pairs from function basic blocks and passes them to Loop.Loopinfeatures.pyto keepbb_rangeswhile preserving existing comp behavior for compatibilityPS.
main.pyandtests/test_load.pygot added due to black formatting, they have nothing to do with this PR