
Get list of bb start/end eas for loops in extract.py #1253

Open
kunalsz wants to merge 1 commit into mandiant:master from kunalsz:bb-start-end-extract

Conversation

@kunalsz
Contributor

@kunalsz kunalsz commented Apr 2, 2026

In reference to the TODO in extract.py:

        if len(comp) >= 2:
            # TODO get list of bb start/end eas
            yield Loop(comp)
  • Added loop BB range extraction for each SCC loop (len(comp) >= 2). It now builds sorted (start_ea, end_ea) pairs from the function's basic blocks and passes them to Loop.
  • Extended Loop in features.py to keep bb_ranges while preserving existing comp behavior for compatibility.

PS: main.py and tests/test_load.py ended up in the diff because of black formatting; they have nothing to do with this PR.

Signed-off-by: kunalsz <kunalavengers@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the loop feature extraction by capturing basic block ranges and includes minor formatting adjustments to textwrap.dedent calls. The review feedback suggests optimizing the implementation by lazily initializing the basic block lookup dictionary only when a loop is identified, which prevents unnecessary computation in functions without loops.

Comment thread floss/features/extract.py
Comment on lines 293 to +294
      edges = []
    + bb_by_va = {bb.va: bb for bb in f.basic_blocks}
Contributor

medium

The dictionary bb_by_va is created for every function, even those without loops. Since most functions do not contain loops (SCCs of size >= 2), this is an unnecessary overhead. It is more efficient to initialize this dictionary lazily only when a loop is detected.

Suggested change
    - edges = []
    - bb_by_va = {bb.va: bb for bb in f.basic_blocks}
    + edges = []

Comment thread floss/features/extract.py
Comment on lines 315 to +325
      for comp in comps:
          if len(comp) >= 2:
    -         # TODO get list of bb start/end eas
    -         yield Loop(comp)
    +         loop_bb_ranges = []
    +         for bb_va in sorted(comp):
    +             bb = bb_by_va.get(bb_va)
    +             if bb is None:
    +                 continue
    +
    +             loop_bb_ranges.append((bb.va, bb.va + bb.size))
    +
    +         yield Loop(comp, bb_ranges=loop_bb_ranges)
Contributor

medium

Implementing lazy initialization for bb_by_va here ensures that the dictionary is only constructed when at least one loop is identified in the function, avoiding unnecessary computation for the majority of functions.

    bb_by_va = None
    for comp in comps:
        if len(comp) >= 2:
            if bb_by_va is None:
                bb_by_va = {bb.va: bb for bb in f.basic_blocks}
            loop_bb_ranges = []
            for bb_va in sorted(comp):
                bb = bb_by_va.get(bb_va)
                if bb is None:
                    continue

                loop_bb_ranges.append((bb.va, bb.va + bb.size))

            yield Loop(comp, bb_ranges=loop_bb_ranges)
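
For anyone who wants to try the lazy-initialization pattern outside of FLOSS, here is a minimal standalone sketch. `BasicBlock`, `loop_bb_ranges`, and the hard-coded `comps` are stand-ins invented for illustration; they are not the vivisect or FLOSS types, and the real extractor yields `Loop` features rather than bare lists.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class BasicBlock:
    """Hypothetical stand-in for a basic block (not the vivisect type)."""

    va: int
    size: int


def loop_bb_ranges(
    basic_blocks: Iterable[BasicBlock],
    comps: Iterable[Set[int]],
) -> Iterator[List[Tuple[int, int]]]:
    """Yield sorted (start, end) ranges for each SCC with at least two blocks."""
    bb_by_va = None  # built only once the first loop is seen
    for comp in comps:
        if len(comp) < 2:
            continue
        if bb_by_va is None:
            bb_by_va = {bb.va: bb for bb in basic_blocks}
        ranges = []
        for va in sorted(comp):
            bb = bb_by_va.get(va)
            if bb is not None:  # skip addresses with no known block
                ranges.append((bb.va, bb.va + bb.size))
        yield ranges


# Assumed sample data: three blocks, one single-block SCC and one two-block SCC.
blocks = [BasicBlock(0x1000, 0x10), BasicBlock(0x1010, 0x08), BasicBlock(0x1018, 0x04)]
comps = [{0x1000}, {0x1018, 0x1010}]
result = list(loop_bb_ranges(blocks, comps))
```

With this sample input, only the two-block component produces ranges, and the lookup dictionary is never built for a function whose components are all single blocks.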

Comment thread floss/main.py
Collaborator

revert this please

Comment thread tests/test_load.py
Collaborator

revert this

@williballenthin
Collaborator

> Extended Loop in features.py to keep bb_ranges while preserving existing comp behavior for compatibility

This is not the right way to go about this. Features describe what people are looking for, so putting the addresses of found loops there doesn't make any sense.

When features are extracted, they're associated with a list of addresses where the feature was found. This is probably the right place to yield this information. Although, to be honest, I'm not sure it's worth the overhead of tracking this information, because I can't really imagine many scenarios where people will care about the loop locations. That's why we haven't yet addressed this comment in the source code.
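
One way to read the suggestion above is to keep the feature itself free of addresses and have the extractor yield the locations alongside it. The sketch below assumes a hypothetical `(feature, address)` tuple protocol; `Loop`, `extract_loop_features`, and `collect` are illustrative names, not FLOSS's actual classes or signatures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Loop:
    """Hypothetical feature: says what was found and carries no addresses."""

    block_count: int


def extract_loop_features(comps: Iterable[Set[int]]) -> Iterator[Tuple[Loop, int]]:
    """Yield (feature, address) pairs, one per basic block in each loop."""
    for comp in comps:
        if len(comp) >= 2:
            feature = Loop(block_count=len(comp))
            for va in sorted(comp):
                yield feature, va


def collect(pairs: Iterable[Tuple[Loop, int]]) -> Dict[Loop, List[int]]:
    """Group addresses by feature, as a consumer of the extractor might."""
    found = defaultdict(list)
    for feature, va in pairs:
        found[feature].append(va)
    return dict(found)


# Assumed sample SCCs: one single-block component and one two-block loop.
found = collect(extract_loop_features([{0x401000}, {0x401020, 0x401010}]))
```

Here the locations live in the mapping built by the consumer, so two loops with the same shape compare equal as features while their addresses stay separate.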

@williballenthin
Collaborator

I consider addressing this to-do issue as fairly low value. So I am not willing to go back and forth many times refining a solution. I'm willing to review perhaps one or maybe two more revisions of this PR. Otherwise we'll close it out and address the TODO sometime in the future if it becomes important.

@kunalsz
Contributor Author

kunalsz commented Apr 2, 2026

> > Extended Loop in features.py to keep bb_ranges while preserving existing comp behavior for compatibility
>
> This is not the right way to go about this. Features describe what people are looking for, so putting the addresses of found loops there doesn't make any sense.
>
> When features are extracted, they're associated with a list of addresses where the feature was found. This is probably the right place to yield this information. Although to be honest, I'm not sure if it's worth the overhead of tracking this information because I can't really imagine many scenarios where people will care about the loop locations. That's why we haven't yet addressed this comment in the source code

@williballenthin So should I close the PR? As it won't be a meaningful contribution if we revert the Loop feature change and keep the code only in extract.py.

@williballenthin
Collaborator

> So should I close the PR ? As it wont be a meaningful contribution if we

This isn't how I think about the project. I'm not looking for "meaningful contributions" but whether or not the project and its code is improved.

@kunalsz
Contributor Author

kunalsz commented Apr 2, 2026

> I consider addressing this to-do issue as fairly low value. So I am not willing to go back and forth many times refining a solution. I'm willing to review perhaps one or maybe two more revisions of this PR. Otherwise we'll close it out and address the TODO sometime in the future if it becomes important.

I noticed quite a few other TODOs in the codebase. Since this one is of low value, do you have any higher-value TODOs you’d prefer I work on instead? I’d be happy to pick one that is more useful for the project.

@kunalsz
Contributor Author

kunalsz commented Apr 2, 2026

> > So should I close the PR ? As it wont be a meaningful contribution if we
>
> This isn't how I think about the project. I'm not looking for "meaningful contributions" but whether or not the project and its code is improved.

Understood! What I meant was that I only added that change to support the loop location feature, so if that feature is not worth keeping, the changes in extract.py will not improve the codebase either (at least I can't think of any way they would right now) 😅
