Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Venom: Replace unconditional jump to small BBs with BB itself #4327

Open
Philogy opened this issue Oct 24, 2024 · 3 comments
Open

Venom: Replace unconditional jump to small BBs with BB itself #4327

Philogy opened this issue Oct 24, 2024 · 3 comments

Comments

@Philogy
Copy link
Contributor

Philogy commented Oct 24, 2024

Simple Summary

Add an optimization pass that "inlines" small basic blocks. Similar to function inlining it allows following passes to simplify things even further and remove redundancy.

For example the following code (from snekmate):

@internal
@view
def _domain_separator_v4() -> bytes32:
    """
    @dev Returns the domain separator for the current chain.
    @return bytes32 The 32-byte domain separator.
    """
    if ((self == _CACHED_SELF) and (chain.id == _CACHED_CHAIN_ID)):
        return _CACHED_DOMAIN_SEPARATOR

    return self._build_domain_separator()

Generates:

`_domain_separator_v4` in Venom IR
IRFunction: internal 2 _domain_separator_v4()_runtime
internal 2 _domain_separator_v4()_runtime:  IN=[] OUT=[68_then, 69_else] => {}
    %1 = param  <return_buffer>
    %2 = param  <return_pc>        # {%1}
    %3 = address                   # {%2, %1}
    %19 = 32
    %18 = 64
    %4 = add label %code_end, %18  # {%2, %1, %3}
    %20 = 0
    dloadbytes %20, %4, %19        # {%2, %1, %3, %4}
    %21 = 0
    %5 = mload %21                 # {%2, %1, %3}
    %6 = xor %3, %5                # {%2, %1, %5, %3}
    jnz label %68_then, label %69_else, %6 # {%2, %1, %7}

68_then:  IN=[internal 2 _domain_separator_v4()_runtime] OUT=[70_if_exit] => {}
    %8 = chainid                   # {%2, %1}
    %23 = 32
    %22 = 32
    %9 = add label %code_end, %22  # {%2, %1, %8}
    %24 = 0
    dloadbytes %24, %9, %23        # {%2, %1, %8, %9}
    %25 = 0
    %10 = mload %25                # {%2, %1, %8}
    %11 = xor %8, %10              # {%2, %1, %10, %8}
    %12 = iszero %11               # {%2, %1, %11}
    %14:1 = %12                    # {%2, %1, %12}
    jmp label %70_if_exit          # {%2, %1, %14:1}

69_else:  IN=[internal 2 _domain_separator_v4()_runtime] OUT=[70_if_exit] => {}
    %14:2 = 0                      # {%2, %1, %13}
    jmp label %70_if_exit          # {%2, %1, %14:2}

70_if_exit:  IN=[68_then, 69_else] OUT=[71_then, 73_if_exit] => {}
    %14 = phi %14:2, label %69_else, %14:1, label %68_then # {%2, %1, %14:1, %14:2}
    jnz label %73_if_exit, label %71_then, %14 # {%2, %1, %14}

71_then:  IN=[70_if_exit] OUT=[internal 2 _domain_separator_v4()_cleanup] => {}
    %27 = 32
    %26 = 0
    %15 = add label %code_end, %26 # {%2, %1}
    dloadbytes %1, %15, %27        # {%2, %15, %1}
    jmp label %internal 2 _domain_separator_v4()_cleanup # {%2}

73_if_exit:  IN=[70_if_exit] OUT=[internal 2 _domain_separator_v4()_cleanup] => {}
    %28 = 256
    invoke %28, label %internal 1 _build_domain_separator()_runtime # {%2, %1}
    %29 = 256
    %17 = mload %29                # {%2, %1}
    mstore %1, %17                 # {%2, %17, %1}
    jmp label %internal 2 _domain_separator_v4()_cleanup # {%2}

internal 2 _domain_separator_v4()_cleanup:  IN=[71_then, 73_if_exit] OUT=[] => {}
    ret %2                         # {%2}

Motivation

The optimizer pass would:

  1. Inline the jmp label %70_if_exit in 68_then, resolving %14 = %12
  2. Inline the jmp label %70_if_exit in 69_else, resolving %14 = 0

Resuling in:

Theoretical Venom IR post BB inlining pass
IRFunction: internal 2 _domain_separator_v4()_runtime
internal 2 _domain_separator_v4()_runtime:  IN=[] OUT=[68_then, 69_else] => {}
    %1 = param  <return_buffer>
    %2 = param  <return_pc>        # {%1}
    %3 = address                   # {%2, %1}
    %19 = 32
    %18 = 64
    %4 = add label %code_end, %18  # {%2, %1, %3}
    %20 = 0
    dloadbytes %20, %4, %19        # {%2, %1, %3, %4}
    %21 = 0
    %5 = mload %21                 # {%2, %1, %3}
    %6 = xor %3, %5                # {%2, %1, %5, %3}
    jnz label %68_then, label %69_else, %6 # {%2, %1, %7}

68_then:  IN=[internal 2 _domain_separator_v4()_runtime] OUT=[71_then, 73_if_exit] => {}
    %8 = chainid                   # {%2, %1}
    %23 = 32
    %22 = 32
    %9 = add label %code_end, %22  # {%2, %1, %8}
    %24 = 0
    dloadbytes %24, %9, %23        # {%2, %1, %8, %9}
    %25 = 0
    %10 = mload %25                # {%2, %1, %8}
    %11 = xor %8, %10              # {%2, %1, %10, %8}
    %12 = iszero %11               # {%2, %1, %11}
    %14:1 = %12                    # {%2, %1, %12}
-     jmp label %70_if_exit          # {%2, %1, %14:2}
+     jnz label %73_if_exit, label %71_then, %14:1 # {%2, %1, %14}

69_else:  IN=[internal 2 _domain_separator_v4()_runtime] OUT=[71_then, 73_if_exit] => {}
    %14:2 = 0                      # {%2, %1, %13}
-     jmp label %70_if_exit          # {%2, %1, %14:2}
+     jnz label %73_if_exit, label %71_then, %14:2 # {%2, %1, %14}

70_if_exit:  IN=[68_then, 69_else] OUT=[71_then, 73_if_exit] => {}
    %14 = phi %14:2, label %69_else, %14:1, label %68_then # {%2, %1, %14:1, %14:2}
    jnz label %73_if_exit, label %71_then, %14 # {%2, %1, %14}

71_then:  IN=[70_if_exit] OUT=[internal 2 _domain_separator_v4()_cleanup] => {}
    %27 = 32
    %26 = 0
    %15 = add label %code_end, %26 # {%2, %1}
    dloadbytes %1, %15, %27        # {%2, %15, %1}
    jmp label %internal 2 _domain_separator_v4()_cleanup # {%2}

73_if_exit:  IN=[70_if_exit] OUT=[internal 2 _domain_separator_v4()_cleanup] => {}
    %28 = 256
    invoke %28, label %internal 1 _build_domain_separator()_runtime # {%2, %1}
    %29 = 256
    %17 = mload %29                # {%2, %1}
    mstore %1, %17                 # {%2, %17, %1}
    jmp label %internal 2 _domain_separator_v4()_cleanup # {%2}

internal 2 _domain_separator_v4()_cleanup:  IN=[71_then, 73_if_exit] OUT=[] => {}
    ret %2                         # {%2}

Then it would be trivial for further optimization passes to:

  • remove the no longer used 70_if_exit block
  • transform the new jnz label %73_if_exit, label %71_then, 0 to jmp label &71_then in 69_else
  • Replace 69_else with 71_then in the function entry block (because it would be an empty BB only containing a single unconditional jump)

This would lead to smaller code and remove a lot of redundant jumps and stack scheduling. Some heuristics need to be added to such a pass to avoid unsound optimization, specifically if the target basic block has any phi nodes who's lifetime extends outside of that basic block it cannot be trivially inlined. Furthermore inlining large target basic blocks could lead to code size regressions.

The 69_else and 70_if_exit blocks are quite redundant and a common pattern arising from the short-circuiting of or and and operations as well as other control-flow related to if-statements. Furthermore it would allow optimizing for loops with the rough common structure of:

cond:
    <cond>
jumpi <exit>
body:
    <body>
jump cond:
exit:    

To:

cond:
    <cond>
jumpi <exit>
body:
    <body>
    <not_cond>
jumpi body:
exit:    

By inlining the for-loop condition checking basic block at the end of the loop, turning the 2 branch operations per loop iteration to 1.

Copyright

Copyright and related rights waived via CC0

@Philogy Philogy added the needs triage needs triage label Oct 24, 2024
@charles-cooper
Copy link
Member

yea, i think this looks like a neat idea. we will need a heuristic for when to inline. maybe this can be done as part of SimplifyCFG? right now we have the invariant that basic blocks do not simultaneously have multiple cfg_in and multiple cfg_out (NormalizationPass). i assume we can get rid of the invariant actually, but it might need some rework in venom_to_assembly.py.

paging @harkal, what do you think?

@Philogy do you want to take a stab at this?

@charles-cooper
Copy link
Member

fwiw, one of those jumps gets eliminated

-f bb_runtime
IRFunction: internal 1 _domain_separator_v4()_runtime
internal 1 _domain_separator_v4()_runtime:  IN=[] OUT=[6_then, 7_else] => {%2, %1}
    %1 = param  <return_buffer>
    %2 = param  <return_pc>        # {%1}
    %3 = address                   # {%2, %1}
    %19 = 32                       # {%2, %1, %3}
    %4 = offset label %code_end, 64 # {%2, %1, %3, %19}
    %20 = %4                       # {%2, %1, %3, %19, %4}
    %21 = 0                        # {%2, %1, %3, %19, %20}
    dloadbytes %21, %20, %19       # {%2, %1, %3, %19, %20, %21}
    %22 = 0                        # {%2, %1, %3}
    %5 = mload %22                 # {%2, %1, %3, %22}
    %23 = %5                       # {%2, %1, %3, %5}
    %24 = %3                       # {%2, %1, %23, %3}
    %6 = xor %24, %23              # {%2, %1, %23, %24}
    %25 = %6                       # {%2, %1, %6}
    jnz label %6_then, label %7_else, %25 # {%2, %1, %25}

6_then:  IN=[internal 1 _domain_separator_v4()_runtime] OUT=[8_if_exit] => {%2, %1, %14:1}
    %8 = chainid                   # {%2, %1}
    %26 = 32                       # {%2, %1, %8}
    %9 = offset label %code_end, 32 # {%2, %1, %8, %26}
    %27 = %9                       # {%2, %1, %8, %26, %9}
    %28 = 0                        # {%2, %1, %8, %26, %27}
    dloadbytes %28, %27, %26       # {%2, %1, %8, %26, %27, %28}
    %29 = 0                        # {%2, %1, %8}
    %10 = mload %29                # {%2, %1, %8, %29}
    %30 = %10                      # {%2, %1, %8, %10}
    %31 = %8                       # {%2, %1, %30, %8}
    %11 = xor %31, %30             # {%2, %1, %30, %31}
    %32 = %11                      # {%2, %1, %11}
    %12 = iszero %32               # {%2, %1, %32}
    %14:1 = %12                    # {%2, %1, %12}
    jmp label %8_if_exit           # {%2, %1, %14:1}

7_else:  IN=[internal 1 _domain_separator_v4()_runtime] OUT=[8_if_exit] => {%2, %1, %14:2}
    %14:2 = 0                      # {%2, %1}
    jmp label %8_if_exit           # {%2, %1, %14:2}

8_if_exit:  IN=[6_then, 7_else] OUT=[9_then, 11_if_exit] => {%2, %1}
    %14 = phi %14:2, label %7_else, %14:1, label %6_then # {%2, %1, %14:1, %14:2}
    %33 = %14                      # {%2, %1, %14}
    jnz label %11_if_exit, label %9_then, %33 # {%2, %1, %33}

9_then:  IN=[8_if_exit] OUT=[internal 1 _domain_separator_v4()_cleanup] => {%2}
    %34 = 32                       # {%2, %1}
    %15 = offset label %code_end, 0 # {%2, %34, %1}
    %35 = %15                      # {%2, %34, %1, %15}
    %36 = %1                       # {%2, %34, %35, %1}
    dloadbytes %36, %35, %34       # {%2, %34, %35, %36}
    jmp label %internal 1 _domain_separator_v4()_cleanup # {%2}

11_if_exit:  IN=[8_if_exit] OUT=[internal 1 _domain_separator_v4()_cleanup] => {%2}
    %37 = 256                      # {%2, %1}
    invoke %37, label %internal 0 _build_domain_separator()_runtime # {%2, %1, %37}
    %38 = 256                      # {%2, %1}
    %18 = mload %38                # {%2, %1, %38}
    %39 = %18                      # {%2, %1, %18}
    %40 = %1                       # {%2, %39, %1}
    mstore %40, %39                # {%2, %39, %40}
    jmp label %internal 1 _domain_separator_v4()_cleanup # {%2}

internal 1 _domain_separator_v4()_cleanup:  IN=[9_then, 11_if_exit] OUT=[] => {}
    %41 = %2                       # {%2}
    ret %41                        # {%41}

and the corresponding assembly:

_sym_internal 0 _build_domain_separator()_runtime JUMPDEST SWAP1 PUSH32
0x8b73c3c69bb8fe3d512ecc4cf759cc79239f7b179b0ffacaa9a75d522b39400f
PUSH1 0x60 MSTORE PUSH1 0x20 _OFST _sym_code_end 192 PUSH1 0x80
CODECOPY PUSH1 0x20 _OFST _sym_code_end 288 PUSH1 0xa0 CODECOPY
CHAINID PUSH1 0xc0 MSTORE ADDRESS PUSH1 0xe0 MSTORE PUSH1 0xa0
PUSH1 0x40 MSTORE PUSH1 0x40 MLOAD PUSH1 0x60 SHA3 SWAP1 MSTORE
JUMP _sym_internal 1 _domain_separator_v4()_runtime JUMPDEST SWAP1
ADDRESS PUSH1 0x20 _OFST _sym_code_end 64 PUSH0 CODECOPY PUSH0 MLOAD
XOR _sym_7_else JUMPI CHAINID PUSH1 0x20 _OFST _sym_code_end 32
PUSH0 CODECOPY PUSH0 MLOAD XOR ISZERO _sym_8_if_exit JUMPDEST ISZERO
_sym_11_if_exit JUMPI PUSH1 0x20 SWAP1 _OFST _sym_code_end 0 SWAP1
CODECOPY _sym_internal 1 _domain_separator_v4()_cleanup JUMPDEST JUMP
_sym_11_if_exit JUMPDEST PUSH2 0x0100 _sym_label_ret_3 _sym_internal
0 _build_domain_separator()_runtime JUMP _sym_label_ret_3
JUMPDEST PUSH2 0x0100 MLOAD SWAP1 MSTORE _sym_internal 1
_domain_separator_v4()_cleanup JUMP _sym_7_else JUMPDEST PUSH0
_sym_8_if_exit JUMP

@charles-cooper
Copy link
Member

the tradeoff here looks like two JUMPIs terminating the basic blocks vs one basic block with no ending JUMP but one with two JUMPs:

Screenshot from 2024-10-29 16-39-07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants
@charles-cooper @Philogy and others