Optimize shadow stack instruction sequences #928

Robbepop · 2024-02-06T18:53:46Z

Closes #920.

TODOs

Fuse i32.add_imm with global.set 0
Fuse global.get 0 with i32.add_imm
Fuse fused global.get 0 + i32.add_imm with i32.local_tee and global.set 0
Support fusion of i32.add with a function local constant rhs register.
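To illustrate the first two TODO items, here is a minimal sketch of peephole fusion on a toy register-machine bytecode. The `Instr` enum, its field names, and `push_fused` are hypothetical and are not Wasmi's actual bytecode or translator API. The sketch also assumes that the intermediate register is a temporary that nothing reads afterwards.

```rust
/// Hypothetical, simplified register-machine instructions.
/// These are illustration-only and do not mirror Wasmi's real bytecode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    /// `result = global[index]`
    GlobalGet { result: u16, index: u32 },
    /// `global[index] = input`
    GlobalSet { index: u32, input: u16 },
    /// `result = lhs + imm`
    I32AddImm { result: u16, lhs: u16, imm: i16 },
    /// Fused form: `result = global[0] + imm`
    I32AddImmFromGlobal0 { result: u16, imm: i16 },
    /// Fused form: `global[0] = lhs + imm`
    I32AddImmIntoGlobal0 { lhs: u16, imm: i16 },
}

/// Appends `next` to `code`, fusing it with the previously emitted
/// instruction when the pair matches one of the two patterns.
///
/// Assumption: the intermediate register (`tmp`) is dead after the pair,
/// so dropping the write to it does not change program behavior.
fn push_fused(code: &mut Vec<Instr>, next: Instr) {
    use Instr::*;
    let fused = match (code.last().copied(), next) {
        // global.get 0 followed by i32.add_imm reading its result
        (Some(GlobalGet { result: tmp, index: 0 }), I32AddImm { result, lhs, imm })
            if lhs == tmp =>
        {
            Some(I32AddImmFromGlobal0 { result, imm })
        }
        // i32.add_imm followed by global.set 0 consuming its result
        (Some(I32AddImm { result: tmp, lhs, imm }), GlobalSet { index: 0, input })
            if input == tmp =>
        {
            Some(I32AddImmIntoGlobal0 { lhs, imm })
        }
        _ => None,
    };
    match fused {
        Some(instr) => {
            // Replace the previous instruction with the fused one.
            code.pop();
            code.push(instr);
        }
        None => code.push(next),
    }
}
```

Only global index 0 is matched, on the assumption that it holds the shadow stack pointer. A single instruction of look-back cannot fuse all three of `global.get 0`, `i32.add_imm`, and `global.set 0` in one step, so the third TODO item would need an additional fused form.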

codecov-commenter · 2024-02-06T20:04:43Z

Codecov Report

Attention: Patch coverage is 77.72727% with 49 lines in your changes missing coverage. Please review.

Project coverage is 80.49%. Comparing base (978a58f) to head (27264e3).
Report is 102 commits behind head on main.

Files with missing lines	Patch %	Lines
...rates/wasmi/src/engine/translator/instr_encoder.rs	72.41%	16 Missing ⚠️
crates/wasmi/src/engine/executor/instrs/global.rs	27.77%	13 Missing ⚠️
crates/wasmi/src/engine/translator/visit.rs	55.00%	9 Missing ⚠️
crates/wasmi/src/engine/executor/instrs.rs	33.33%	4 Missing ⚠️
...ates/wasmi/src/engine/translator/visit_register.rs	0.00%	3 Missing ⚠️
crates/wasmi/src/engine/translator/mod.rs	90.00%	2 Missing ⚠️
...rates/wasmi/src/engine/translator/relink_result.rs	33.33%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #928      +/-   ##
==========================================
+ Coverage   80.48%   80.49%   +0.01%     
==========================================
  Files         270      270              
  Lines       25079    25273     +194     
==========================================
+ Hits        20184    20343     +159     
- Misses       4895     4930      +35

Commit 672defb (remove unnecessary calls to reset_last_instr): these calls were previously necessary because the fusion algorithms lacked some important checks to filter out invalid instruction fusion with non-instruction value producers such as i32.const.

This fixes a bug for the niche value rhs=i16::MIN: -i16::MIN does not fit into an i16, so there is no symmetric negated value, and the optimization failed for rhs=i16::MIN. By always compiling isub as iadd with a negated rhs value we avoid this situation gracefully.
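The arithmetic behind this can be sketched as follows. The helper names are hypothetical and the 16-bit immediate encoding is assumed from the description above; this is not the code from the PR.

```rust
/// Naive approach: narrow the constant to `i16` first, then negate it.
/// `checked_neg` returns `None` for `i16::MIN` because +32768 exceeds `i16::MAX`.
fn negate_narrow_first(rhs: i16) -> Option<i16> {
    rhs.checked_neg()
}

/// Sketch of the alternative: treat `x - rhs` as `x + (-rhs)` on the full
/// `i32` value, then check whether the negated constant fits into an `i16`
/// immediate. Returns `None` when a non-immediate encoding would be needed.
fn sub_rhs_as_add_imm16(rhs: i32) -> Option<i16> {
    i16::try_from(rhs.wrapping_neg()).ok()
}

/// Evaluates `x - rhs` via addition of the negated constant.
/// Wrapping arithmetic makes this equal to `x.wrapping_sub(rhs)` for all
/// inputs, including `rhs == i32::MIN`.
fn eval_sub_via_add(x: i32, rhs: i32) -> i32 {
    x.wrapping_add(rhs.wrapping_neg())
}
```

With this ordering, `rhs = i16::MIN` (as an `i32`) negates to 32768, which simply does not qualify for the 16-bit immediate form, while `rhs = 32768` negates to `i16::MIN` and does.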

Robbepop · 2024-03-14T14:15:00Z

A comment on why this has not been merged yet despite passing all CI tests and working: it introduces a lot of complexity into the translation pipeline compared to the gains we can see and verify. The runtime gains are mostly single-digit percentages and restricted to benchmarks that actually make use of the shadow stack. The global_bump benchmark is heavily affected, with a roughly 30% performance improvement, but it is also very artificial. Memory consumption improves as well, but again only by single-digit percentages for practical Wasm binaries.

So all in all, the question is whether the gains are worth the added complexity.

Robbepop · 2024-03-28T16:55:30Z

The miri CI job fails because of this issue: rust-lang/miri#3404

Robbepop · 2024-06-25T12:50:13Z

We might want to block this until we have a multiple look-back translation feature in the Wasmi bytecode translator. That would allow us to get rid of intermediate optimized instructions.

Robbepop · 2024-10-04T20:33:32Z

I am closing this now since so much has changed in the code base that a full rewrite would make more sense. However, I am very uncertain whether this optimization is a clear improvement to Wasmi as a whole.

Robbepop added 9 commits February 4, 2024 13:52

add new stack pointer optimized instructions

534d556

apply clippy suggestion

05a7f26

implement execution of new instructions

874dd5f

implement i32.add_imm + global.set fusion

d697023

only fuse for global.set with index 0

c0159af

improve comment

d9321d1

add TODO comment

2f84ac1

apply rustfmt

65c4882

fix intra doc link

f8c0f57

add global.get 0 + i32.add_imm fusion

157ab4d

Robbepop mentioned this pull request Feb 8, 2024

Restore CI benchmarks #933

Open

Robbepop added 17 commits February 9, 2024 11:28

extract global.set with immutable input translation

93bdbda

improve i32_add_imm_into_global_0 constructor

72c0028

add new shadow stack opts tests

0bd8253

make fuse_i32_eqz more robust

d8b126b

make cmp+branch fusion more robust

bf4bf77

remove unnecessary calls to reset_last_instr

672defb

These calls were previously necessary because the fusion algorithms lacked some important checks to filter out invalid instruction fusion with non-instruction value producers such as i32.const.

improve i32_add_imm_from_global_0 constructor

70bdd0d

adjust tests

b889902

adjust tests

38a7057

add bytecode constructor for I32AddImmInoutGlobal0

f40b1a0

add op-code fusion for I32AddImmInoutGlobal0

cc96940

Merge branch 'master' into rf-shadow-stack-opt

85b80a2

Merge branch 'master' into rf-shadow-stack-opt

d1307d1

add test for I32AddImmIntoGlobal0 with I32Sub fusion

9c1d697

fix bug with shadow stack opt for i32::MIN values of i32.sub

72993e6

improve shadow stack opt tests

42f9936

make shadow stack global.set opts work on large integers

c98eeee

Robbepop added 9 commits February 12, 2024 13:58

rename test

4f499bb

adjust tests for shadow stack opts

d8a1fe8

remove accidentally duplicated code

7f0690d

apply clippy suggestions

f8cf6c9

apply clippy suggestions (tests)

0a3b3a3

Merge branch 'master' into rf-shadow-stack-opt

d21ce29

Merge branch 'master' into rf-shadow-stack-opt

289de51

Merge branch 'master' into rf-shadow-stack-opt

f848742

Robbepop added 7 commits March 20, 2024 09:38

Merge branch 'master' into rf-shadow-stack-opt

25c2847

fix new tests after merge

ab8126a

Merge branch 'master' into rf-shadow-stack-opt

590e72f

Merge branch 'master' into rf-shadow-stack-opt

6cb3be7

Merge branch 'master' into rf-shadow-stack-opt

b0228fb

Merge branch 'master' into rf-shadow-stack-opt

2707fcf

Merge branch 'master' into rf-shadow-stack-opt

d61ba01

Robbepop added 5 commits March 28, 2024 17:55

Merge branch 'master' into rf-shadow-stack-opt

8db0b3c

Merge branch 'master' into rf-shadow-stack-opt

15cd321

Merge branch 'master' into rf-shadow-stack-opt

c2b9c0b

fix post-merge compile errors

993728a

Merge branch 'master' into rf-shadow-stack-opt

9b80c64

Robbepop added 3 commits July 3, 2024 12:35

Merge branch 'main' into rf-shadow-stack-opt

5034632

fix internal doc link

d021e82

Merge branch 'main' into rf-shadow-stack-opt

27264e3

Robbepop closed this Oct 4, 2024
