Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Memory Regions to IR #231

Merged
merged 73 commits into from
Oct 30, 2024
Merged

Adding Memory Regions to IR #231

merged 73 commits into from
Oct 30, 2024

Conversation

yousifpatti
Copy link
Contributor

@yousifpatti yousifpatti commented Aug 13, 2024

Before merging there are two main issues

  • Some initial regions of a function are ignored (they are not collected by MRA) and thus are not renamed. This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.

  • Some Data section regions that are in initial memory and read only memory are not found. (Those are not loaded by the ReadELFLoader and cause this issue as MMM relies on results from the loader. Notable mentions for "arrays" example:
    [WARN] No region found for memory section 0 [[email protected]:297]
    [WARN] No region found for memory section 0 [[email protected]:297]
    [WARN] No region found for memory section 0 [[email protected]:297]
    [WARN] No region found for memory section 0 [[email protected]:297]
    [WARN] No region found for memory section 568 [[email protected]:297]
    [WARN] No region found for memory section 596 [[email protected]:297]
    [WARN] No region found for memory section 632 [[email protected]:297]
    [WARN] No region found for memory section 664 [[email protected]:297]
    [WARN] No region found for memory section 696 [[email protected]:297]
    [WARN] No region found for memory section 960 [[email protected]:297]
    [WARN] No region found for memory section 1148 [[email protected]:297]
    [WARN] No region found for memory section 1176 [[email protected]:297]
    [WARN] No region found for memory section 1240 [[email protected]:297]
    [WARN] No region found for memory section 1528 [[email protected]:297]

Anything bigger than these offsets is resolved. These offsets dont seem to be loaded with externalFunctions or globalOffsets

@yousifpatti yousifpatti self-assigned this Aug 13, 2024
@l-kent
Copy link
Contributor

l-kent commented Aug 19, 2024

Memory sections in the initial memory/read only memory that are not actually accessed in the program don't need to be given their own region, it's fine if they just use the default mem region for now. Eventually we want to go through the initial memory definitions per-procedure so that any memory definitions that relate to regions that aren't in the procedure are not included.

@l-kent
Copy link
Contributor

l-kent commented Aug 19, 2024

This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.

Which other branch are you talking about?

@yousifpatti
Copy link
Contributor Author

yousifpatti commented Aug 19, 2024

This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.

Which other branch are you talking about?

I am talking about yousif-vsa branch where I made changes to MRA. This issue has been resolved though in my latest commit here so no need to worry about that anymore.

@yousifpatti
Copy link
Contributor Author

@l-kent In that case can we look into merging the changes please?

@l-kent
Copy link
Contributor

l-kent commented Aug 19, 2024

Yes, I'm going to review this next.

@yousifpatti
Copy link
Contributor Author

yousifpatti commented Aug 20, 2024

The last commit should fix the issue with over approximating to "abort"
The issue was that abort was the last region loaded and thus any offsets after were approximated to "abort".
The other global offsets were missing "globalAddresses" variable not passed to MMM, so I have just added those offsets and it is working properly.

for (elem <- memorySegment) {
elem match {
case mem: MemorySection =>
val regions = mmm.findDataObject(mem.address)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work at present. MemorySection.address is just the address of the start of the memory section - for example .plt .got and .data are all MemorySections, and their contents are just listed as bytes, which is very crude. We want to extract the bytes relevant to any regions that the memory region analysis has determined are accessed, which requires much more work than just attempting to match the addresses like this.

Currently, all this method does is remove some data sections entirely, even if they contain addresses that we care about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relationship between those small 8 byte addresses and the memory section? Are they loaded from the relf file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MemorySections are loaded from the .adt or .gts file and each represent a single section of the binary - such as the .data section. The address of a MemorySection is the address of the start of the section, and each byte in the array of bytes is located at a subsequent address. So MemorySection.bytes[0] is the 8-bit value stored at address MemorySection.address, MemorySection.bytes[1] is the 8-bit value stored at address MemorySection.address + 1, and so on.

I will handle this part though, since this relates to fixing the Boogie output.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

Currently, the initial memory pre/post-conditions are not generated with the identified regions. An example of this is the test correct/basic_function_call_reader/clang_pic. The following snippets are from the output currently produced for this test:

procedure main();
  ...
  free requires (memory_load64_le(mem, 69568bv64) == 69688bv64);
R8, Gamma_R8 := 65536bv64, true;
call rely();
R8, Gamma_R8 := memory_load64_le(x, bvadd64(R8, 4032bv64)), (gamma_load64(Gamma_x, bvadd64(R8, 4032bv64)) || L(x, bvadd64(R8, 4032bv64)));

In the second snippet, the address 69568 (65536 + 4032) is loaded from the region x. However, in the precondition created from initialised memory in the first snippet, the default region mem is used for the address 69568. We want the regions to be aligned, so that the same region is used for both.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

The rely() procedures created from the user's specification do not include the new regions. An example of this is the test correct/basicassign_gamma0/clang. The following snippets are all from the output produced for it:

axiom ($secret_addr == 69684bv64);
procedure {:extern} rely();
  modifies Gamma_mem, mem;
  ensures (forall i: bv64 :: (((mem[i] == old(mem[i])) ==> (Gamma_mem[i] == old(Gamma_mem[i])))));
  ensures (memory_load32_le(mem, $secret_addr) == old(memory_load32_le(mem, $secret_addr)));
R8, Gamma_R8 := 69632bv64, true;
call rely();
R8, Gamma_R8 := zero_extend32_32(memory_load32_le(secret, bvadd64(R8, 52bv64))), (gamma_load32(Gamma_secret, bvadd64(R8, 52bv64)) || L(secret, bvadd64(R8, 52bv64)));

In the third snippet, the address 69684 (69632 + 52) is loaded from the region secret, but in the rely() procedure, the address 69684 is loaded from the default region mem. We want this to be replaced in the rely procedure so it matches the identified region.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

The test correct/initialisation/gcc demonstrates three issues - not using identified regions in the generated require/ensures clauses from the user specifications, incorrectly removing relevant sections of initialised memory, and not using the identified regions in initialised memory requires/ensures clauses. I've already discussed the latter so I'll detail the first two for this test.

On main, the following snippet is produced for this test:

procedure main();
  modifies ...
  free requires (memory_load64_le(mem, 69632bv64) == 0bv64);
  free requires (memory_load64_le(mem, 69640bv64) == 69640bv64);
  free requires (memory_load64_le(mem, 69648bv64) == 416611827717bv64);
  free requires (memory_load64_le(mem, 69656bv64) == 68719476735bv64);
  free requires (memory_load64_le(mem, 69664bv64) == 8589934593bv64);

These requires clauses are all removed from the output when using this branch's memory region analysis. However, the addresses 69648, 69656, 69664, 69668 and 69652 are all accessed, so we want to keep the initialisations for 69648, 69656 and 69664 around. We want the regions in the requires clause to match as well, so that would require splitting up the initialisations further - the region y would need the address 69652 initialised with a 32-bit access, and the region x would need the address 69648 initialised with a 32-bit access, instead of the 64-bit initialisation covering both as-is.

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 16bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(x, R0)), (gamma_load32(Gamma_x, R0) || L(x, R0));
R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 24bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := memory_load64_le(z, R0), (gamma_load64(Gamma_z, R0) || L(z, R0));
R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 32bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(a, R0)), (gamma_load32(Gamma_a, R0) || L(a, R0));
R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 32bv64), Gamma_R0;
call rely();
assert (L(a, bvadd64(R0, 4bv64)) ==> Gamma_R1);
a, Gamma_a := memory_store32_le(a, bvadd64(R0, 4bv64), R1[32:0]), gamma_store32(Gamma_a, bvadd64(R0, 4bv64), Gamma_R1);
R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 20bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend56_8(memory_load8_le(y, R0)), (gamma_load8(Gamma_y, R0) || L(y, R0));

The output also includes the following ensures clauses in the procedure main():

ensures (memory_load32_le(mem, $x_addr) == 6bv32);
ensures (memory_load32_le(mem, bvadd64($a_addr, 4bv64)) == 4bv32);
ensures (memory_load32_le(mem, bvadd64($a_addr, 0bv64)) == 1bv32);

The issue is that mem is not replaced with identified regions for these, but we want it to.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

Currently, the memory regions for pointers and pointers-to-pointers are unnecessarily combined. This doesn't cause any issues at present but it is unnecessary.

An example of this is the output for the test case correct/basic_arrays_read/clang_pic:

R8, Gamma_R8 := 65536bv64, true;
call rely();
R8, Gamma_R8 := memory_load64_le(arr, bvadd64(R8, 4056bv64)), (gamma_load64(Gamma_arr, bvadd64(R8, 4056bv64)) || L(arr, bvadd64(R8, 4056bv64)));
call rely();
assert (L(arr, R8) ==> true);
arr, Gamma_arr := memory_store32_le(arr, R8, 0bv32), gamma_store32(Gamma_arr, R8, true);

The important context here is that the global variable arr is located at address 69638, and this address is stored in memory at address 69592. This means that the first memory access loads 69638 (the address of global variable arr) from address 69592 (65536 + 4056) to R8, and the second memory access writes 0 to the global variable arr.

Both accesses are given the region arr even though they are completely distinct memory locations, and it would be ideal for the analysis to be able to give these separate regions.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

This PR causes some of the IndirectCallsTests to fail - for some of those tests, the indirect calls get incorrectly resolved to the global variable x or _ITM_registerTMCloneTable.

Indirect call resolution also fails for the correct/jumptable/clang (including variants) tests and for the correct/functionpointer/gcc_O2 and /clang_O2 tests.

The correct/jumptable/clang test is a particularly interesting example, and stack region identification is currently incorrect for it too. These are all the stack accesses produced in main() (in order):

#5, Gamma_#5 := bvadd64(R31, 64bv64), Gamma_R31;
// R31 + 64 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, #5, R29), gamma_store64(Gamma_stack_25, #5, Gamma_R29); 

// R31 + 72 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, bvadd64(#5, 8bv64), R30), gamma_store64(Gamma_stack_25, bvadd64(#5, 8bv64), Gamma_R30);

R29, Gamma_R29 := bvadd64(R31, 64bv64), Gamma_R31;

// R31 + 12 (32-bit access)
stack_15, Gamma_stack_15 := memory_store32_le(stack_15, bvadd64(R31, 12bv64), R8[32:0]), gamma_store32(Gamma_stack_15, bvadd64(R31, 12bv64), Gamma_R8);

// R31 + 60 (32-bit access)
stack_28, Gamma_stack_28 := memory_store32_le(stack_28, bvadd64(R29, 18446744073709551612bv64), 0bv32), gamma_store32(Gamma_stack_28, bvadd64(R29, 18446744073709551612bv64), true); 

 // R31 + 56 (32-bit access)
stack_28, Gamma_stack_28 := memory_store32_le(stack_28, bvadd64(R29, 18446744073709551608bv64), R0[32:0]), gamma_store32(Gamma_stack_28, bvadd64(R29, 18446744073709551608bv64), Gamma_R0);

// R31 + 48 (64-bit access)
stack_21, Gamma_stack_21 := memory_store64_le(stack_21, bvadd64(R29, 18446744073709551600bv64), R1), gamma_store64(Gamma_stack_21, bvadd64(R29, 18446744073709551600bv64), Gamma_R1); 

// R31 + 16 (128-bit access)
stack_21, Gamma_stack_21 := memory_store128_le(stack_21, bvadd64(R31, 16bv64), V0), gamma_store128(Gamma_stack_21, bvadd64(R31, 16bv64), Gamma_V0); 

// R31 + 32 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, bvadd64(R31, 32bv64), R8), gamma_store64(Gamma_stack_25, bvadd64(R31, 32bv64), Gamma_R8); 

// R31 + 16 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_21, bvadd64(R31, 16bv64)), gamma_load64(Gamma_stack_21, bvadd64(R31, 16bv64)); 

// R31 + 24 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_28, bvadd64(R31, 24bv64)), gamma_load64(Gamma_stack_28, bvadd64(R31, 24bv64)); 

// R31 + 32 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_25, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_25, bvadd64(R31, 32bv64)); 

// R31 + 12 (32-bit access)
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(stack_15, bvadd64(R31, 12bv64))), gamma_load32(Gamma_stack_15, bvadd64(R31, 12bv64)); 
#6, Gamma_#6 := bvadd64(R31, 64bv64), Gamma_R31;
// R31 + 64 (64-bit access)
R29, Gamma_R29 := memory_load64_le(stack_25, #6), gamma_load64(Gamma_stack_25, #6); 
// R31 + 72 (64-bit access)
R30, Gamma_R30 := memory_load64_le(stack_25, bvadd64(#6, 8bv64)), gamma_load64(Gamma_stack_25, bvadd64(#6, 8bv64)); 

Each stack region is associated with the following accesses:

stack_15: R31 + 12 (32-bit access)
stack_21: R31 + 48 (64-bit access), R31 + 16 (128-bit access), R31 + 16 (64-bit access)
stack_25: R31 + 64 (64-bit access), R31 + 72 (64-bit access), R31 + 32 (64-bit access)
stack_28: R31 + 60 (32-bit access), R31 + 56 (32-bit access), R31 + 24 (64-bit access)

It is incorrect that R31 + 16 (128-bit access) and R31 + 24 (64-bit access) are in different regions, despite being overlapping accesses - the analysis needs to consider the size of accesses as well as their address.

While the way the other regions are combined does not cause problems, it is imprecise and it would be ideal to fix it so that distinct accesses have distinct regions.

@l-kent
Copy link
Contributor

l-kent commented Aug 20, 2024

Another issue is the regions in assertions, specifically the assertions that check if writes to memory are secure.

These are updates to the global variables z and x, respectively from the test correct/secret_write/clang, with annotations:

assert (L(z, bvadd64(R9, 52bv64)) ==> true); // checking that the memory store doesn't violate security properties
z_old := memory_load32_le(z, $z_addr); // getting current value of global variable z for comparison after the store
Gamma_x_old := (gamma_load32(Gamma_z, $x_addr) || L(z, $x_addr)); // getting current value of security level of global variable x, but using the wrong region
z, Gamma_z := memory_store32_le(z, bvadd64(R9, 52bv64), 0bv32), gamma_store32(Gamma_z, bvadd64(R9, 52bv64), true); // writing to global variable z
assert ((bvadd64(R9, 52bv64) == $z_addr) ==> (L(z, $x_addr) ==> Gamma_x_old)); // checking that if this memory store was to global variable z then it hasn't violated the security policy for global variable x, which depends on global variable z, but using the wrong region as input to the L function
assert bvsge32(memory_load32_le(mem, $z_addr), z_old); // checking the guarantee, that the global variable z is only increasing, but using the wrong region
assert (L(x, bvadd64(R8, 60bv64)) ==> Gamma_R10);
z_old := memory_load32_le(x, $z_addr); // storing current value of global variable z for comparison after the store, but using incorrect region 
Gamma_x_old := (gamma_load32(Gamma_x, $x_addr) || L(x, $x_addr));  // getting current value of security level of global variable x
x, Gamma_x := memory_store32_le(x, bvadd64(R8, 60bv64), R10[32:0]), gamma_store32(Gamma_x, bvadd64(R8, 60bv64), Gamma_R10); // writing to global variable x
assert ((bvadd64(R8, 60bv64) == $z_addr) ==> (L(x, $x_addr) ==> Gamma_x_old)); // checking that if this memory store was to global variable z then it hasn't violated the security policy for global variable x, which depends on global variable z. This assertion is now redundant since we have determined that this memory store was to the region x, which does not contain the global variable z which this assertion is relevant for. 
assert bvsge32(memory_load32_le(mem, $z_addr), z_old); // checking the guarantee, that the global variable z is only increasing, but using the wrong region. This assertion is now completely redundant since we have determined that this is a write to the region x which does not contain the global variable z which this assertion is relevant for.

There are number of issues we need to address here. Most straight-forwardly - there is still a reference to mem in the last assertion generated, which checks the user-specified guarantee. We need to make sure all accesses to global variables use the correct region for each global variable.

The more difficult issue is that we need to redesign this secure update check to take into account that not all global variables share the same region now, which means many checks can be removed.

We want to change the output so that it now looks something like this:

assert (L(z, bvadd64(R9, 52bv64)) ==> true);
z_old := memory_load32_le(z, $z_addr);
Gamma_x_old := (gamma_load32(Gamma_x, $x_addr) || L(x, $x_addr));
z, Gamma_z := memory_store32_le(z, bvadd64(R9, 52bv64), 0bv32), gamma_store32(Gamma_z, bvadd64(R9, 52bv64), true);
assert ((bvadd64(R9, 52bv64) == $z_addr) ==> (L(x, $x_addr) ==> Gamma_x_old));
assert bvsge32(memory_load32_le(z, $z_addr), z_old);
assert (L(x, bvadd64(R8, 60bv64)) ==> Gamma_R10);
x, Gamma_x := memory_store32_le(x, bvadd64(R8, 60bv64), R10[32:0]), gamma_store32(Gamma_x, bvadd64(R8, 60bv64), Gamma_R10); // writing to global variable x

The differences are:

  • the correct regions are used in relation to global variables
  • assertions for guarantees are only generated if a global variable referred to in a guarantee is contained in a region accessed by a memory store
  • secure update assertions are only generated for a control variable (in this case, z is the control variable) if a region containing that control variable is accessed by a memory store

It may also make sense to split the L function so that each region has its own L function? I'm not sure if this matters at all either way though.

@l-kent
Copy link
Contributor

l-kent commented Aug 21, 2024

I think that should be pretty much everything. I realise this is a lot. I think the most important issues to focus on at first are the ones in this comment: #231 (comment) since those are issues with the analysis itself, rather than just more work that needs to be done with the analysis results.

Comment on lines 420 to 430
private def returnRegion(region: StackRegion): StackRegion = {
uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[StackRegion]
}

private def returnRegion(region: DataRegion): DataRegion = {
uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[DataRegion]
}

private def returnRegion(region: HeapRegion): HeapRegion = {
uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[HeapRegion]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all the region subtypes are treated completely separately as this implies, shouldn't each region subtype have its own UnionFind to enforce this?

@l-kent
Copy link
Contributor

l-kent commented Aug 21, 2024

Another issue: the stack identification doesn't seem to work correctly when heap pointers are involved.

This is from test/correct/malloc_with_local/gcc/malloc_with_local.bir, with annotations:

0000036d: call @malloc with return %0000036f // call malloc, putting fresh heap pointer in R0

0000036f:
00000375: mem := mem with [R31 + 0x20, el]:u64 <- R0 // store fresh heap pointer on stack at R31 + 0x20
0000037a: R0 := 4
0000037f: R30 := 0x830
00000381: call @malloc with return %00000383 // call malloc, putting a second fresh heap pointer in R0

00000383:
00000389: mem := mem with [R31 + 0x28, el]:u64 <- R0 // store second heap pointer on stack at R31 + 0x28
0000038e: R0 := 0xA
00000396: mem := mem with [R31 + 0x1C, el]:u32 <- 31:0[R0] 
0000039d: R0 := mem[R31 + 0x20, el]:u64 // load first heap pointer from stack to R0
000003a2: R1 := 0x41
000003aa: mem := mem with [R0] <- 7:0[R1] // write 0x41 to first heap location, which R0 points to
000003b1: R0 := mem[R31 + 0x28, el]:u64 // load second heap pointer from stack to R0
000003b6: R1 := 0x2A
000003be: mem := mem with [R0, el]:u32 <- 31:0[R1] // write 0x2A to second heap location, which R0 points to
000003c5: R0 := mem[R31 + 0x20, el]:u64 // load first heap pointer from stack to R0
000003cc: R0 := pad:64[mem[R0]] // load value contained in first heap location to R0

The corresponding Boogie produced is the following (with annotations and cleaned up a little):

  // call malloc, putting fresh heap pointer in R0
  call malloc();  
  goto l0000036f;
l0000036f:
  // store fresh heap pointer on stack at R31 + 32 (== 0x20)
  stack_15, Gamma_stack_15 := memory_store64_le(stack_15, bvadd64(R31, 32bv64), R0), gamma_store64(Gamma_stack_15, bvadd64(R31, 32bv64), Gamma_R0);
  R0, Gamma_R0 := 4bv64, true;
  R30, Gamma_R30 := 2096bv64, true;
  // call malloc, putting a second fresh heap pointer in R0
  call malloc();
  goto l00000383;
l00000383:
  // store second heap pointer on stack at R31 + 40 (== 0x28) 
  stack_18, Gamma_stack_18 := memory_store64_le(stack_18, bvadd64(R31, 40bv64), R0), gamma_store64(Gamma_stack_18, bvadd64(R31, 40bv64), Gamma_R0); 
  R0, Gamma_R0 := 10bv64, true;
  stack_20, Gamma_stack_20 := memory_store32_le(stack_20, bvadd64(R31, 28bv64), R0[32:0]), gamma_store32(Gamma_stack_20, bvadd64(R31, 28bv64), Gamma_R0);
  // load first heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_15, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_15, bvadd64(R31, 32bv64)); 
  R1, Gamma_R1 := 65bv64, true;
  // write 65 (== 0x41) to first heap location, which R0 points to - incorrectly uses region 'stack_15' instead of the first heap region
  stack_15, Gamma_stack_15 := memory_store8_le(stack_15, R0, R1[8:0]), gamma_store8(Gamma_stack_15, R0, Gamma_R1);
  // load second heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_18, bvadd64(R31, 40bv64)), gamma_load64(Gamma_stack_18, bvadd64(R31, 40bv64));
  R1, Gamma_R1 := 42bv64, true;
  // write 42 (== 0x2A) to second heap location, which R0 points to  - incorrectly uses region 'stack_18' instead of the second heap region
  stack_18, Gamma_stack_18 := memory_store32_le(stack_18, R0, R1[32:0]), gamma_store32(Gamma_stack_18, R0, Gamma_R1);
  // load first heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_15, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_15, bvadd64(R31, 32bv64));
  // load value contained in first heap location to R0 - incorrectly uses region 'stack_15' instead of the first heap region
  R0, Gamma_R0 := zero_extend56_8(memory_load8_le(stack_15, R0)), gamma_load8(Gamma_stack_15, R0);

The big issue is that accesses to the heap are incorrectly treated as accesses to the stack regions in which the heap pointers are stored. Every heap allocation should have its own region and should not be incorrectly identified as the stack.

@yousifpatti
Copy link
Contributor Author

I pushed up to the working things

# Conflicts:
#	src/main/scala/analysis/InterprocSteensgaardAnalysis.scala
#	src/main/scala/analysis/MemoryModelMap.scala
#	src/main/scala/ir/Expr.scala
#	src/main/scala/util/RunUtils.scala
@l-kent
Copy link
Contributor

l-kent commented Oct 29, 2024

For correct/basic_lock_read/gcc:BAP, there is an inconsistency between the GRA result and the MMM. The GRA correctly gives a region with start address 69652 (0x11014) and size 4 to accesses to that address, but this region does not appear in the MMM's dataMap. I haven't yet identified why.

This was an issue before your most recent changes but I had previously missed it.

@l-kent
Copy link
Contributor

l-kent commented Oct 29, 2024

For correct/initialisation/clang_pic:BAP, there are some region size issues in relation to array accesses as previously discussed, but there is another error in the analysis as well.

This is the relevant parts of the .bir file:

00000305: R9 := 0x10000
0000030c: R9 := mem[R9 + 0xFC8, el]:u64

0000036b: R8 := pad:64[mem[R9, el]:u32]

00000379: mem := mem with [R9 + 4, el]:u32 <- 31:0[R8]

The access at 0000036b is incorrectly given a region with a size of 8 instead of 4 (it only has a 32-bit access) - this is the same region size issue with array accesses.

The access at 00000379 is incorrectly given a region with size 8 and start address 0x10FCC (69580). It seems to have gotten this by doing 0x10000 + 0xFC8 + 4, but the fact that 0x10FC8 (69576) is loaded from memory and the result has 4 added to it has been missed. The correct address is 0x11044 (69700) as mem[0x10FC8] == 0x11040 (69696).

This was probably an issue before your most recent changes, just overshadowed by other issues you've now fixed.

@l-kent
Copy link
Contributor

l-kent commented Oct 29, 2024

For correct/secret_write/clang:BAP, there is a new error introduced by the most recent changes. The GRA correctly gives accesses to address 0x11034 (69684, which is the address of z) a region, but this region is not included in the MMM's dataMap.

@l-kent
Copy link
Contributor

l-kent commented Oct 29, 2024

For correct/functionpointer/clang:GTIRB, there is an issue with the stack regions. This is an excerpt of the output:

    R29, Gamma_R29 := bvadd64(R31, 32bv64), Gamma_R31;
    stack_9, Gamma_stack_9 := memory_store32_le(stack_9, bvadd64(R29, 18446744073709551612bv64), 0bv32), gamma_store32(Gamma_stack_9, bvadd64(R29, 18446744073709551612bv64), true);
    assume {:captureState "1872$0"} true;
    stack_20, Gamma_stack_20 := memory_store32_le(stack_20, bvadd64(R29, 18446744073709551608bv64), R0[32:0]), gamma_store32(Gamma_stack_20, bvadd64(R29, 18446744073709551608bv64), Gamma_R0);
    assume {:captureState "1876$0"} true;
    R8, Gamma_R8 := zero_extend32_32(memory_load32_le(stack_8, bvadd64(R29, 18446744073709551608bv64))), gamma_load32(Gamma_stack_8, bvadd64(R29, 18446744073709551608bv64));

The accesses to regions stack_20 and stack_8 both have the same address (R29 - 8 == R31 + 24) but have different regions. This is incorrect.

This was an issue before your most recent changes but I had missed it.

edit: This seems to only happen some of the time, so it may be some weird iteration-order issue with iterating over a Set or something?

@@ -537,15 +558,18 @@ object RunUtils {
var iteration = 1
var modified: Boolean = true
val analysisResult = mutable.ArrayBuffer[StaticAnalysisContext]()
while (modified) {
while (modified || analysisResult.size < 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a good way to make use of the VSA results, as we don't need to run every analysis twice. The way the analyses feed into each other could be significantly improved - having a circular dependency between them really doesn't seem ideal.

Comment on lines 46 to 90
private var DataMemory, HeapMemory, StackMemory = TreeMap[BigInt, Array[Byte]]()



// Store operation: store BigInt value at a BigInt address
def store(address: BigInt, value: BigInt, memoryType: MemoryType): Unit = {
val byteArray = value.toByteArray
memoryType match
case MemoryType.Data => DataMemory += (address -> byteArray)
case MemoryType.Heap => HeapMemory += (address -> byteArray)
case MemoryType.Stack => StackMemory += (address -> byteArray)
}

// Load operation: load from a BigInt address with a specific size
def load(address: BigInt, size: Int, memoryType: MemoryType): BigInt = {
val memory = memoryType match
case MemoryType.Data => DataMemory
case MemoryType.Heap => HeapMemory
case MemoryType.Stack => StackMemory
// Find the memory block that contains the starting address
val floorEntry = memory.rangeTo(address).lastOption

floorEntry match {
case Some((startAddress, byteArray)) =>
val offset = (address - startAddress).toInt // Offset within the byte array
// If the load exceeds the stored data, we need to handle padding with zeros
if (offset >= byteArray.length) {
BigInt(0)
} else {
// Calculate how much data we can retrieve
val availableSize = byteArray.length - offset
// Slice the available data, and if requested size exceeds, append zeros
val result = byteArray.slice(offset, offset + size)
val paddedResult = if (size > availableSize) {
result ++ Array.fill(size - availableSize)(0.toByte) // Padding with zeros
} else {
result
}
BigInt(1, paddedResult) // Convert the byte array back to BigInt
}
case None =>
// If no memory is stored at the requested address, return zero
BigInt(0) // TODO: may need to be sm else
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is any of this actually used for anything?

override def toString: String = s"Data($regionIdentifier, $start, $size, ($relfContent))"
def end: BigInt = start + size - 1
val relfContent: mutable.Set[String] = mutable.Set[String]()
val isPointerTo: Option[DataRegion] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

* 3. starts between regions (start, end) and (value + size) => end
* 4. starts between regions (start, end) and (value + size) < end (add both regions ie. next region)
*/
def findStackPartialAccessesOnly(value: BigInt, size: BigInt): Set[StackRegion] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for anything?

matchingRegions.toSet.map(returnRegion)
}

def getRegionsWithSize(size: BigInt, function: String, negateCondition: Boolean = false): Set[MemoryRegion] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for anything?

Comment on lines 225 to 230
// Assume heap region size is at least 1 TODO: must approximate size of heap
val negB = 1
val (name, start) = nextMallocCount(negB)
val newHeapRegion = HeapRegion(name, start, negB, IRWalk.procedure(n))
addReturnHeap(directCall, newHeapRegion)
s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to approximate the size of a heap allocation if you don't know it. The size of each heap allocation is not important except for detecting out-of-bounds accesses (which is not the most important). Each heap allocation needs to be its own separate address space, with the R0 returned from malloc being like the stack pointer for that heap.

})

Logger.debug("[!] Running MMM")
val mmm = MemoryModelMap()
mmm.convertMemoryRegions(mraResult, mergedSubroutines, globalOffsets, mraSolver.procedureToSharedRegions)
mmm.convertMemoryRegions(mraSolver.procedureToStackRegions, mraSolver.procedureToHeapRegions, mraSolver.mergeRegions, mraResult, mraSolver.procedureToSharedRegions, graSolver.getDataMap, graResult)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mraSolver.mergeRegions is always empty, isn't it?

Comment on lines +53 to 55
def canCoerceIntoDataRegion(bitVecLiteral: BitVecLiteral, size: Int): Option[DataRegion] = {
mmm.findDataObject(bitVecLiteral.value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have an unused size parameter?

tableAddress
}

def tryCoerceIntoData(exp: Expr, n: Command, subAccess: BigInt, loadOp: Boolean = false): Set[DataRegion] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how this method works, can you explain it? I don't really follow why BinaryExprs are such a special case for it?

@@ -60,11 +60,6 @@ class InterprocSteensgaardAnalysis(

def getMemoryRegionContents: Map[MemoryRegion, Set[BitVecLiteral | MemoryRegion]] = memoryRegionContents.map((k, v) => k -> v.toSet).toMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always just empty now isn't it?

Comment on lines 34 to 47
private def dataPoolMaster(offset: BigInt, size: BigInt): Option[DataRegion] = {
assert(size >= 0)
if (dataMap.contains(offset)) {
if (dataMap(offset).size < (size.toDouble / 8).ceil.toInt) {
dataMap(offset) = DataRegion(dataMap(offset).regionIdentifier, offset, (size.toDouble / 8).ceil.toInt)
Some(dataMap(offset))
} else {
Some(dataMap(offset))
}
} else {
dataMap(offset) = DataRegion(nextDataCount(), offset, (size.toDouble / 8).ceil.toInt)
Some(dataMap(offset))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to return an Option when it never returns None?

@l-kent
Copy link
Contributor

l-kent commented Oct 30, 2024

I'm going to merge this into main and file issues for remaining bugs now.

@l-kent
Copy link
Contributor

l-kent commented Oct 30, 2024

The --memory-regions flag can be used to enable the regions output

@l-kent l-kent merged commit 7849e72 into main Oct 30, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants