Adding Memory Regions to IR #231
Conversation
Memory sections in the initial memory/read-only memory that are not actually accessed in the program don't need to be given their own region; it's fine if they just use the default.
Which other branch are you talking about?
I am talking about the yousif-vsa branch where I made changes to MRA. This issue has been resolved in my latest commit here, though, so no need to worry about that anymore.
@l-kent In that case can we look into merging the changes please?
Yes, I'm going to review this next.
The last commit should fix the issue with over-approximating to "abort".
for (elem <- memorySegment) {
  elem match {
    case mem: MemorySection =>
      val regions = mmm.findDataObject(mem.address)
This doesn't work at present. MemorySection.address is just the address of the start of the memory section - for example, .plt, .got and .data are all MemorySections, and their contents are just listed as bytes, which is very crude. We want to extract the bytes relevant to any regions that the memory region analysis has determined are accessed, which requires much more work than just attempting to match the addresses like this.
Currently, all this method does is remove some data sections entirely, even if they contain addresses that we care about.
What is the relationship between those small 8-byte addresses and the memory section? Are they loaded from the relf file?
The MemorySections are loaded from the .adt or .gts file and each represents a single section of the binary - such as the .data section. The address of a MemorySection is the address of the start of the section, and each byte in the array of bytes is located at a subsequent address. So MemorySection.bytes[0] is the 8-bit value stored at address MemorySection.address, MemorySection.bytes[1] is the 8-bit value stored at address MemorySection.address + 1, and so on.
I will handle this part though, since this relates to fixing the Boogie output.
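To make that layout concrete, here is a minimal sketch of how a region's initial bytes could be pulled out of a MemorySection under that addressing scheme. The field names (address, bytes) follow the description above, and the helper regionBytes is purely illustrative, not the project's actual API.

```scala
// Sketch only: MemorySection.bytes(0) lives at MemorySection.address,
// bytes(1) at address + 1, and so on, so a region's initial contents are a
// slice of the section's byte array when the region lies inside the section.
case class MemorySectionSketch(address: BigInt, bytes: IndexedSeq[Byte])

def regionBytes(section: MemorySectionSketch, regionStart: BigInt, regionSize: Int): Option[IndexedSeq[Byte]] = {
  val offset = regionStart - section.address
  if (offset >= 0 && offset + regionSize <= section.bytes.length)
    Some(section.bytes.slice(offset.toInt, offset.toInt + regionSize))
  else
    None // region not (fully) contained in this section
}
```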
Currently, the initial memory pre/post-conditions are not generated with the identified regions. An example of this is the test correct/basic_function_call_reader/clang_pic. The following snippets are from the output currently produced for this test:
In the second snippet, the address 69568 (65536 + 4032) is loaded from the region
The rely() procedures created from the user's specification do not include the new regions. An example of this is the test correct/basicassign_gamma0/clang. The following snippets are all from the output produced for it:
In the third snippet, the address 69684 (69632 + 52) is loaded from the region
The test correct/initialisation/gcc demonstrates three issues - not using identified regions in the generated requires/ensures clauses from the user specifications, incorrectly removing relevant sections of initialised memory, and not using the identified regions in initialised memory requires/ensures clauses. I've already discussed the last of these, so I'll detail the first two for this test. On main, the following snippet is produced for this test:
These requires clauses are all removed from the output when using this branch's memory region analysis. However, the addresses 69648, 69656, 69664, 69668 and 69652 are all accessed, so we want to keep the initialisations for 69648, 69656 and 69664 around. We want the regions in the requires clause to match as well, so that would require splitting up the initialisations further - the region y would need the address 69652 initialised with a 32-bit access, and the region x would need the address 69648 initialised with a 32-bit access, instead of the 64-bit initialisation covering both as-is.
The output also includes the following ensures clauses in the procedure main():
The issue is that mem is not replaced with the identified regions for these, but we want it to be.
Currently, the memory regions for pointers and pointers-to-pointers are combined. This doesn't cause any issues at present, but it is unnecessary. An example of this is the output for the test case correct/basic_arrays_read/clang_pic:
The important context here is that the global variable … Both accesses are given the region …
This PR causes some of the IndirectCallsTests to fail - for some of those tests, the indirect calls get incorrectly resolved to the global variable … Indirect call resolution also fails for the correct/jumptable/clang (including variants) tests and for the correct/functionpointer/gcc_O2 and /clang_O2 tests. The correct/jumptable/clang test is a particularly interesting example, and stack region identification is currently incorrect for it too. These are all the stack accesses produced in main() (in order):
Each stack region is associated with the following accesses:
stack_15: R31 + 12 (32-bit access)
…
It is incorrect that R31 + 16 (128-bit access) and R31 + 24 (64-bit access) are in different regions, despite being overlapping accesses - the analysis needs to consider the size of accesses as well as their address. While the way the other regions are combined does not cause problems, it is imprecise and it would be ideal to fix it so that distinct accesses have distinct regions.
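As a rough illustration (not project code) of the size-aware check the analysis would need: two stack accesses overlap exactly when their byte ranges intersect, which comparing start offsets alone cannot detect.

```scala
// Sketch only: an access is (byte offset from the stack pointer, access size in bits).
def overlaps(offsetA: BigInt, bitsA: Int, offsetB: BigInt, bitsB: Int): Boolean = {
  val endA = offsetA + bitsA / 8 // exclusive end of the first access
  val endB = offsetB + bitsB / 8 // exclusive end of the second access
  offsetA < endB && offsetB < endA
}

// The overlapping pair from the jumptable example above:
// overlaps(16, 128, 24, 64) == true, so R31 + 16 and R31 + 24 must share a region.
```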
Another issue is the regions in assertions, specifically the assertions that check if writes to memory are secure. These are updates to the global variables z and x, respectively, from the test correct/secret_write/clang, with annotations:
There are a number of issues we need to address here. Most straightforwardly, there is still a reference to … The more difficult issue is that we need to redesign this secure update check to take into account that not all global variables share the same region now, which means many checks can be removed. We want to change the output so that it now looks something like this:
The differences are:
It may also make sense to split the L function so that each region has its own L function? I'm not sure if this matters at all either way though.
I think that should be pretty much everything. I realise this is a lot. I think the most important issues to focus on at first are the ones in this comment: #231 (comment) since those are issues with the analysis itself, rather than just more work that needs to be done with the analysis results.
private def returnRegion(region: StackRegion): StackRegion = {
  uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[StackRegion]
}

private def returnRegion(region: DataRegion): DataRegion = {
  uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[DataRegion]
}

private def returnRegion(region: HeapRegion): HeapRegion = {
  uf.find(region.asInstanceOf[MemoryRegion]).asInstanceOf[HeapRegion]
}
If all the region subtypes are treated completely separately as this implies, shouldn't each region subtype have its own UnionFind to enforce this?
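A minimal sketch of that suggestion, assuming a small generic union-find (written out here for illustration; the project presumably has its own): with one instance per subtype, the asInstanceOf casts disappear and merging regions of different subtypes becomes a compile-time error.

```scala
import scala.collection.mutable

// Sketch only: a tiny generic union-find with path compression.
class UnionFindSketch[A] {
  private val parent = mutable.Map[A, A]()

  def find(x: A): A = {
    val p = parent.getOrElse(x, x)
    if (p == x) x
    else {
      val root = find(p)
      parent(x) = root // path compression
      root
    }
  }

  def union(a: A, b: A): Unit = parent(find(a)) = find(b)
}

// One instance per region subtype; returnRegion then needs no casts:
//   val stackUF = new UnionFindSketch[StackRegion]
//   def returnRegion(region: StackRegion): StackRegion = stackUF.find(region)
```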
Another issue: the stack identification doesn't seem to work correctly when heap pointers are involved. This is from test/correct/malloc_with_local/gcc/malloc_with_local.bir, with annotations:
The corresponding Boogie produced is the following (with annotations and cleaned up a little):
The big issue is that accesses to the heap are incorrectly treated as accesses to the stack regions in which the heap pointers are stored. Every heap allocation should have its own region and should not be incorrectly identified as the stack.
I pushed up the working things
# Conflicts:
#   src/main/scala/analysis/InterprocSteensgaardAnalysis.scala
#   src/main/scala/analysis/MemoryModelMap.scala
#   src/main/scala/ir/Expr.scala
#   src/main/scala/util/RunUtils.scala
For correct/basic_lock_read/gcc:BAP, there is an inconsistency between the GRA result and the MMM. The GRA correctly gives a region with start address 69652 (0x11014) and size 4 to accesses to that address, but this region does not appear in the MMM's dataMap. I haven't yet identified why. This was an issue before your most recent changes but I had previously missed it.
For correct/initialisation/clang_pic:BAP, there are some region size issues in relation to array accesses as previously discussed, but there is another error in the analysis as well. These are the relevant parts of the .bir file:
The access at 0000036b is incorrectly given a region with a size of 8 instead of 4 (it only has a 32-bit access) - this is the same region size issue with array accesses. The access at 00000379 is incorrectly given a region with size 8 and start address 0x10FCC (69580). It seems to have gotten this by doing 0x10000 + 0xFC8 + 4, but the fact that 0x10FC8 (69576) is loaded from memory and the result has 4 added to it has been missed. The correct address is 0x11044 (69700) as mem[0x10FC8] == 0x11040 (69696). This was probably an issue before your most recent changes, just overshadowed by other issues you've now fixed.
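Spelling out the two derivations as plain arithmetic (values taken from the paragraph above, not project code):

```scala
// Incorrect derivation: the loaded pointer is treated as a plain constant offset.
val incorrect = BigInt(0x10000) + 0xFC8 + 4 // 0x10FCC = 69580

// Correct derivation: 0x10FC8 is an address whose *contents* are the pointer.
val loadedPointer = BigInt(0x11040)         // mem[0x10FC8] == 0x11040 (69696)
val correct = loadedPointer + 4             // 0x11044 = 69700
```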
For correct/secret_write/clang:BAP, there is a new error introduced by the most recent changes. The GRA correctly gives accesses to address 0x11034 (69684, which is the address of z) a region, but this region is not included in the MMM's dataMap.
For correct/functionpointer/clang:GTIRB, there is an issue with the stack regions. This is an excerpt of the output:
The accesses to regions stack_20 and stack_8 both have the same address (R29 - 8 == R31 + 24) but have different regions. This is incorrect. This was an issue before your most recent changes but I had missed it. edit: This seems to only happen some of the time, so it may be some weird iteration-order issue with iterating over a Set or something?
src/main/scala/util/RunUtils.scala
@@ -537,15 +558,18 @@ object RunUtils {
     var iteration = 1
     var modified: Boolean = true
     val analysisResult = mutable.ArrayBuffer[StaticAnalysisContext]()
-    while (modified) {
+    while (modified || analysisResult.size < 2) {
This isn't a good way to make use of the VSA results, as we don't need to run every analysis twice. The way the analyses feed into each other could be significantly improved - having a circular dependency between them really doesn't seem ideal.
private var DataMemory, HeapMemory, StackMemory = TreeMap[BigInt, Array[Byte]]()

// Store operation: store BigInt value at a BigInt address
def store(address: BigInt, value: BigInt, memoryType: MemoryType): Unit = {
  val byteArray = value.toByteArray
  memoryType match
    case MemoryType.Data => DataMemory += (address -> byteArray)
    case MemoryType.Heap => HeapMemory += (address -> byteArray)
    case MemoryType.Stack => StackMemory += (address -> byteArray)
}

// Load operation: load from a BigInt address with a specific size
def load(address: BigInt, size: Int, memoryType: MemoryType): BigInt = {
  val memory = memoryType match
    case MemoryType.Data => DataMemory
    case MemoryType.Heap => HeapMemory
    case MemoryType.Stack => StackMemory
  // Find the memory block that contains the starting address
  val floorEntry = memory.rangeTo(address).lastOption

  floorEntry match {
    case Some((startAddress, byteArray)) =>
      val offset = (address - startAddress).toInt // Offset within the byte array
      // If the load exceeds the stored data, we need to handle padding with zeros
      if (offset >= byteArray.length) {
        BigInt(0)
      } else {
        // Calculate how much data we can retrieve
        val availableSize = byteArray.length - offset
        // Slice the available data, and if requested size exceeds, append zeros
        val result = byteArray.slice(offset, offset + size)
        val paddedResult = if (size > availableSize) {
          result ++ Array.fill(size - availableSize)(0.toByte) // Padding with zeros
        } else {
          result
        }
        BigInt(1, paddedResult) // Convert the byte array back to BigInt
      }
    case None =>
      // If no memory is stored at the requested address, return zero
      BigInt(0) // TODO: may need to be something else
  }
}
Is any of this actually used for anything?
override def toString: String = s"Data($regionIdentifier, $start, $size, ($relfContent))"
def end: BigInt = start + size - 1
val relfContent: mutable.Set[String] = mutable.Set[String]()
val isPointerTo: Option[DataRegion] = None
What is this for?
 * 3. starts between regions (start, end) and (value + size) >= end
 * 4. starts between regions (start, end) and (value + size) < end (add both regions ie. next region)
 */
def findStackPartialAccessesOnly(value: BigInt, size: BigInt): Set[StackRegion] = {
Is this used for anything?
  matchingRegions.toSet.map(returnRegion)
}

def getRegionsWithSize(size: BigInt, function: String, negateCondition: Boolean = false): Set[MemoryRegion] = {
Is this used for anything?
// Assume heap region size is at least 1 TODO: must approximate size of heap
val negB = 1
val (name, start) = nextMallocCount(negB)
val newHeapRegion = HeapRegion(name, start, negB, IRWalk.procedure(n))
addReturnHeap(directCall, newHeapRegion)
s
You don't need to approximate the size of a heap allocation if you don't know it. The size of each heap allocation is not important except for detecting out-of-bounds accesses (which is not the most important). Each heap allocation needs to be its own separate address space, with the R0 returned from malloc being like the stack pointer for that heap.
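A minimal sketch of that idea (all names here are hypothetical): key each heap region by its allocation call site and attach no size to it, so offsets within the allocation are resolved relative to the R0 that malloc returned, much like stack offsets are resolved relative to the stack pointer.

```scala
import scala.collection.mutable

// Sketch only: Call stands in for whatever IR node represents a malloc call
// (e.g. the directCall used above).
class HeapRegionAllocator[Call] {
  final case class HeapRegionSketch(name: String, allocationSite: Call)

  private var mallocCount = 0
  private val heapRegions = mutable.Map[Call, HeapRegionSketch]()

  // Every allocation site gets exactly one region with no size attached;
  // the R0 returned by that call acts as the base pointer of the region.
  def regionForAllocation(call: Call): HeapRegionSketch =
    heapRegions.getOrElseUpdate(call, {
      mallocCount += 1
      HeapRegionSketch(s"malloc_$mallocCount", call)
    })
}
```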
src/main/scala/util/RunUtils.scala
    })

    Logger.debug("[!] Running MMM")
    val mmm = MemoryModelMap()
-   mmm.convertMemoryRegions(mraResult, mergedSubroutines, globalOffsets, mraSolver.procedureToSharedRegions)
+   mmm.convertMemoryRegions(mraSolver.procedureToStackRegions, mraSolver.procedureToHeapRegions, mraSolver.mergeRegions, mraResult, mraSolver.procedureToSharedRegions, graSolver.getDataMap, graResult)
mraSolver.mergeRegions is always empty, isn't it?
def canCoerceIntoDataRegion(bitVecLiteral: BitVecLiteral, size: Int): Option[DataRegion] = {
  mmm.findDataObject(bitVecLiteral.value)
}
Why does this have an unused size parameter?
  tableAddress
}

def tryCoerceIntoData(exp: Expr, n: Command, subAccess: BigInt, loadOp: Boolean = false): Set[DataRegion] = {
I don't really understand how this method works, can you explain it? I don't really follow why BinaryExprs are such a special case for it?
@@ -60,11 +60,6 @@ class InterprocSteensgaardAnalysis(

  def getMemoryRegionContents: Map[MemoryRegion, Set[BitVecLiteral | MemoryRegion]] = memoryRegionContents.map((k, v) => k -> v.toSet).toMap
This is always just empty now isn't it?
private def dataPoolMaster(offset: BigInt, size: BigInt): Option[DataRegion] = {
  assert(size >= 0)
  if (dataMap.contains(offset)) {
    if (dataMap(offset).size < (size.toDouble / 8).ceil.toInt) {
      dataMap(offset) = DataRegion(dataMap(offset).regionIdentifier, offset, (size.toDouble / 8).ceil.toInt)
      Some(dataMap(offset))
    } else {
      Some(dataMap(offset))
    }
  } else {
    dataMap(offset) = DataRegion(nextDataCount(), offset, (size.toDouble / 8).ceil.toInt)
    Some(dataMap(offset))
  }
}
Why does this need to return an Option when it never returns None?
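A sketch of the simplification this implies (reusing the names from the snippet above; intended to preserve behaviour, not a tested patch): return DataRegion directly and compute the byte size once.

```scala
// Sketch only: since a region is always produced, no Option is needed.
private def dataPoolMaster(offset: BigInt, size: BigInt): DataRegion = {
  assert(size >= 0)
  val sizeBytes = (size.toDouble / 8).ceil.toInt
  val region = dataMap.get(offset) match {
    case Some(existing) if existing.size >= sizeBytes => existing // already wide enough
    case Some(existing) => DataRegion(existing.regionIdentifier, offset, sizeBytes) // widen to the larger access
    case None => DataRegion(nextDataCount(), offset, sizeBytes) // fresh region
  }
  dataMap(offset) = region
  region
}
```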
Memory Regions in Boogie Output
I'm going to merge this into main and file issues for remaining bugs now.
The --memory-regions flag can be used to enable the regions output.
Before merging there are two main issues:
1. Some initial regions of a function are ignored (they are not collected by MRA) and thus are not renamed. This may have been fixed in the MRA file of the other branch, but that has other dependencies that need merging.
2. Some Data section regions that are in initial memory and read-only memory are not found. (Those are not loaded by the ReadELFLoader, which causes this issue as MMM relies on results from the loader.) Notable mentions for the "arrays" example:
[WARN] No region found for memory section 0 [[email protected]:297]
[WARN] No region found for memory section 0 [[email protected]:297]
[WARN] No region found for memory section 0 [[email protected]:297]
[WARN] No region found for memory section 0 [[email protected]:297]
[WARN] No region found for memory section 568 [[email protected]:297]
[WARN] No region found for memory section 596 [[email protected]:297]
[WARN] No region found for memory section 632 [[email protected]:297]
[WARN] No region found for memory section 664 [[email protected]:297]
[WARN] No region found for memory section 696 [[email protected]:297]
[WARN] No region found for memory section 960 [[email protected]:297]
[WARN] No region found for memory section 1148 [[email protected]:297]
[WARN] No region found for memory section 1176 [[email protected]:297]
[WARN] No region found for memory section 1240 [[email protected]:297]
[WARN] No region found for memory section 1528 [[email protected]:297]
Anything bigger than these offsets is resolved. These offsets don't seem to be loaded with externalFunctions or globalOffsets.