Function summaries #227

b-paul · 2024-07-17T02:10:54Z

Generates ensures clauses about the gammas of output variables in procedures using a taint analysis (ish). As an example

int c; // L(c) = true

int f(int a, int b) {
    if (c) {
        return a;
    } else {
        return b;
    }
}

we know that the return value of this function could be affected by a or b, so we say ensures Gamma_out == Gamma_a || Gamma_out == Gama_b || Gamma_out == true.

This analysis does not currently scale to run on the cntlm-new binary. This is probably because of the time complexity of the IDE solver.

…neration

ailrst · 2024-07-17T02:21:23Z

To clarify the approximation in the taint analysis, this runs the taint analysis separately for each each input variable being tainted, and treats taint as a fixed gamma value, so that taint-flow is security-level information-flow. It over-approximates towards high (where?), so for each variable the ensures condition ensures either low, equal to the input variables which taint it = possibly high.

b-paul · 2024-07-17T04:05:57Z

Yeah, the initial implementation would perform a taint analysis for each input/global variable, where taint is the variable's gamma at the start of the procedure. Doing this over every variable tells us precisely which input gammas affect the gamma of each output variable. The current implementation does this more efficiently though, instead working over a powerset lattice to determine the set of input variables which taint an output variable, in a single analysis.

I'm not 100% sure what you mean by over-approximates to high, are you referring to how the results can miss cases where a gamma is definitely high depending on the input (For example int f(int x, int y) { return x + y; } and int f(int x, int y) { return c ? x : y; } generate the same ensures clause, however the first one could be saying ensures Gamma_out == Gamma_x && Gamma_y)?

…work

b-paul · 2024-07-18T06:51:59Z

This change improves the performance of variable dependency analysis by breaking the call graph into its strongly connected components using Tarjan's algorithm, and analyzing those individually. We get a reverse topological sort of the components from Tarjan's algorithm, which allows us to analyze our procedures in an order such that we can get "summaries" of the analysis for all procedures called by a particular procedure, which significantly reduces duplicated work. (Unfortunately the cntlm-new binary has a really big strongly connected component in it which makes this analysis still infeasible to run on the whole binary)

b-paul · 2024-07-19T06:24:16Z

(Credit to @ j-tobler) Ensures clauses might be able to be rewritten as $(\bigwedge_{v \in \text{taint}(out)} \Gamma_v) \Rightarrow \Gamma_{out}$, the gamma of the output variable is lower than the join of the gammas of the tainters.

l-kent · 2024-07-22T03:46:51Z

src/main/scala/analysis/solvers/FixPointSolver.scala

+    if (y != xn) {
+      for succ <- outdep(n) do propagate(y, succ)
+    }


@yousifpatti didn't you find that doing this caused some bugs?

This caused the results of constantprop analysis to not be propagated properly. The effects may not be noticeable because evaluate expression may return TOP and you deal with it the same way. The best way to check is by checking the dot file of constant prop to see if values are being propagated.

Thanks, I remember that made the analyses resolve some indirect calls incorrectly. It would be good if we could figure out how to make this optimisation work but it should be reverted for now.

l-kent · 2024-07-22T03:49:13Z

src/main/scala/util/RunUtils.scala

+    /*
+    Logger.info("[!] Running Taint Analysis")
+    val taintResults = TaintAnalysis(IRProgram, specGlobalAddresses, constPropResult,
+        IRProgram.procedures.foldLeft(Map[CFGPosition, Set[analysis.Taintable]]()) {
+          (m, p) => m + (p -> Set(analysis.UnknownMemory()))
+        }
+      ).analyze()
+
+    config.analysisResultsPath.foreach(s =>
+      writeToFile(printAnalysisResults(IRProgram, taintResults), s"${s}_taint$iteration.txt")
+    )
+    */


Does it make sense to add a flag for this if we don't want it enabled by default, or is it not yet working?

There wasn't any particular reason for commenting this out apart from just not wanting it to run so adding a flag should be fine, would you like that? These taint results were kinda just there for testing and might not mean much

If the taint analysis gets run as part of generating the function summaries I don't think it makes sense to have it as a separate thing, so this bit could probably just be removed actually?

Actually yeah these results won't be used for anything you're right, this can be removed

l-kent · 2024-07-22T04:53:33Z

src/main/scala/analysis/SummaryGenerator.scala

+  // TODO should the stack/link/frame pointers propagate taint?
+  val variables: Set[analysis.Taintable] = 0.to(28).map { n =>
+    Register(s"R$n", 64)
+  }.toSet ++ specGlobals.map { g =>
+    analysis.GlobalVariable(dsl.mem, BitVecLiteral(g.address, 64), g.size, g.name)
+  }


Only registers R0-R18 and R30 are allowed under the AArch64 standard to change values as the result of a procedure call. This means that there isn't a need to create summaries for the gamma values for R19-R29 and R31 - we already have those.

Oh that's helpful, thanks! I was unaware

src/main/scala/analysis/EdgeFunctionLattice.scala

l-kent · 2024-07-22T05:08:39Z

src/test/scala/SystemTests.scala

-  runTests(incorrectPrograms, incorrectPath, "incorrect", false, false)
+  runTests(correctPrograms, correctPath, "correct", true, false, false)
+  runTests(incorrectPrograms, incorrectPath, "incorrect", false, false, false)
+  runTests(correctWithAnalysisPrograms, correctWithAnalysisPath, "correct", true, false, true)


This (and the BAP one) should use "correct-analysis" since that's the name of the folder the tests are in.

l-kent · 2024-07-22T05:22:09Z

There seems to be room for optimisation here in relation to which variables this generates summaries for. In some of the tests, it will generate summaries for global variables even when the procedure does not modify memory at all, or only modifies memory because it calls rely (which just says that it does not modify memory).

l-kent · 2024-07-22T05:38:05Z

I'm a bit confused about why it even generates summaries for the globals x and y in the get_two procedure in the function_summary test. get_two doesn't at all access those globals, or even memory at all, so why does the taint analysis say they're tainted?

l-kent · 2024-07-22T05:42:57Z

src/main/scala/util/RunUtils.scala

@@ -931,6 +981,8 @@ object RunUtils {
        result.reachingDefs,
        ctx.program
      )
+      Logger.info("[!] Generating Function Summaries")
+      modified = modified | IRTransform.generateFunctionSummaries(ctx, ctx.program, result.constPropResult, result.varDepsSummaries)


I don't think it makes sense to update modified here, at least at the moment. The procedure summaries don't have any impact (and can't have any impact) on any of the other analyses right now so there isn't any need to do another iteration.

l-kent · 2024-07-22T06:00:50Z

src/main/scala/analysis/SummaryGenerator.scala

+    val tainters = relevantVars.map {
+      v => (v, Set())
+    }.toMap ++ getTainters(procedure, variables ++ rnaResults(procedure.begin) + UnknownMemory()).filter { (variable, taints) =>
+      relevantVars.contains(variable)
+    }


A comment explaining what this represents and how the sets of tainters for each variable have been derived would help.

src/main/scala/analysis/SummaryGenerator.scala

l-kent · 2024-07-22T06:23:15Z

The summaries for function_summary3 seem incorrect. For f we get:

ensures gamma_load32(Gamma_mem, $x_addr);
ensures Gamma_R1;
ensures Gamma_R0;

and for g we get:

ensures (gamma_load32(Gamma_mem, $x_addr) || (gamma_load32(Gamma_mem, $x_addr) == old(gamma_load32(Gamma_mem, $x_addr))));
ensures ((Gamma_R0 || (Gamma_R0 == old(Gamma_R1))) || (Gamma_R0 == old(Gamma_R0)));

Neither of these procedures modify memory at all, so I'm unclear on why summaries for the global variable x are generated.

For f, how is it that R1 and R0 are ensured to be low? Shouldn't R1 and R0 both have security levels that are dependent on the input security level of R0?

These issues may be related to this example using a non-returning call which is not really properly supported at present.

b-paul · 2024-07-22T09:30:24Z

I'm a bit confused about why it even generates summaries for the globals x and y in the get_two procedure in the function_summary test. get_two doesn't at all access those globals, or even memory at all, so why does the taint analysis say they're tainted?

You're right in that get_two shouldn't need to say anything about x and y. The reason it mentions those is because x and y are initially marked as tainting themselves, so at the end of the procedure they still taint themselves. It shouldn't be too hard to add a check for modifying memory (hopefully) though

For f, how is it that R1 and R0 are ensured to be low? Shouldn't R1 and R0 both have security levels that are dependent on the input security level of R0?

That is concerning, it certainly shouldn't be doing that 😬 I'm confused by how that even passed the test to be perfectly honest... My first guess was that it is something to do with the summaries generated by the strongly connected components, or something like that

I'll hopefully find time to work on this soon

l-kent · 2024-07-22T23:19:22Z

The test passes because the 'assume false' (generated by the non-returning call to g() in f()) allows for any post condition of f() to be verified. That's part of what I mean by non-returning calls not being properly supported at the moment.

The reason it mentions those is because x and y are initially marked as tainting themselves, so at the end of the procedure they still taint themselves. It shouldn't be too hard to add a check for modifying memory (hopefully) though

Yes, it seems important to handle the case where a variable is not actually modified at all. It is probably difficult to do this robustly for now but at the very least it should be trivial to check whether memory is modified at all, and the memory region analysis should allow this to be improved in the future.

src/main/scala/analysis/solvers/FixPointSolver.scala

l-kent · 2024-08-12T01:24:50Z

I'm just going to clean this up and merge it in, putting it behind a flag for now.

# Conflicts: # src/test/scala/SystemTests.scala

b-paul added 12 commits July 12, 2024 14:29

Initial taint analysis on variables

3c7c691

Taint global and stack variables

df62ad6

Generate function summaries using taint analysis results

662a88a

Modify RNA analysis to account for memory variables

c8092a0

Implement variable dependency analysis for faster function summary ge…

52292d6

…neration

Cleanup

60f39f4

Function summary tests

942d8a7

Remove stack variables for soundness

8bffd38

Move integration tests with analysis into a seperate test run

9cabd53

Some unit tests

f148780

Modify procedure summaries in an IR transform

16618e1

Modify tests to fail without analysis

b105491

b-paul added 2 commits July 18, 2024 16:32

Improve performance by using known summaries to skip interprocedural …

9bbd32d

…work

Make strongly connected components return the components

5cdb61a

l-kent reviewed Jul 22, 2024

View reviewed changes

src/main/scala/analysis/EdgeFunctionLattice.scala Show resolved Hide resolved

l-kent reviewed Jul 22, 2024

View reviewed changes

src/main/scala/analysis/SummaryGenerator.scala Show resolved Hide resolved

l-kent reviewed Jul 23, 2024

View reviewed changes

src/main/scala/analysis/solvers/FixPointSolver.scala Show resolved Hide resolved

l-kent added 4 commits August 19, 2024 10:18

clean up and put procedure summaries behind flag

da8f381

reorganise procedure summary tests

b03838e

Merge branch 'main' into function-summaries

de5a580

# Conflicts: # src/test/scala/SystemTests.scala

fix merge

d32f485

l-kent merged commit ffd54a9 into main Aug 19, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Function summaries #227

Function summaries #227

b-paul commented Jul 17, 2024 •

edited

Loading

ailrst commented Jul 17, 2024

b-paul commented Jul 17, 2024

b-paul commented Jul 18, 2024

b-paul commented Jul 19, 2024

l-kent Jul 22, 2024

yousifpatti Jul 23, 2024

l-kent Jul 23, 2024

l-kent Jul 22, 2024

b-paul Jul 22, 2024 •

edited

Loading

l-kent Jul 22, 2024

b-paul Jul 22, 2024

l-kent Jul 22, 2024

b-paul Jul 22, 2024

l-kent Jul 22, 2024

l-kent commented Jul 22, 2024

l-kent commented Jul 22, 2024

l-kent Jul 22, 2024

l-kent Jul 22, 2024

l-kent commented Jul 22, 2024

b-paul commented Jul 22, 2024

l-kent commented Jul 22, 2024 •

edited

Loading

l-kent commented Aug 12, 2024

Function summaries #227

Function summaries #227

Conversation

b-paul commented Jul 17, 2024 • edited Loading

ailrst commented Jul 17, 2024

b-paul commented Jul 17, 2024

b-paul commented Jul 18, 2024

b-paul commented Jul 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

b-paul Jul 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

l-kent commented Jul 22, 2024

l-kent commented Jul 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

l-kent commented Jul 22, 2024

b-paul commented Jul 22, 2024

l-kent commented Jul 22, 2024 • edited Loading

l-kent commented Aug 12, 2024

b-paul commented Jul 17, 2024 •

edited

Loading

b-paul Jul 22, 2024 •

edited

Loading

l-kent commented Jul 22, 2024 •

edited

Loading