Skip to content

Program Structure

Thomas-Malcolm edited this page Jun 30, 2023 · 12 revisions

ADT to BASIL IR

Analysis is performed over the tool's own internal IR, BASIL IR. To do so, we need a way to take a program (assumed: binary), and transform it into this IR. To do this we use a tool to lift the Binary into an ADT (Abstract Data Type) - BAP lifts the ARM binary into its own IR, which it can then output as an ADT.

Parsing Phase

ANTLR is a tool which takes a grammar file (BilAdt.g4) and creates a parser in Java that exposes tokens of the grammar through classes. The parser is autogenerated at runtime here.

val adtLexer = BilAdtLexer(CharStreams.fromFileName(fileName))
val tokens = CommonTokenStream(adtLexer)
val parser = BilAdtParser(tokens)

The parser object then stores the parsed ADT (Abstract Data Tree) of the program. We then walk the ADT in AdtStatementLoader

Walking Phase

Translating Phase

Derivation of the ADT grammar

The grammar of the parser is inspired by the existing parser found at BinaryAnalysisPlatform/bap-python, in particular, the nodes found in adt.py, br.py and bil.py. The grammar attempts to reflect the inheritance hierarchy - the alternatives of a token are its subclasses.

For example, exp and term are subclasses of ADT, and thus in the grammar we have adt : exp | term | ....

General Notes

  • Negative numbers are encoded somewhat strangely. It takes the 2's complement representation of the number, and interprets the binary string as if it were an unsigned integer.
    • For example, the number -2 in 64 bits is 1111111111111111111111111111111111111111111111111111111111111110, which is interpreted in the ADT as Int(18446744073709551614, 64).

BASIL IR

The internal IR is designed such that it will translate easily to Boogie's IVL.

Analyses

Analysis and Solvers

Supported Analyses

Clone this wiki locally