-
Notifications
You must be signed in to change notification settings - Fork 0
Program Structure
Analysis is performed over the tool's own internal IR, BASIL IR. To do so, we need a way to take a program (assumed: binary), and transform it into this IR. To do this we use a tool to lift the Binary into an ADT (Abstract Data Type) - BAP lifts the ARM binary into its own IR, which it can then output as an ADT.
ANTLR is a tool which takes a grammar file (BilAdt.g4
) and creates a parser in Java that exposes tokens of the grammar through classes. The parser is autogenerated at runtime here.
val adtLexer = BilAdtLexer(CharStreams.fromFileName(fileName))
val tokens = CommonTokenStream(adtLexer)
val parser = BilAdtParser(tokens)
The parser
object then stores the parsed ADT (Abstract Data Tree) of the program. We then walk the ADT in AdtStatementLoader
The grammar of the parser is inspired by the existing parser found at BinaryAnalysisPlatform/bap-python, in particular, the nodes found in adt.py, br.py and bil.py. The grammar attempts to reflect the inheritance hierarchy - the alternatives of a token are its subclasses.
For example, exp
and term
are subclasses of ADT
, and thus in the grammar we have adt : exp | term | ...
.
- Negative numbers are encoded somewhat strangely. It takes the 2's complement representation of the number, and interprets the binary string as if it were an unsigned integer.
- For example, the number -2 in 64 bits is 1111111111111111111111111111111111111111111111111111111111111110, which is interpreted in the ADT as Int(18446744073709551614, 64).
The internal IR is designed such that it will translate easily to Boogie's IVL.