https://github.com/vusec/vuzzer64
VUzzer [1] is a mutation-based fuzzer, proposed at NDSS 2017, that attempts to discover unknown execution paths by mutating one or more seeds selected from its queue.
The main feature of VUzzer is that it tries to infer data structures from the control flow and data flow of the program under test (PUT). This lets it find execution paths efficiently without any knowledge of the source code or input format. It was a pioneer among application-aware fuzzers that require no prior knowledge. However, the original VUzzer implementation had problems running in modern environments, so we reimplemented it in fuzzuf.
Before fuzzing, the PUT must be statically analyzed with IDAPython and instrumented with PolyTracker. Install fuzzuf, Intel Pin, and PolyTracker as described in build_en.md, then follow the instructions below.
VUzzer uses the addresses of executed basic blocks as coverage, so ASLR must be disabled before running it:
sudo sysctl -w kernel.randomize_va_space=0
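Before a long campaign, it can help to confirm that the setting took effect. The following is a minimal Python sketch. It assumes a Linux system that exposes the setting at /proc/sys/kernel/randomize_va_space, which is the file behind the kernel.randomize_va_space sysctl. The helper name aslr_disabled is hypothetical and not part of fuzzuf.

```python
from pathlib import Path

# File behind the kernel.randomize_va_space sysctl; "0" means ASLR is disabled.
RANDOMIZE_VA_SPACE = Path("/proc/sys/kernel/randomize_va_space")


def aslr_disabled(path: Path = RANDOMIZE_VA_SPACE) -> bool:
    """Return True if the ASLR setting stored in `path` is 0."""
    return path.read_text().strip() == "0"


if __name__ == "__main__":
    if not RANDOMIZE_VA_SPACE.exists():
        print("No randomize_va_space file found (not Linux?)")
    elif aslr_disabled():
        print("ASLR is disabled; basic block addresses stay stable across runs.")
    else:
        print("ASLR is enabled; run: sudo sysctl -w kernel.randomize_va_space=0")
```

Note that a value set with sysctl -w does not persist across reboots, so the check is worth repeating after a restart.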
During taint analysis, PolyTracker writes propagated taint information to a database. Because this is an I/O-intensive workload, we strongly recommend placing the database on a tmpfs.
sudo mkdir -p /mnt/polytracker
sudo mount -t tmpfs -o size=100m tmpfs /mnt/polytracker
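A similar sanity check can confirm that /mnt/polytracker is actually backed by tmpfs. This sketch assumes the standard Linux /proc/mounts format: device, mount point, filesystem type, options, and so on. mount_fstype is a hypothetical helper. Mount points containing spaces, which /proc/mounts escapes as \040, are not handled.

```python
from pathlib import Path
from typing import Optional


def mount_fstype(mount_point: str, mounts_file: str = "/proc/mounts") -> Optional[str]:
    """Return the filesystem type mounted at `mount_point`, or None if nothing is."""
    fstype = None
    for line in Path(mounts_file).read_text().splitlines():
        # Each line: <device> <mount point> <fstype> <options> <dump> <pass>
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # Keep the last match: a later mount on the same path shadows earlier ones.
            fstype = fields[2]
    return fstype


if __name__ == "__main__":
    if Path("/proc/mounts").exists():
        print("/mnt/polytracker is", mount_fstype("/mnt/polytracker") or "not mounted")
```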
cd build
mkdir vuzzer_test
cp test/put_binaries/calc/calc.c ./vuzzer_test
cd vuzzer_test
polybuild --instrument-target -g -o instrumented.bin calc.c
instrumented.bin is the calc binary with taint-analysis operations instrumented by polybuild.
gcc -o calc calc.c
/path/to/idat64 -A -S../../tools/bbweight/bb-weight-ida.py calc
The last command generates three files in total: two dictionary files, unique.dict and full.dict, and weight, the file holding the weighted control flow graph (CFG) of the PUT.
If the build succeeded, you can find the following files in the vuzzer_test directory after the above steps:
calc
calc.c
full.dict
instrumented.bc
instrumented.bin
instrumented_instrumented.bc
instrumented_instrumented.o
unique.dict
weight
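Only some of these files are read by the fuzzing command itself. The hypothetical Python sketch below lists the ones that are missing from a directory. The inputs it checks are the PUT itself plus the default paths of the --full_dict, --unique_dict, --weight, and --inst_bin options described below. It is a convenience check, not part of fuzzuf.

```python
from pathlib import Path
from typing import List

# The PUT plus the default locations of --full_dict, --unique_dict, --weight and --inst_bin.
REQUIRED_FILES = ["calc", "full.dict", "unique.dict", "weight", "instrumented.bin"]


def missing_files(directory: str = ".", required: List[str] = REQUIRED_FILES) -> List[str]:
    """Return the names from `required` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("missing:", ", ".join(missing) if missing else "none")
```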
Run the following command in the vuzzer_test directory:
../fuzzuf vuzzer --in_dir=../test/put_binaries/calc/seeds -- ./calc @@
Place three or more initial seeds in the directory specified with --in_dir.
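The seed-count requirement can be checked before launching. This is a minimal, hypothetical Python sketch. It assumes that every regular file directly inside the seed directory counts as one seed and that subdirectories are ignored.

```python
from pathlib import Path


def enough_seeds(in_dir: str, minimum: int = 3) -> bool:
    """Return True if `in_dir` holds at least `minimum` regular files (seeds)."""
    seeds = [p for p in Path(in_dir).iterdir() if p.is_file()]
    return len(seeds) >= minimum


if __name__ == "__main__":
    seed_dir = "../test/put_binaries/calc/seeds"  # the --in_dir from the example command
    if Path(seed_dir).is_dir():
        print(f"{seed_dir}: enough seeds = {enough_seeds(seed_dir)}")
```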
The available options:
- Global options (available for all fuzzers in fuzzuf)
--out_dir=path/to/output/directory
- Specifies the directory where all fuzzer outputs, such as crashing seeds, are written. Defaults to /tmp/fuzzuf-out_dir if not specified.
--exec_timelimit_ms=1234
- Specifies a time limit per PUT execution in milliseconds. The default time limit is 1 second (i.e. 1000 ms).
--exec_memlimit=1234
- Specifies the amount of memory available per PUT execution, in megabytes. By default, memory is unlimited.
--log_file=path/to/log/file
- Specifies the file to which log output (and debug-log output, if fuzzuf is built in debug mode) is written. If not specified, logs are printed to stdout.
- Local options (available for VUzzer only)
--full_dict=path/to/full.dict
- Specifies the path to the full dictionary file generated by bb-weight-ida.py. This dictionary holds all magic numbers that exist in the PUT binary. Defaults to ./full.dict if not specified.
--unique_dict=path/to/unique.dict
- Specifies the path to the unique dictionary file generated by bb-weight-ida.py. This dictionary holds the deduplicated magic numbers that exist in the PUT binary. Defaults to ./unique.dict if not specified.
--weight=path/to/weight/file
- Specifies the path to the weight file generated by bb-weight-ida.py. This file holds a score for each node, based on an analysis of the PUT control flow graph (CFG). Defaults to ./weight if not specified.
--inst_bin=path/to/instrumented/bin
- Specifies the path to the executable instrumented by polybuild. Defaults to ./instrumented.bin if not specified.
--taint_db=path/to/taint/db
- Specifies the path to the taint information database. Defaults to /mnt/polytracker/polytracker.db if not specified.
--taint_out=path/to/taint/out
- Specifies the path to the file where taint information is recorded. This file holds the taint information related to lea and cmp instructions, extracted from the taint information database. Defaults to /tmp/taint.out if not specified.
VUzzer's fuzzing loop can be summarized as follows:
1. For each initial seed given by the user, VUzzer collects the feedback (code coverage and exit status code) from the program execution with that seed and adds the seed to the seed_queue. All basic blocks contained in this code coverage are recorded as Good_BB.
2. VUzzer executes the PUT with randomly generated inputs a certain number of times and collects the feedback (code coverage) from each execution. Let Bad_BB denote the union of the sets of basic blocks executed by all these random inputs. A basic block in Bad_BB is regarded as an Error Handling Basic block (EHB) if it is present in every one of these random-input executions and is not present in Good_BB.
3. If the number of seeds in the seed_queue is less than pop_size, VUzzer generates new inputs by mutating the initial seeds and adds them to the seed_queue as new seeds. This is repeated until the size of the seed_queue reaches pop_size.
4. The following loop is repeated until the fuzzing process terminates:
   1. Every certain number of loop cycles, VUzzer copies all seeds from the seed_queue and adds them to the keep_queue.
   2. Every certain number of loop cycles, VUzzer also starts an incremental analysis of EHBs. For each seed in the seed_queue, VUzzer collects the feedback (code coverage) from the program execution with that seed. A basic block in this coverage is then classified as an EHB if it appears in at least 90% of these executions and is not in the Good_BB set. The intuition behind the incremental analysis is the observation that, as fuzzing proceeds, the majority of newly generated inputs end up triggering some error-handling code.
   3. At the same time as the EHB detection (step 2), VUzzer copies all seeds from the keep_queue and adds them to the seed_queue.
   4. For each seed in the seed_queue, VUzzer collects the feedback (code coverage and exit status code) from the program execution with that seed and calculates the seed's fitness score based on the code coverage. If the seed reaches previously unseen code, VUzzer adds it to the taint_queue and prunes any other seeds whose traces are subsets of the trace of the seed just executed.
   5. For each seed in the taint_queue, VUzzer collects the feedback (taint information) from the program execution with that seed, using the dynamic taint analyzer.
   6. VUzzer selects some seeds from the seed_queue, generates new test cases by repeatedly mutating the chosen seeds with a variety of methods, and adds them to the queue as new seeds.
   7. Finally, VUzzer deletes from the seed_queue all seeds whose score is lower than the highest score, except for the seeds newly generated in the previous mutation step.
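The two EHB rules in the list above (step 2, and step 2 of the loop) are plain set operations. The sketch below restates them in Python under simplifying assumptions. Each execution's coverage is given as a set of basic block addresses. The function names and the threshold parameter are hypothetical, not fuzzuf's actual API. The sketch illustrates the rules as described here and is not fuzzuf's implementation.

```python
from collections import Counter
from typing import List, Set

# A basic block is identified by its address (stable only because ASLR is disabled).
Trace = Set[int]


def static_ehb(good_bb: Set[int], random_traces: List[Trace]) -> Set[int]:
    """Step 2: blocks executed by every random input but absent from Good_BB."""
    if not random_traces:
        return set()
    in_every_run = set(random_traces[0]).intersection(*random_traces[1:])
    return in_every_run - good_bb


def incremental_ehb(good_bb: Set[int], seed_traces: List[Trace],
                    threshold: float = 0.9) -> Set[int]:
    """Loop step 2: blocks hit by at least `threshold` of the seed executions
    and absent from Good_BB."""
    if not seed_traces:
        return set()
    # Each trace is a set, so a block is counted at most once per execution.
    hits = Counter(bb for trace in seed_traces for bb in trace)
    total = len(seed_traces)
    return {bb for bb, n in hits.items()
            if n / total >= threshold and bb not in good_bb}
```

Comparing the ratio n / total against the threshold, rather than comparing n against threshold * total, keeps the 90% boundary exact for cases such as 9 hits out of 10 executions.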
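The novelty check and pruning in step 4 of the loop can also be sketched with sets. This Python sketch is hypothetical:

- Seed is a made-up record carrying the set of basic blocks its execution covered.
- seen is the union of all coverage observed so far.
- Fitness scoring is left out, because its exact formula is not described here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Seed:
    name: str
    trace: FrozenSet[int]  # basic block addresses covered when the PUT runs this seed


def update_queues(seed: Seed, seen: Set[int], seed_queue: List[Seed],
                  taint_queue: List[Seed]) -> List[Seed]:
    """Novelty check and pruning for one executed seed.

    If `seed` covers a block not yet in `seen`, the seed is appended to
    `taint_queue`, `seen` is updated in place, and every other seed whose trace
    is a subset of `seed.trace` is dropped. Returns the resulting seed queue.
    """
    if not (seed.trace - seen):
        return seed_queue  # no new coverage: both queues stay unchanged
    seen |= seed.trace
    taint_queue.append(seed)
    return [s for s in seed_queue if s is seed or not (s.trace <= seed.trace)]
```

The identity check (s is seed) keeps the seed that triggered the pruning, since its own trace is trivially a subset of itself.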
Footnotes
[1] Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. VUzzer: Application-aware Evolutionary Fuzzing. In the Network and Distributed System Security Symposium (NDSS’17).