VUzzer

What is VUzzer

https://github.com/vusec/vuzzer64

VUzzer [1] is a mutation-based fuzzer, proposed at NDSS 2017, that attempts to discover unknown execution paths by modifying one or more seeds selected from the queue.

The main feature of VUzzer is that it estimates data structures from the control flow and data flow of the program under test (PUT), so that it can find execution paths efficiently without any knowledge of the source code or input format. It pioneered application-aware fuzzing without prior knowledge. However, the original VUzzer implementation has problems running in modern environments, so we reimplemented it in fuzzuf.

Prerequisite

Before fuzzing, the PUT requires static analysis with IDAPython and instrumentation with PolyTracker. Install fuzzuf, Intel Pin, and PolyTracker as described in build_en.md, then follow the instructions below.

Disable ASLR

VUzzer uses the addresses of executed basic blocks as coverage, so ASLR must be disabled before running.

sudo sysctl -w kernel.randomize_va_space=0
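A wrapper script can confirm this setting before a campaign starts. The following is a minimal sketch, assuming a Linux host that exposes the setting under /proc; the helper name `aslr_disabled` is hypothetical and not part of fuzzuf.

```python
from pathlib import Path

# Linux reports the current ASLR mode here: 0 = off, 1 = partial, 2 = full.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")


def aslr_disabled(sysctl_file: Path = ASLR_SYSCTL) -> bool:
    """Return True when the kernel reports that ASLR is turned off.

    Hypothetical helper for illustration; it only reads the sysctl file
    and never changes the setting.
    """
    return sysctl_file.read_text().strip() == "0"
```

Calling `aslr_disabled()` without arguments reads the live kernel setting; the path parameter exists only so the check can be exercised against a regular file.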

Mount tmpfs

During taint analysis, PolyTracker writes the propagated taint information to a database. Because this workload is I/O-intensive, we highly recommend placing the database on tmpfs.

sudo mkdir -p /mnt/polytracker
sudo mount -t tmpfs -o size=100m tmpfs /mnt/polytracker

Instrumentation with PolyTracker

cd build
mkdir vuzzer_test
cp test/put_binaries/calc/calc.c ./vuzzer_test
cd vuzzer_test
polybuild --instrument-target -g -o instrumented.bin calc.c

instrumented.bin is the calc binary with taint analysis operations inserted by polybuild.

Static Analysis with IDAPython

gcc -o calc calc.c
/path/to/idat64 -A -S../../tools/bbweight/bb-weight-ida.py calc

The last command generates three files: the two dictionary files unique.dict and full.dict, and weight, which holds the weighted control flow graph (CFG) of the PUT.

If the steps above succeeded, the vuzzer_test directory contains the following files.

calc
calc.c
full.dict
instrumented.bc
instrumented.bin
instrumented_instrumented.bc
instrumented_instrumented.o
unique.dict
weight
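Before starting the fuzzer, it can help to confirm that the preparation steps produced the files fuzzuf will look for. The sketch below is one way to do that, assuming the file names from the listing above; `EXPECTED_FILES` and `missing_artifacts` are hypothetical names, not part of fuzzuf.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above. The last four match the default
# values of the VUzzer-specific options, which resolve against the current
# directory.
EXPECTED_FILES = ("calc", "full.dict", "unique.dict", "weight", "instrumented.bin")


def missing_artifacts(directory: str) -> List[str]:
    """Return the expected files that are absent from `directory`.

    Hypothetical pre-flight helper; an empty list means every file exists.
    """
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_artifacts(".")` run inside vuzzer_test returns an empty list when both the polybuild and IDAPython steps succeeded.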

Usage on CLI

Run the following command in the vuzzer_test directory:

../fuzzuf vuzzer --in_dir=../test/put_binaries/calc/seeds -- ./calc @@

Place three or more initial seeds in the directory specified with --in_dir.

The available options:

  • Global options (available for all fuzzers on fuzzuf)

    • --out_dir=path/to/output/directory
      • Specifies the directory where all fuzzer outputs, such as crash seeds, are written. Defaults to /tmp/fuzzuf-out_dir if not specified.
    • --exec_timelimit_ms=1234
      • Specifies a time limit per PUT execution in milliseconds. The default time limit is 1 second (i.e. 1000 ms).
    • --exec_memlimit=1234
      • Specifies the amount of memory available per PUT execution in megabytes. The default memory limit is unlimited.
    • --log_file=path/to/log/file
      • Specifies the file where log output (and debug log output, if built in debug mode) is written. Logs are printed to stdout if no path is specified.
  • Local options (available for VUzzer only)

    • --full_dict=path/to/full.dict
      • Specifies the path to the full dictionary file generated by bb-weight-ida.py. This dictionary holds all magic numbers that exist in the PUT binary. Defaults to ./full.dict if not specified.
    • --unique_dict=path/to/unique.dict
      • Specifies the path to the unique dictionary file generated by bb-weight-ida.py. This dictionary holds the deduplicated magic numbers that exist in the PUT binary. Defaults to ./unique.dict if not specified.
    • --weight=path/to/weight/file
      • Specifies the path to the weight file generated by bb-weight-ida.py. This file holds a score for each node, based on analysis of the PUT control flow graph (CFG). Defaults to ./weight if not specified.
    • --inst_bin=path/to/instrumented/bin
      • Specifies the path to the executable instrumented by polybuild. Defaults to ./instrumented.bin if not specified.
    • --taint_db=path/to/taint/db
      • Specifies the path to the taint information database. Defaults to /mnt/polytracker/polytracker.db if not specified.
    • --taint_out=path/to/taint/out
      • Specifies the path to the file where taint information is recorded. This file holds the taint information related to lea and cmp instructions, extracted from the taint information database. Defaults to /tmp/taint.out if not specified.
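When fuzzuf is launched from a script, the options above can be assembled into an argument list. This is a minimal sketch that uses only flags named in this section; the function `build_vuzzer_command` and its parameter names are hypothetical, and only a few of the options are covered.

```python
from typing import List, Optional


def build_vuzzer_command(
    fuzzuf: str,
    in_dir: str,
    put_args: List[str],
    out_dir: Optional[str] = None,
    exec_timelimit_ms: Optional[int] = None,
    weight: Optional[str] = None,
    inst_bin: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `fuzzuf vuzzer` (hypothetical helper).

    Options left as None are omitted, so fuzzuf falls back to its defaults.
    """
    cmd = [fuzzuf, "vuzzer", f"--in_dir={in_dir}"]
    optional = {
        "out_dir": out_dir,
        "exec_timelimit_ms": exec_timelimit_ms,
        "weight": weight,
        "inst_bin": inst_bin,
    }
    for name, value in optional.items():
        if value is not None:
            cmd.append(f"--{name}={value}")
    # Everything after "--" is the PUT command line; "@@" marks the input file.
    cmd.append("--")
    cmd.extend(put_args)
    return cmd
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with paths that contain spaces.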

Algorithm Overview

VUzzer's fuzzing loop can be summarized as follows:

  1. For each initial seed given by the user, VUzzer executes the PUT with the seed, collects the feedback (code coverage and exit status code), and adds the seed to the seed_queue. All basic blocks contained in the code coverage are recorded as Good_BB.
  2. VUzzer executes the PUT with randomly generated inputs a certain number of times and collects the feedback (code coverage) from each execution. Let Bad_BB denote the union of the sets of basic blocks executed by the random inputs. A basic block in Bad_BB is assumed to be an error-handling basic block (EHB) if it is executed by every random input and is not present in Good_BB.
  3. If the seed_queue holds fewer than pop_size seeds, VUzzer generates new inputs by mutating the initial seeds and adds them to the seed_queue as new seeds. This is repeated until the size of the seed_queue reaches pop_size.
  4. The following loop is repeated until the fuzzing process terminates:
    1. Every certain number of loop cycles, VUzzer copies all seeds from the seed_queue and adds them to the keep_queue.
    2. Every certain number of loop cycles, VUzzer starts an incremental analysis of EHBs. For each seed in the seed_queue, VUzzer executes the PUT with the seed and collects the feedback (code coverage). A basic block in the code coverage is classified as an EHB if it is executed in at least 90% of these executions and is not in the Good_BB set. The intuition behind the incremental analysis is the observation that, as fuzzing proceeds, the majority of newly generated inputs end up triggering some error-handling code.
    3. At the same time as the EHB detection (step 2 of this loop), VUzzer also copies all seeds from the keep_queue and adds them to the seed_queue.
    4. For each seed in the seed_queue, VUzzer executes the PUT with the seed, collects the feedback (code coverage and exit status code), and calculates the seed's fitness score from the code coverage. If the seed reaches previously unseen code coverage, VUzzer adds it to the taint_queue and prunes any other seeds whose trace is a subset of the trace of the seed that was just executed.
    5. For each seed in the taint_queue, VUzzer executes the PUT with the seed under the dynamic taint analyzer and collects the feedback (taint information).
    6. VUzzer selects some seeds from the seed_queue, generates a new test case by repeatedly mutating the chosen seeds with a variety of methods, and adds it to the queue as a new seed.
    7. Finally, VUzzer deletes from the seed_queue all seeds whose score is lower than the highest score, except for the seeds newly generated in the preceding mutation step.
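The EHB rules in steps 2 and 4.2 and the trace pruning in step 4.4 reduce to a few set operations. The sketch below illustrates them under the assumption that each execution trace is modeled as a set of basic block addresses; the function names are hypothetical, and fuzzuf's actual C++ implementation may represent coverage differently.

```python
from collections import Counter
from typing import List, Set

BasicBlocks = Set[int]  # assumption: a trace is a set of basic block addresses


def initial_ehb(good_traces: List[BasicBlocks],
                random_traces: List[BasicBlocks]) -> BasicBlocks:
    """Step 2: blocks executed by every random input but by no initial seed."""
    if not random_traces:
        return set()
    good_bb = set().union(*good_traces)
    in_every_random_run = set.intersection(*map(set, random_traces))
    return in_every_random_run - good_bb


def incremental_ehb(seed_traces: List[BasicBlocks], good_bb: BasicBlocks,
                    threshold: float = 0.9) -> BasicBlocks:
    """Step 4.2: blocks seen in at least `threshold` of runs and not in Good_BB."""
    if not seed_traces:
        return set()
    counts = Counter(bb for trace in seed_traces for bb in trace)
    needed = threshold * len(seed_traces)
    return {bb for bb, n in counts.items() if n >= needed and bb not in good_bb}


def prune_subsumed(traces: List[BasicBlocks],
                   new_trace: BasicBlocks) -> List[BasicBlocks]:
    """Step 4.4: drop traces that are subsets of the newly executed trace."""
    return [t for t in traces if not t <= new_trace]
```

Sets keep each rule to one or two lines; an implementation that also needs execution counts per block, for example for fitness scoring, would have to carry more than set membership.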

References

  1. Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, and Herbert Bos. 2017. VUzzer: Application-aware Evolutionary Fuzzing. In Network and Distributed System Security Symposium (NDSS’17).