AFL (American Fuzzy Lop) is a coverage-guided fuzzer developed by Michał Zalewski in 2013. AFL is a family of mutation-based fuzzers that choose one of the seeds saved in the queue, and tries to discover previously unknown execution paths with seed mutations. In other words, AFL applies genetic algorithms against seeds.
It is also worth mentioning that AFL has become the basis of many other fuzzing algorithms. There are various fuzzing researches that have successfully achieved performance gains 1 2 3, extensions for applicable PUTs and environments 4 5, and conversions into an unusual/unique usage 6 7, with partially modifying the original implementation of AFL. Therefore, we reimplemented AFL in fuzzuf since it is the base of current fuzzing research.
The fuzzuf's implementation has consistency and greater execution speed compared to its original, as showed in fuzzuf/fuzzuf-afl-artifact.
Once you have installed fuzzuf
, you can start AFL fuzzer as follows:
$ fuzzuf afl --in_dir=path/to/initial/seeds -- path/to/PUT @@
Options available are listed below:
-
Global options (available for all fuzzers on
fuzzuf
)--out_dir=path/to/output/directory
- Specifies a path to the directory, where all the outputs from fuzzers go, such as crash seeds. The default path
/tmp/fuzzuf-out_dir
is used if not specified.
- Specifies a path to the directory, where all the outputs from fuzzers go, such as crash seeds. The default path
--executor=native|qemu|coresight
- Specifies an executor type for the PUT. The default value is
native
.
- Specifies an executor type for the PUT. The default value is
--proxy_path=path/to/proxy/application
- Specifies a path to the executor proxy application (e.g.
afl-qemu-trace
). This option is required when an executor other thannative
is specified.
- Specifies a path to the executor proxy application (e.g.
--exec_timelimit_ms=1234
- Specifies a time limit per PUT execution in milliseconds. The default time limit is 1 second (i.e. 1000 ms).
--exec_memlimit=1234
- Specifies the amount of memory available per PUT execution in megabytes. The default memory limit is unlimited.
--log_file=path/to/log/file
- Specifies a path to the file, where log outputs (and debug-log outputs if built with debug mode) go. Logs are printed to stdout if a path is not specified.
-
Local options (only available for AFL)
--dict_file=path/to/dict/file
- Specifies a path to the file, loaded as an additional dictionary.
The following is the overview of the process flow of AFL:
-
For each initial seed given by the user, it executes a PUT using the seed as input and records the feedback (execution path, exit status, and execution time). It also stores all the initial seeds in a queue.
-
Repeat the following loop forever:
- Get seed from the head of the queue.
- Preprocess the seed. Minimize the seed by heuristically creating as small input as possible with the same feedback and calculate the goodness of the seed.
- Perform various mutations on the seed, and execute a PUT with the generated byte sequence as input.
- If the PUT passes the new execution path, the fuzzer adds it to the queue as valuable.
The algorithm above must determine if the PUT passed through an undiscovered execution path. In general, it is highly inefficient to record all the execution paths and strictly decide whether they match. Therefore, AFL records whether each edge on the control flow graph has passed instead of recording the execution paths. This record for an edge is called edge coverage in software testing.
In AFL, the compiler inserts the code to measure the edge coverage into the PUT so that the fuzzer can get the edge coverage via shared memory. This action is called instrumentation. Specifically, the compiler performs the following steps:
- Insert code so that the PUT attaches to the shared memory specified by the environment variable at startup. Define a pointer to the attached shared memory as the global variable
u8* __afl_area_ptr
. - Assign a random number
r(v)
, a 16-bit integer, to each vertexv
of the Control Flow Graph. - Compute
h(e) = r(v) ^ (r(u) >> 1)
for an edgee: u -> v
. - Insert code to execute
__afl_area_ptr[h(e)]++
when the PUT passes through the edgee: u -> v
.- To handle indirect jumps, insert the code to execute
__afl_area_ptr[r(v) ^ __afl_prev_loc]++
at the beginning of each Basic Blockv
and__afl_prev_loc = r(v) >> 1
at the end, whereu32 __afl_prev_loc
is a global variable.
- To handle indirect jumps, insert the code to execute
As you can see, AFL does not fully measure edge coverage but attempts to make it more efficient by making the following approximations:
- The PUT counts the number of times each edge is passed by an 8-bit variable, so counts above 256 will be inaccurate. In particular, if the count is a multiple of 256, the count will be zero.
- The ID assigned to each edge of the Control Flow Graph is calculated using a hash. Therefore, the IDs are not unique and may conflict. If the IDs of the edges collide, the edges will use the same counter.
AFL makes an additional approximation when using the edge coverage: round the value to the power of 2 for each counter. Specifically, the fuzzer rounds the values according to the following rules:
- 0, 1, and 2 will remain the same, respectively.
- 3 will be rounded to 4.
- The values in [4, 7] will be rounded to 8.
- The values in [8, 15] will be rounded to 16.
- The value contained in [16, 31] is rounded to 32.
- The value contained in [32, 127] is rounded to 64.
- The values in [128, 255] are rounded to 128.
Intuitively, we think that if the executions pass through an edge a similar number of times, it is a similar execution path. This approximation allows us to ignore minor differences in values, especially when the counter values are large. AFL has a flag for each edge to represent this information with each power of 2 from 0 to 128. When one of the flags is set for the first time, the fuzzer stores the corresponding input in the queue.
AFL will mutate all saved seeds in order but may skip mutation with a certain probability. Whether it ignores or not depends on whether the seed is a favorite or not.
A seed is a favorite if its counter in shared memory satisfies the following conditions:
- When the seed is input to a PUT, the counter takes on a non-zero value.
- The seed has the smallest value of (seed length) x (PUT execution time with the seed as input) compared to other seeds that have non-zero counters.
There are two main types of mutations used by AFL:
This type of mutation includes bit flips, addition and subtraction, and word insertion. These mutations are applied to all possible input positions of the seed with all possible parameters, so the set of bytes generated will not change no matter how many times the mutation occurs. As a result, no matter how many times you run PUT with the generated byte sequence as input, the resulting edge coverage will not change (unless PUT behaves randomly). Therefore, if the fuzzer has applied deterministic mutations in the past to the seed, it will never use them again.
This type of mutation includes havoc and splicing (the equivalent of crossover in genetic algorithms). It applies random changes to random input positions.
The number of times the mutation is applied varies depending on the seed. More specifically, the fuzzer performs havoc and splicing several times proportional to the goodness of the pre-computed seed. Including the selection of the seed, we can say that AFL performs mutation on the seed in a way similar to a weighted round-robin method.
Footnotes
-
Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage-based Greybox Fuzzing as Markov Chain. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS’16). ↩
-
Chenyang Lyu, Shouling Ji, Chao Zhang, Yuwei Li, Wei-Han Lee, Yu Song, and Raheem Beyah. 2019. MOpt: Optimized Mutation Scheduling for Fuzzers. In Proceedings of the 28th USENIX Security Symposium (Security'19). ↩
-
Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. AFL++: Combining Incremental Steps of Fuzzing Research. In Proceedings of the 14th USENIX Workshop on Offensive Technologies (WOOT'20). ↩
-
Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. 2017. kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels. In Proceedings of the 26th USENIX Security Symposium (Security'17). ↩
-
Google Project Zero. "WinAFL" https://github.com/googleprojectzero/winafl ↩
-
Cornelius Aschermann, Sergej Schumilo, Ali Abbasi, and Thorsten Holz. 2020. IJON: Exploring Deep State Spaces via Fuzzing. In Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P'20). ↩
-
Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. Directed Greybox Fuzzing. In Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS'17). ↩