MeltingPot

MeltingPot is an automated extractor of common binary signatures and a pattern generator. Given a sample set of files in the same format, it slices each file into small binary sequences and correlates the files that share similar sequences. It then generates a set of YARA-formatted patterns, each of which represents the common signature of one file cluster. These patterns can be applied directly by the YARA scan engine.

Introduction

MeltingPot is composed of a main engine and supporting plugins. They work together as follows:

The engine first loads the user-specified configuration.
It applies the file slicing plugin to slice the input files.
It correlates the slices by measuring their similarity with the help of the similarity comparison plugin.
This yields clusters of file slices, and the engine extracts the common binary signature from each cluster.
Finally, the engine outputs the signatures through the pattern formation plugin.
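The steps above can be illustrated with a small Python sketch. It is not the engine's implementation (the engine is a native binary): fixed-size slicing, the greedy clustering strategy, the `byte_match_score` measure, and all function names are assumptions chosen to keep the example short.

```python
from typing import Callable, List, Optional


def slice_file(data: bytes, size: int) -> List[bytes]:
    """Cut a file into consecutive fixed-size slices (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def byte_match_score(a: bytes, b: bytes) -> int:
    """Toy similarity (0..100): share of positions holding the same byte."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0
    return 100 * sum(x == y for x, y in zip(a, b)) // longest


def cluster_slices(slices: List[bytes],
                   similarity: Callable[[bytes, bytes], int],
                   threshold: int) -> List[List[bytes]]:
    """Greedy clustering: join the first cluster whose first member is
    similar enough, otherwise start a new cluster."""
    clusters: List[List[bytes]] = []
    for s in slices:
        for cluster in clusters:
            if similarity(cluster[0], s) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def common_signature(cluster: List[bytes]) -> List[Optional[int]]:
    """Keep a byte where all slices agree; use None (wildcard) elsewhere."""
    length = min(len(s) for s in cluster)
    signature: List[Optional[int]] = []
    for i in range(length):
        column = {s[i] for s in cluster}
        signature.append(column.pop() if len(column) == 1 else None)
    return signature


# Two hypothetical samples that differ in a single byte.
samples = [b"MZ\x90\x00AAAA", b"MZ\x90\x00AABA"]
slices = [s for sample in samples for s in slice_file(sample, 4)]
clusters = cluster_slices(slices, byte_match_score, threshold=75)
signatures = [common_signature(c) for c in clusters]
```

Here the two samples produce two clusters: one with an exact four-byte signature and one whose third byte becomes a wildcard. Passing the similarity function as an argument mirrors the plugin idea: the clustering code does not need to know how similarity is measured.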


Figure: An example pattern for Windows Notepad and its versions packed with several kinds of software protectors.
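For readers unfamiliar with the output format, a YARA rule can carry hex strings in which `??` stands for a wildcard byte. The Python sketch below renders a signature in that form. The rule name, the `$s0`-style string identifiers, and the `any of them` condition are illustrative assumptions, not necessarily what the pattern formation plugin emits.

```python
from typing import List, Optional, Sequence


def to_hex_string(signature: Sequence[Optional[int]]) -> str:
    """Render bytes as YARA hex-string tokens; None becomes the '??' wildcard."""
    return " ".join("??" if b is None else f"{b:02X}" for b in signature)


def to_yara_rule(name: str, signatures: List[Sequence[Optional[int]]]) -> str:
    """Build a rule with one hex string per signature (hypothetical layout)."""
    lines = [f"rule {name}", "{", "    strings:"]
    for i, sig in enumerate(signatures):
        lines.append(f"        $s{i} = {{ {to_hex_string(sig)} }}")
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


# Hypothetical signature with one wildcard position.
rule_text = to_yara_rule("cluster_0", [[0x4D, 0x5A, None, 0x00]])
```

The resulting text can be saved to a file and passed to the YARA scanner like any handwritten rule.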

As mentioned above, we have three kinds of plugins:

File slicing - Slices an input file by parsing its format (e.g. Windows PE, Android DEX).
Similarity comparison - Measures the similarity between a pair of slices (e.g. ssdeep, ngram).
Pattern formation - Produces the YARA patterns.
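To make the similarity comparison role concrete, here is one possible n-gram measure written in Python. It is only an assumption about what an n-gram comparison could look like (Jaccard similarity over sets of byte n-grams); the bundled ngram plugin may work differently. The score is scaled to 0..100 to match the range of ssdeep comparison scores.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ngram_similarity(a: bytes, b: bytes, n: int = 4) -> int:
    """Jaccard similarity of the two n-gram sets, scaled to 0..100."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0  # both inputs are shorter than n bytes
    return round(100 * len(grams_a & grams_b) / len(grams_a | grams_b))
```

For instance, `ngram_similarity(b"abcdef", b"abcdxx", n=2)` shares 3 of 7 distinct 2-grams and therefore scores 43.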

Analysts with custom research needs can write their own plugins in the plugin source directory and use the MeltingPot build script to build the libraries.

Installation

Basic

First of all, we need to prepare the following utilities:

CMake - A cross platform build system.
Valgrind - An instrumentation framework for memory debugging.
SSDeep - A fuzzy hash generation and comparison library.
GLib - A large set of libraries to handle common data structures.
libconfig - A library for processing structured configuration files.

For Ubuntu 12.04 and above, it should be easy:

$ sudo apt-get install -qq cmake
$ sudo apt-get install -qq valgrind
$ sudo apt-get install -qq libfuzzy-dev
$ sudo apt-get install -qq libglib2.0-dev
$ sudo apt-get install -qq libconfig-dev

Now we can build the entire source tree under the project root folder:

$ ./clean.py --rebuild
$ cd build
$ cmake ..
$ make

Then the engine should be under:

./engine/bin/release/cluster

And the relevant plugins should be under:

./plugin/slice/lib/release/libslc_*.so
./plugin/similarity/lib/release/libsim_*.so
./plugin/format/lib/release/libfmt_*.so

Advanced

If we modify the main engine or the plugins, we can move to the corresponding subdirectory to rebuild the binary.
To build the engine independently:

$ cd engine
$ ./clean.py --rebuild
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Debug|Release
$ make

Note that there are two build types; pass either Debug or Release as CMAKE_BUILD_TYPE.
For the debug build, the compiler debug flags are turned on, and the binary is located at ./engine/bin/debug/cluster.
For the optimized build, the binary is located at ./engine/bin/release/cluster.

To build the plugin independently (using File Slicing plugin as example):

$ cd plugin/slice
$ ./clean.py --rebuild
$ cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Debug|Release
$ make

Again, we must specify the build type for compilation.
For the debug build, the libraries are located at ./plugin/slice/lib/debug/libslc_*.so.
For the optimized build, the libraries are located at ./plugin/slice/lib/release/libslc_*.so.
The same build rule applies to the other two kinds of plugins.

Usage

To run the engine, we need to provide a configuration file. An example is given in ./engine/cluster.conf.
The parameters are described below:

Parameter	Description
`SIZE_SLICE`	The size of each binary slice cut from a file
`SIZE_HEX_BLOCK`	The length of a signature extracted from a slice cluster
`COUNT_HEX_BLOCK`	The number of signatures to extract from a cluster
`THRESHOLD_SIMILARITY`	The similarity threshold for grouping slices
`RATIO_NOISE`	The ratio of dummy bytes (00 or ff) in a signature
`RATIO_WILDCARD`	The ratio of wildcard characters in a signature
`TRUNCATE_GROUP_SIZE_LESS_THAN`	The cluster size below which trivial clusters are ignored
`FLAG_COMMENT`	The switch for pattern comments
`PATH_ROOT_INPUT`	The pathname of the input sample set
`PATH_ROOT_OUTPUT`	The pathname of the output pattern folder
`PATH_PLUGIN_SLICE`	The pathname of the file slicing plugin
`PATH_PLUGIN_SIMILARITY`	The pathname of the similarity comparison plugin
`PATH_PLUGIN_FORMAT`	The pathname of the pattern formation plugin
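As an illustration of how the two ratio parameters could constrain a candidate signature, the Python sketch below computes both ratios and rejects signatures that exceed them. Treating `RATIO_NOISE` and `RATIO_WILDCARD` as upper bounds expressed as fractions between 0 and 1 is an assumption made for this example; check the sample configuration for the actual scale and semantics. The function names are hypothetical.

```python
from typing import Optional, Sequence


def noise_ratio(signature: Sequence[Optional[int]]) -> float:
    """Fraction of positions holding a dummy byte (0x00 or 0xFF)."""
    return sum(b in (0x00, 0xFF) for b in signature) / len(signature)


def wildcard_ratio(signature: Sequence[Optional[int]]) -> float:
    """Fraction of positions that are wildcards (represented as None)."""
    return sum(b is None for b in signature) / len(signature)


def accept(signature: Sequence[Optional[int]],
           max_noise: float, max_wildcard: float) -> bool:
    """Assumed rule: keep a signature only if both ratios stay within bounds."""
    if not signature:
        return False
    return (noise_ratio(signature) <= max_noise
            and wildcard_ratio(signature) <= max_wildcard)
```

Under these assumptions, a signature made mostly of `00` padding or of wildcards would be dropped, since it would match too many unrelated files.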

In addition, we have the following advanced parameters:

Parameter	Description
`COUNT_THREAD`	The number of running threads
`IO_BANDWIDTH`	The maximum number of files a thread can simultaneously open

With the configuration file prepared, we can launch the MeltingPot engine:

For a normal task, run:

./engine/bin/release/cluster --conf ./engine/cluster.conf

For memory debugging, use the debug build and run:

valgrind ./engine/bin/debug/cluster --conf ./engine/cluster.conf

Note that when the engine runs under valgrind, the summary report will show a "still reachable" entry. This is a side effect of GLib. MeltingPot should be memory safe :-).

Contact

Please contact me via the mail [email protected].
