Backward incompatible changes in YARA 4.0 API
YARA 4.0.0 introduces some backward-incompatible changes in its C API that developers must be aware of. Backward-incompatible changes are always a nuisance and put a maintenance burden on software that depends on libyara, but they are sometimes necessary in order to pay down technical debt and move the project forward in good shape. This document explains those changes, how they affect existing users, and the reasons behind them.
YR_RULES is a cornerstone structure in YARA's API. It represents a set of YARA
rules that has been compiled from their textual form. Instances of YR_RULES are
created by calling the yr_compiler_get_rules function, and here comes the first
change: in YARA 2.x and 3.x this function returned a new instance of YR_RULES
each time you called it, whereas in YARA 4.0.0 the structure is a singleton and
all calls to yr_compiler_get_rules return the same YR_RULES structure.
In previous versions, having multiple instances of YR_RULES was necessary
because each instance could be shared by at most 32 threads (the limit was
initially 16, but it was raised to 32 in later versions). If you wanted to use
more than 32 scanning threads, you needed additional instances of YR_RULES. In
YARA 4.0.0 this limit no longer exists: the YR_RULES structure can be shared by
as many threads as you want, and therefore a single instance is enough.
Having a single instance of YR_RULES greatly reduces your program's memory
footprint, especially when you are compiling thousands of rules. Individual
rules within the YR_RULES structure also take less space now, which saves
further memory. At VirusTotal we reduced the size of compiled rules from more
than 4GB to less than 1GB.
If your program creates multiple instances of YR_RULES for the same compiled
rules, and for each of those instances it calls yr_rules_define_XXX_variable to
assign a different value to some variable X, it won't work as it used to. In
YARA 3.x each instance of YR_RULES holds its own set of variables, but in YARA
4.0 there is a single YR_RULES per set of compiled rules. If you want to scan
your data with the same compiled rules while changing only the values of some
variables, you need to create multiple YR_SCANNER structures for your YR_RULES
using yr_scanner_create, and then call yr_scanner_define_XXX_variable on each of
them to assign the desired values. Each YR_SCANNER can have a different value
for variable X.
The yr_compiler_set_callback function accepts a pointer to a callback function
that YARA calls to notify you about errors that occur during the compilation of
your rules. In YARA 3.x the callback's definition was:
void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const char* message,
    void* user_data)
In YARA 4.0 a new argument, const YR_RULE* rule, has been added:
void callback_function(
    int error_level,
    const char* file_name,
    int line_number,
    const YR_RULE* rule,
    const char* message,
    void* user_data)
This new argument is a pointer to the rule containing the error, but it can be NULL
if the error wasn't found within a rule definition.
Programs receive information about matches found in the scanned data via a callback function. The program provides the callback function and YARA calls it whenever it finds a matching (or not matching) rule. This callback has changed its signature from:
int callback_function(
    int message,
    void* message_data,
    void* user_data);
To:
int callback_function(
    YR_SCAN_CONTEXT* context,
    int message,
    void* message_data,
    void* user_data);
Notice that the callback function now receives an additional argument,
YR_SCAN_CONTEXT* context. This structure is opaque to the program: you shouldn't
rely on the fields it contains, but the context is necessary for iterating over
the matches for a given string, as shown below.
The yr_string_matches_foreach macro now receives an additional argument: the
scan context mentioned above. This macro is used for iterating over the matches
found for a given string in one of your rules. In YARA 3.x the implementation of
your callback function looked similar to:
int callback_function(
    int message,
    void* message_data,
    void* user_data)
{
  if (message == CALLBACK_MSG_RULE_MATCHING)
  {
    // If message is CALLBACK_MSG_RULE_MATCHING, message_data is a pointer
    // to the matching rule.
    YR_RULE* rule = (YR_RULE*) message_data;
    YR_STRING* string;
    YR_MATCH* match;

    // Iterate over the rule's strings.
    yr_rule_strings_foreach(rule, string)
    {
      // Iterate over the matches for the current string.
      yr_string_matches_foreach(string, match)
      {
        // ... do something with match
      }
    }
  }

  // Tell YARA to keep scanning.
  return CALLBACK_CONTINUE;
}
In YARA 4.0 it will look like:
int callback_function(
    YR_SCAN_CONTEXT* context,
    int message,
    void* message_data,
    void* user_data)
{
  if (message == CALLBACK_MSG_RULE_MATCHING)
  {
    // If message is CALLBACK_MSG_RULE_MATCHING, message_data is a pointer
    // to the matching rule.
    YR_RULE* rule = (YR_RULE*) message_data;
    YR_STRING* string;
    YR_MATCH* match;

    // Iterate over the rule's strings.
    yr_rule_strings_foreach(rule, string)
    {
      // Iterate over the matches for the current string.
      yr_string_matches_foreach(context, string, match)
      {
        // ... do something with match
      }
    }
  }

  // Tell YARA to keep scanning.
  return CALLBACK_CONTINUE;
}
Notice the extra argument context in the callback definition and how it is
passed to yr_string_matches_foreach.