How to decompile autovectorized binaries? #6045

thixotropist · 2023-12-23T19:07:43Z

Has any thought gone into helping the Ghidra decompiler make sense of code autovectorized by the compiler?

For example, compile this C file with gcc-13 or gcc-14 at -O3, adding a machine architecture (-march) flag that indicates vector instructions are supported:

#include <stdio.h>
int main(int argc, char** argv){
    const int N = 1320;
    char s[N];
    for (int i = 0; i < N - 1; ++i)
        s[i] = i + 1;
    s[N - 1] = '\0';
    printf("%s", s);  /* pass s as data, not as the format string */
}

For the x86_64 platform the decompiler results for a simple loop are very hard to interpret. For the RISCV-64 platform and gcc-14, the decompiler bails out completely. This gets worse when calls to memcpy or strlen are inlined by the compiler and autovectorized, as they can be in gcc-14 toolchains.

Example:

Compile the following with gcc-14 using an x86_64 toolchain and -march=sapphirerapids or -march=x86-64-v4, then pass the binary to Ghidra:

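#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* gen_rand_1d was not shown in the original post; this body is an assumed stand-in. */
void gen_rand_1d(double *a, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = (double)rand() / RAND_MAX;
}

int main() {
  const int N = 127;
  const uint32_t seed = 0xdeadbeef;
  srand(seed);

  // data gen
  double A[N];
  gen_rand_1d(A, N);

  // compute: this memcpy is the call the compiler inlines and vectorizes
  double copy[N];
  memcpy(copy, A, sizeof(A));

  // print
  printf("%f\n", copy[1]);
}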

gcc-14 will replace memcpy with inlined vector instructions optimized for sapphirerapids Intel processors, which are apparently not recognized by Ghidra 11.

Added Note: Sample x86_64 binaries can be found at https://github.com/thixotropist/ghidra_import_tests/tree/main/x86_64/exemplars, along with binutils objdump reference disassemblies.

thixotropist · 2024-01-16T15:15:44Z

Here's a better example of the problem. It occurs more often than other autovectorizations, and with more variability.

typedef struct { char c[16]; } c16;
/* copy fixed 128 bits of memory when we don't know the alignment of source or destination*/
void cpymem_3 (c16 *a, c16* b)
{
  *a = *b;
}

Compile this with gcc-14 and -O2 on a riscv64 toolchain - or possibly on other toolchains where word and doubleword operations can trigger an alignment exception.

Ghidra 11.0 can't disassemble or decompile this simple structure copy. The gcc compiler's processing makes this common pattern harder to recognize:

Anything that looks like a C memcpy call, copy loop, or structure copy gets turned into a GCC-internal cpymem operation.
The context of that cpymem operation includes any available information on the number of bytes, the alignment of source and destination, alignment restrictions on any structure elements within the source and destination, and whether or not the object copied fits into the vector register bank.
GCC's expansion of the cpymem operation takes that context into account and can generate any of perhaps half a dozen different vector instruction patterns to implement the copy. It appears to ensure that if the original code would throw an alignment exception, the vectorized instructions do too.
The compiler's peephole optimizer can then alter the generated vector instructions further.
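To make the first point concrete, here is a minimal sketch of three source forms that all describe the same 16-byte copy. The function names are invented for this illustration, and whether a given gcc release lowers all three to identical instructions depends on the version, target, and flags, so treat this as something to experiment with rather than a statement about the compiler.

```c
#include <stddef.h>
#include <string.h>

typedef struct { char c[16]; } c16;

/* Form 1: explicit library call with a compile-time constant size. */
void copy_call(c16 *dst, const c16 *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Form 2: byte-wise copy loop with a constant trip count. */
void copy_loop(c16 *dst, const c16 *src)
{
    for (size_t i = 0; i < sizeof dst->c; ++i)
        dst->c[i] = src->c[i];
}

/* Form 3: structure assignment, the same shape as cpymem_3 above. */
void copy_assign(c16 *dst, const c16 *src)
{
    *dst = *src;
}
```

Compiling these three functions with the same flags as cpymem_3 and comparing the disassembly is one way to see how much the emitted vector sequence varies for what is semantically a single operation.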

The desirable Ghidra decompilation for the cpymem_3 function above would be something like:

void function_xxx(void *a, void* b)
{
  __builtin_memcpy(a, b, 16);
}

thixotropist · 2024-09-01T12:49:53Z

One way to approach this problem is to borrow from our AI friends and generate a training set of source code paired with autovectorized binaries, which can then be used to recognize vector instruction sequences. The GCC RISCV autovec compiler test suite provides over a thousand source code examples, which can easily be cross-compiled for a number of different machine architectures. Perhaps the existing Ghidra BSim capabilities can be applied here.

There are a lot of ways compilers can apply vector and other instruction set extensions to optimize code. These will vary with compiler releases, and especially with the performance quirks of specific evolving RISCV microarchitectures. Control code that has nothing to do with data vectors gets optimized just as often as vector math code, disrupting manual Ghidra analysis. Short loops over arrays of structures can be especially hard to understand once vector instructions and huge vector registers are available to the compiler.
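As an illustration of the last point, the following is a hypothetical short loop over an array of structures. The record type, field names, and function name are all invented for this sketch. It is control-style code rather than vector math, yet a compiler with vector extensions enabled may still choose to vectorize it; whether it does depends on the compiler release, target, and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, invented for this illustration. */
typedef struct {
    uint16_t id;
    uint8_t  flags;
    uint8_t  level;
} record;

/* Set bit 0 of flags in every record whose level exceeds threshold,
   and return how many records were marked. */
size_t mark_records(record *r, size_t n, uint8_t threshold)
{
    size_t marked = 0;
    for (size_t i = 0; i < n; ++i) {
        if (r[i].level > threshold) {
            r[i].flags |= 0x01;
            ++marked;
        }
    }
    return marked;
}
```

In a vectorized build, the field accesses here can turn into strided or masked vector operations, which is the kind of output that is hard to map back to a four-line loop by hand.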

That argues for working up Ghidra models and compiler mockups together, tuning the compiler configurations to generate assembly code that aligns with critical sections of the binary being reviewed.

Other approaches look more complicated, such as teaching SLEIGH how to generate pcode based on run-time values of vector context registers, or teaching the Ghidra decompiler how to recognize the 10K+ RISC-V vector intrinsic C functions.
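To show why run-time vector context matters, here is a scalar C model of a strip-mined RISC-V vector loop. It is a simplification under stated assumptions: MODEL_VLMAX is an invented constant (on real hardware the limit depends on the vector register width and the element width and grouping selected by vsetvli), and model_vsetvl returns min(avl, VLMAX), which is only one of the results the vector specification permits. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum element count per vector operation (illustration only). */
#define MODEL_VLMAX 8

/* Simplified model of vsetvli: grant at most MODEL_VLMAX elements. */
static size_t model_vsetvl(size_t avl)
{
    return avl < MODEL_VLMAX ? avl : MODEL_VLMAX;
}

/* Scalar model of a strip-mined vector add: c[i] = a[i] + b[i]. */
void model_vadd(int32_t *c, const int32_t *a, const int32_t *b, size_t n)
{
    while (n > 0) {
        size_t vl = model_vsetvl(n);       /* stands in for vsetvli */
        for (size_t i = 0; i < vl; ++i)    /* stands in for one vector add */
            c[i] = a[i] + b[i];
        a += vl;
        b += vl;
        c += vl;
        n -= vl;                           /* remaining elements */
    }
}
```

The inner loop stands in for a single vector instruction whose element count and element width are not encoded in that instruction but set earlier by vsetvli, which is why a static pcode translation of the instruction alone is incomplete.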
