A tool to extract code content from source files using the clang and LLVM infrastructure.
clang-extract requires clang, LLVM, libelf, zlib, meson and ninja in order to build. On openSUSE, you can install them by running:
$ sudo zypper install clang18 clang18-devel libclang-cpp18 \
       clang-tools libLLVM18 llvm18 llvm18-devel libelf-devel meson ninja \
       zlib-devel libzstd-devel
Using LLVM 18 or higher is advised, since it is well tested. LLVM 16 and 17 are also supported, but you may run into issues with them.
Once you have all those packages installed, you must set up the meson build system in order
to compile. You can run either build-debug.sh for a debug build with no optimization
and debug flags enabled for development, or build-release.sh for a fully optimized
build.  Those scripts will create a build folder where you can cd into
and invoke ninja to build.  Example:
$ ./build-release.sh
$ cd build
$ ninja
Then the clang-extract binary will be available for you in the build folder.
clang-extract has automated testing. Running the testsuite is as easy as running:
$ ninja test
inside the build directory.  Test results are written into *.log files in the
build folder.
Clang-extract currently supports only C projects. Assuming clang-extract is compiled, it can be used to extract code content from a project with the following steps:
- Find, in the project, the function you want to extract, and which file it is in.
- Compile the project and grab the command line passed to the compiler.
- Replace gcc with clang-extract.
- Pass -DCE_NO_EXTERNALIZATION -DCE_EXTRACT_FUNCTIONS=function -DCE_OUTPUT_FILE=/tmp/output.c to clang-extract.
- Done. /tmp/output.c will have everything necessary for function to compile without any external dependencies.
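The command-line rewrite in the steps above can be sketched as a small C helper. This is only an illustration: make_extract_command is a hypothetical name that is not part of clang-extract, and the helper assumes the captured compiler command starts with the literal word gcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of clang-extract): turn "gcc <args>" into
   "clang-extract <args>" followed by the three -DCE_* switches from the
   steps above.  Returns 0 on success, -1 if CMD does not start with "gcc "
   or if OUT is too small to hold the result.  */
static int make_extract_command(const char *cmd, const char *func,
                                const char *output, char *out,
                                size_t out_size)
{
  static const char compiler[] = "gcc ";
  size_t skip = sizeof(compiler) - 1;

  assert(cmd != NULL && func != NULL && output != NULL && out != NULL);

  if (strncmp(cmd, compiler, skip) != 0)
    return -1;

  /* Keep the original arguments untouched and append the new switches.  */
  int n = snprintf(out, out_size,
                   "clang-extract %s -DCE_NO_EXTERNALIZATION"
                   " -DCE_EXTRACT_FUNCTIONS=%s -DCE_OUTPUT_FILE=%s",
                   cmd + skip, func, output);
  if (n < 0 || (size_t) n >= out_size)
    return -1;
  return 0;
}
```

Calling it with the command "gcc a.c -O2 -c", the function name function and the output path /tmp/output.c yields a clang-extract invocation with the same compiler arguments plus the three switches.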
Let's show how clang-extract works with a trivial example. Save the following code as a.c:
#include <stdlib.h>
#include <stdio.h>
void *unused_function(void)
{
  return malloc(1024);
}
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return 0;
}
Compiling this code with clang would be:
$ clang a.c -O2 -o a
Note that the source code of a.c contains an unused function. clang-extract can be
used to extract only the functions actually needed; in this case, the main function:
$ clang-extract a.c -O2 -o a -DCE_EXTRACT_FUNCTIONS=main -DCE_OUTPUT_FILE=out.c
In the output file out.c, you will see the following code:
/** clang-extract: from /usr/include/stdio.h:719:1  */
extern int puts (const char *__s);
/** clang-extract: from /tmp/a.c:9:1  */
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return 0;
}
Notice how any reference to unused_function is gone, and all headers have been removed and replaced by a declaration of
puts. The output code can be compiled with the same flags used to compile the original code:
$ clang out.c -O2 -o a
If you want to keep the includes, see the -DCE_KEEP_INCLUDES option in the Supported options chapter.
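One way to picture why unused_function disappears is as a reachability walk starting from the requested function. The sketch below is a toy model only and makes no claim about how clang-extract is implemented internally. The decls table, mark_needed and is_kept are hypothetical names, and the dependencies of a.c are written out by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4

/* One declaration and the names it references (NULL-terminated list).  */
struct decl {
  const char *name;
  const char *deps[MAX_DEPS];
  bool keep;
};

/* Hand-written toy model of a.c: main uses puts, unused_function uses
   malloc.  */
static struct decl decls[] = {
  { "puts",            { NULL },           false },
  { "malloc",          { NULL },           false },
  { "unused_function", { "malloc", NULL }, false },
  { "main",            { "puts", NULL },   false },
};

static struct decl *find_decl(const char *name)
{
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(decls) / sizeof(decls[0]); i++) {
    if (strcmp(decls[i].name, name) == 0)
      return &decls[i];
  }
  return NULL;
}

/* Mark NAME and everything it transitively references as needed.  */
static void mark_needed(const char *name)
{
  struct decl *d = find_decl(name);
  if (d == NULL || d->keep)
    return;
  d->keep = true;
  for (size_t i = 0; i < MAX_DEPS && d->deps[i] != NULL; i++)
    mark_needed(d->deps[i]);
}

static bool is_kept(const char *name)
{
  struct decl *d = find_decl(name);
  return d != NULL && d->keep;
}
```

After mark_needed("main"), only main and puts are marked, which matches the two declarations that appear in out.c.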
Code transformation is very often needed when generating livepatches. For example, if a livepatch needs to call functions that are not exported by the program (i.e. private), we need a process called externalization.
Externalization works by redeclaring the original symbol as a pointer to that symbol. By doing that we avoid linking issues that may come from using a private symbol.
Externalization is enabled by default and can be disabled by providing the
-DCE_NO_EXTERNALIZATION option.
For example, with the following input:
#include <stdio.h>
int function(void)
{
  return 0;
}
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return function();
}
calling clang-extract with:
$ clang-extract a.c -DCE_EXTRACT_FUNCTIONS=main -DCE_OUTPUT_FILE=out.c -DCE_EXPORT_SYMBOLS=function
will externalize the function function, as the following output shows:
/** clang-extract: from /usr/include/stdio.h:719:1  */
extern int puts (const char *__s);
/** clang-extract: from /tmp/a.c:3:1  */
static int (*klpe_function)(void);
/** clang-extract: from /tmp/a.c:8:1  */
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return (*klpe_function)();
}
As one can see, the call to function was replaced by a call through klpe_function, a pointer to a function. When livepatching,
this pointer is filled with the address of the original function, bypassing any
linking issues caused by symbol visibility.
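To make the pointer-filling step concrete, here is a minimal sketch in plain C. It does not show how a real livepatching framework resolves addresses: symbol_table, resolve_symbol, livepatch_init and patched_main are hypothetical stand-ins, and the lookup is simulated inside a single program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*int_fn)(void);

/* Stand-in for the private function that lives in the running program.  */
static int function(void)
{
  return 0;
}

/* What clang-extract emits in place of a direct reference to function().  */
static int (*klpe_function)(void);

/* Hypothetical symbol table; a real framework would look addresses up in
   the target binary instead.  */
struct symbol_entry {
  const char *name;
  int_fn addr;
};

static const struct symbol_entry symbol_table[] = {
  { "function", function },
};

/* Hypothetical helper: return the address of NAME, or NULL if unknown.  */
static int_fn resolve_symbol(const char *name)
{
  size_t n = sizeof(symbol_table) / sizeof(symbol_table[0]);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(symbol_table[i].name, name) == 0)
      return symbol_table[i].addr;
  }
  return NULL;
}

/* Hypothetical load-time hook: fill the externalized pointer.  */
static int livepatch_init(void)
{
  klpe_function = resolve_symbol("function");
  return klpe_function != NULL ? 0 : -1;
}

/* Body of the extracted main: the call goes through the pointer.  */
static int patched_main(void)
{
  assert(klpe_function != NULL);
  return (*klpe_function)();
}
```

Because the extracted code only ever names klpe_function, it never needs the linker to see the private symbol. The address is supplied at load time instead.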
clang-extract is able to automatically detect which symbols should be externalized if correct information is given to it. For that, three switches are available for the user to provide such information:
- -DCE_DEBUGINFO_PATH=<path>: Path to the debuginfo of the binary that will receive the livepatch. For binaries compiled with -g, this is embedded into the binary itself. With this, clang-extract can discover which symbols are available and automatically mark the functions to be externalized.
- -DCE_IPACLONES_PATH=<path>: Path to a single ipa-clones file or to a folder with multiple ipa-clones files. This is used to check which symbols got inlined and may need to have their entire body copied to the output file.
- -DCE_SYMVERS_PATH=<path>: Path to the kernel Modules.symvers file, used in kernel livepatching to also externalize symbols that come from modules the livepatch does not want to depend upon.
The precision of the automatic analysis depends on the amount of information the user provides. In any case, clang-extract will do its best to pick the best option when certain information is not available.
Let's extract the function __libc_malloc from the glibc project. The steps are:
- Compile the glibc project until malloc.c is compiled: make -j8 | grep malloc.c
- Grab the command line:
gcc malloc.c -c -std=gnu11 -fgnu89-inline  -g -O2 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wp,-U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno    -fPIE   -DMORECORE_CLEARS=2  -ftls-model=initial-exec     -I../include -I/home/giulianob/projects/glibc/build_glibc/malloc  -I/home/giulianob/projects/glibc/build_glibc  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/x86/nptl  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../sysdeps/x86_64/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../sysdeps/x86_64  -I../sysdeps/x86/include -I../sysdeps/x86  -I../sysdeps/ieee754/float128  -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I.  -D_LIBC_REENTRANT -include /home/giulianob/projects/glibc/build_glibc/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  -DUSE_TCACHE=1   -DTOP_NAMESPACE=glibc -o /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -MD -MP -MF /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o.dt -MT /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o
- Replace gcc with clang-extract and add the extra parameters (-Werror is removed, since clang treats some things as errors where gcc doesn't):
clang-extract malloc.c -c -std=gnu11 -fgnu89-inline  -g -O2 -Wall -Wwrite-strings -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wp,-U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno    -fPIE   -DMORECORE_CLEARS=2  -ftls-model=initial-exec     -I../include -I/home/giulianob/projects/glibc/build_glibc/malloc  -I/home/giulianob/projects/glibc/build_glibc  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/x86/nptl  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../sysdeps/x86_64/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../sysdeps/x86_64  -I../sysdeps/x86/include -I../sysdeps/x86  -I../sysdeps/ieee754/float128  -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I.  -D_LIBC_REENTRANT -include /home/giulianob/projects/glibc/build_glibc/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  -DUSE_TCACHE=1   -DTOP_NAMESPACE=glibc -o /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -MD -MP -MF /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o.dt -MT /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -DCE_NO_EXTERNALIZATION -DCE_OUTPUT_FILE=/tmp/out.c -DCE_EXTRACT_FUNCTIONS=__libc_malloc
- The output should be in /tmp/out.c and should compile on its own. Check it by calling $ gcc -c /tmp/out.c. Here is the output for malloc: https://godbolt.org/z/6vrrTPoP9
Clang-extract supports many options that control the output code:
- -D__KERNEL__: Indicate that we are processing a Linux source file, which triggers some special logic for kernel livepatching.
- -DCE_EXTRACT_FUNCTIONS=<args>: Extract the functions specified in the list, separated by commas.
- -DCE_EXPORT_SYMBOLS=<args>: Force externalization of the symbols specified in the list, separated by commas.
- -DCE_OUTPUT_FILE=<arg>: Output code to file. Default is <input>.CE.c.
- -DCE_NO_EXTERNALIZATION: Disable symbol externalization.
- -DCE_DUMP_PASSES: Dump the results of each transformation pass into files. Files will be dumped at the same path as the input files. Additional files are also generated in the /tmp/ folder.
- -DCE_KEEP_INCLUDES: Keep all possible #include<file> directives.
- -DCE_KEEP_INCLUDES=<policy>: Keep all possible #include<file> directives, but using the specified include expansion <policy>. Valid values are nothing, everything and kernel.
- -DCE_EXPAND_INCLUDES=<args>: Force expansion of the headers provided in <args>.
- -DCE_RENAME_SYMBOLS: Allow renaming of extracted symbols.
- -DCE_DEBUGINFO_PATH=<arg>: Path to the compiled (ELF) object of the desired program to extract. This is used to decide whether externalization is necessary for a given symbol.
- -DCE_IPACLONES_PATH=<arg>: Path to the .ipa-clones files generated by gcc. Used to decide whether the function to extract was inlined into other functions.
- -DCE_SYMVERS_PATH=<arg>: Path to the kernel Modules.symvers file. Only used when -D__KERNEL__ is specified.
- -DCE_DSC_OUTPUT=<arg>: Libpulp .dsc file output, used for userspace livepatching.
- -DCE_LATE_EXTERNALIZE: Enable late externalization (declare externalized variables later than the original). May reduce code output when -DCE_KEEP_INCLUDES is enabled.
- -DCE_IGNORE_CLANG_ERRORS: Ignore clang compilation errors in the hope that code is generated, even if it won't compile.
For more switches, see the output of:
$ clang-extract --help
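Several of the switches above take a comma-separated list. As a rough illustration of that format, the sketch below splits the value of a -DCE_EXTRACT_FUNCTIONS=<args> switch into individual names. parse_extract_functions is a hypothetical helper written for this example and is not taken from the clang-extract sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CE_PREFIX "-DCE_EXTRACT_FUNCTIONS="

/* Hypothetical helper: if ARG is a -DCE_EXTRACT_FUNCTIONS=<args> switch,
   split <args> on commas in place, store up to MAX_NAMES pointers in NAMES
   and return how many were stored.  Returns -1 for any other argument.
   ARG must be writable, since commas are overwritten with '\0'.  */
static int parse_extract_functions(char *arg, char *names[], int max_names)
{
  size_t prefix_len = strlen(CE_PREFIX);

  assert(arg != NULL && names != NULL);

  if (strncmp(arg, CE_PREFIX, prefix_len) != 0)
    return -1;

  int count = 0;
  char *cursor = arg + prefix_len;
  while (*cursor != '\0' && count < max_names) {
    names[count++] = cursor;
    char *comma = strchr(cursor, ',');
    if (comma == NULL)
      break;
    *comma = '\0';     /* terminate the current name */
    cursor = comma + 1;
  }
  return count;
}
```

For the argument -DCE_EXTRACT_FUNCTIONS=main,function this stores the two names main and function, while an unrelated argument such as -O2 is rejected with -1.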
Currently only projects written in C are supported. Clang-extract is extensively tested with the Linux kernel, glibc and OpenSSL source code. C++ support is planned and clang-extract has some tests for it, but it cannot handle libstdc++ headers yet.