dynamic: x86_64: Support runtime dynamic patching #1746
Glibc < 2.30 doesn't provide wrappers for 'gettid()' and 'tgkill()' so we define them. Signed-off-by: Clément Guidi <[email protected]>
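A minimal sketch of what such wrappers could look like, assuming they simply forward to the raw system calls. The names `compat_gettid` and `compat_tgkill` are hypothetical and chosen to avoid clashing with the wrappers that glibc >= 2.30 already declares; the names used in the patch may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name: returns the kernel thread ID of the calling thread. */
static pid_t compat_gettid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/*
 * Hypothetical name: sends 'sig' to thread 'tid' of thread group 'tgid'.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int compat_tgkill(pid_t tgid, pid_t tid, int sig)
{
	assert(tgid > 0 && tid > 0);
	return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

In the main thread, `compat_gettid()` equals `getpid()`, and passing signal 0 to `compat_tgkill()` only checks that the target thread exists.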
Functions to setup real-time signals and broadcast signals to all threads in an application. Useful for runtime synchronization mechanisms. Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]> Signed-off-by: Clément Guidi <[email protected]>
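One way such helpers could be built, as a sketch only: install a handler on a real-time signal, then walk `/proc/self/task` and send the signal to every thread with `tgkill`. The function names, the `/proc` based enumeration, and the hit counter are assumptions made for illustration, not taken from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Demo counter only; a real handler would do the synchronization work. */
static volatile sig_atomic_t rt_hits;

static void on_rt_signal(int sig)
{
	(void)sig;
	rt_hits++;
}

/* Install the handler on SIGRTMIN + offset; return the signal number or -1. */
static int setup_rt_signal(int offset)
{
	struct sigaction act = { 0 };
	int sig = SIGRTMIN + offset;

	if (offset < 0 || sig > SIGRTMAX)
		return -1;

	act.sa_handler = on_rt_signal;
	sigemptyset(&act.sa_mask);
	if (sigaction(sig, &act, NULL) < 0)
		return -1;
	return sig;
}

/* Send 'sig' to every thread of this process; return the number signaled. */
static int broadcast_signal(int sig)
{
	DIR *dir = opendir("/proc/self/task");
	struct dirent *ent;
	int count = 0;

	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		pid_t tid = (pid_t)atoi(ent->d_name);

		if (tid <= 0) /* skips "." and ".." */
			continue;
		if (syscall(SYS_tgkill, getpid(), tid, sig) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

This sketch also signals the calling thread. Threads created or destroyed while the directory is being read can be missed, so a production version would need to handle that race.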
Skip the functions where uftrace already injected a call to a trampoline. Signed-off-by: Clément Guidi <[email protected]>
Refactor for more clarity. Signed-off-by: Clément Guidi <[email protected]>
Refactor 'patch_code' so it can later be used at runtime. Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]> Signed-off-by: Clément Guidi <[email protected]>
Check if instruction at a given address is ENDBR64. Signed-off-by: Clément Guidi <[email protected]>
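For reference, ENDBR64 is encoded as the four bytes `f3 0f 1e fa`, so the check can be a plain byte comparison. The sketch below uses a hypothetical helper name, `is_endbr64`, and assumes at least four readable bytes at the given address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENDBR64_SIZE 4

/* Hypothetical helper: true if the bytes at 'addr' encode ENDBR64. */
static bool is_endbr64(const void *addr)
{
	static const uint8_t endbr64[ENDBR64_SIZE] = { 0xf3, 0x0f, 0x1e, 0xfa };

	assert(addr != NULL);
	return memcmp(addr, endbr64, ENDBR64_SIZE) == 0;
}
```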
When patching a function at runtime, first insert an int3 trap so any incoming thread is diverted from the patching region, to avoid executing partially modified code. The trap handler emulates a call to the trampoline, thus enabling the instrumentation. The trap is eventually removed in subsequent commits. Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]> Signed-off-by: Clément Guidi <[email protected]>
When cross-modifying code, the software needs to ensure that all cores will execute valid instructions at any time. When modifications are not atomic, we issue a specific memory barrier (or execute a serializing instruction on Linux < 4.16) to serialize the execution across all cores. This flushes the different caches, especially the processor pipelines that may have partially fetched straddling instructions. Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]> Signed-off-by: Clément Guidi <[email protected]>
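On Linux >= 4.16, the memory barrier in question is presumably the `membarrier(2)` system call with the `SYNC_CORE` commands. A minimal sketch, assuming kernel headers that define them; the helper names are hypothetical and the fallback for older kernels is not shown.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for membarrier(2), so go through syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/* Call once before patching. Returns 0 on success, -1 if unsupported. */
static int register_sync_core(void)
{
	int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (supported < 0 ||
	    !(supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
		return -1;

	return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/*
 * Make every thread of this process execute a core-serializing
 * instruction before it returns to user space.
 */
static int sync_all_cores(void)
{
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}
```

The process must register with the kernel before issuing the expedited command; without registration the call fails with `EPERM`.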
When patching at runtime, no thread can enter the patching region due to the trap that is inserted at the start of it. But threads that entered the region before the trap was installed can still be executing instructions in the region. We broadcast a real-time signal instructing all threads to check their instruction pointer, and execute out-of-line if they are in the patching region. Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]> Signed-off-by: Clément Guidi <[email protected]>
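The check each thread performs in the signal handler can be sketched as a range test on the saved instruction pointer. The sketch below makes a simplifying assumption that the out-of-line copy keeps the same byte offsets as the original region; the function names are hypothetical and the actual patch may map addresses differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if 'ip' is strictly inside the region being patched. A thread
 * sitting exactly at 'start' has not executed anything yet and will
 * hit the trap, so it does not need to be moved.
 */
static bool in_patch_region(uintptr_t ip, uintptr_t start, size_t size)
{
	assert(size > 0);
	return ip > start && ip < start + size;
}

/*
 * Return the address the thread should resume at: the matching offset
 * in the out-of-line copy if it is inside the region, else unchanged.
 * ASSUMPTION: the copy at 'ool' preserves the original byte offsets.
 */
static uintptr_t relocate_ip(uintptr_t ip, uintptr_t start, size_t size,
			     uintptr_t ool)
{
	if (!in_patch_region(ip, start, size))
		return ip;
	return ool + (ip - start);
}
```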
	ASSERT(sig == SIGTRAP);

	__atomic_signal_fence(__ATOMIC_SEQ_CST);
Can you please add a comment why this fence is needed?
	act.sa_sigaction = emulate_trampoline_call;
	act.sa_flags = SA_SIGINFO;

	if (sigaction(SIGTRAP, &act, NULL) < 0) {
What if it had a different handler already?
	return -1;

	return 0;
}
A blank line please.
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if HAVE_MEMBARRIER
We usually use #ifdef.
	 */

	if (register_trap(origin_code_addr, (void *)mdi->trampoline) == -1)
		return INSTRUMENT_FAILED;
	((uint8_t *)origin_code_addr)[0] = 0xcc;

	synchronize_all_cores();
Hmm.. are you calling it for each function? If so, I'm afraid it'd slow down the whole process.
@@ -698,8 +873,11 @@ static int patch_code(struct mcount_dynamic_info *mdi, struct mcount_disasm_info
	memcpy(&((uint8_t *)origin_code_addr)[1], &trampoline_rel_addr, CALL_INSN_SIZE - 1);
	memset(origin_code_addr + CALL_INSN_SIZE, 0x90, /* NOP */
	       info->orig_size - CALL_INSN_SIZE);
	/* FIXME Need to sync cores? Store membarrier? */
I think it needs to flush I-caches. I doubt the store barrier is enough.
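For readers following the hunk above, here is a sketch of the byte layout it produces: a 5-byte `call rel32` followed by NOP padding up to the size of the replaced instructions. It builds the bytes in a scratch buffer rather than patching live code, and `encode_call_patch` is a hypothetical name. In the hunk, the first byte still holds the int3 trap while the offset and NOPs are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_INSN_SIZE 5 /* 0xe8 opcode + 32-bit relative offset */

/*
 * Fill 'buf' (orig_size bytes) with 'call trampoline' plus NOP padding,
 * as it would appear at address 'origin'. Returns 0, or -1 if the
 * region is too small or the trampoline is out of rel32 range.
 */
static int encode_call_patch(uint8_t *buf, size_t orig_size,
			     uint64_t origin, uint64_t trampoline)
{
	/* rel32 is relative to the address of the next instruction. */
	int64_t rel = (int64_t)(trampoline - (origin + CALL_INSN_SIZE));
	int32_t rel32;

	assert(buf != NULL);
	if (orig_size < CALL_INSN_SIZE || rel < INT32_MIN || rel > INT32_MAX)
		return -1;

	rel32 = (int32_t)rel;
	buf[0] = 0xe8; /* call rel32 */
	memcpy(&buf[1], &rel32, sizeof(rel32));
	memset(buf + CALL_INSN_SIZE, 0x90, orig_size - CALL_INSN_SIZE); /* NOP */
	return 0;
}
```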
This is the sixth PR in a series of patches intended to bring runtime dynamic tracing on x86_64 to uftrace.
This PR implements the actual dynamic patching. It has a naive approach, which is optimized in #1747.
The patching strategy is as follows:
We make heavy use of signals in this naive approach, to serialize execution and redirect execution flow. Later, we only use one or two signals per patching batch.
Related: #1698