dynamic: x86_64: Support runtime dynamic patching #1746

Open
wants to merge 9 commits into
base: master

clementguidi
Contributor

@clementguidi clementguidi commented Jul 7, 2023

This is the sixth PR in a series of patches intended to bring runtime dynamic tracing on x86_64 to uftrace.

  1. dynamic: Refactor logic to perform multiple updates #1702
  2. dynamic: Refactor patch pattern logic #1703
  3. configure: Check for membarrier support #1704
  4. utils: Add helpers for runtime dynamic patching #1705
  5. dynamic: x86_64: refactor patch code #1745
  6. dynamic: x86_64: Support runtime dynamic patching #1746 🠈
  7. dynamic: x86_64: Optimize runtime dynamic patching #1747
  8. dynamic: x86_64: Support runtime dynamic unpatching #1748
  9. agent: Trigger runtime dynamic instrumentation #1749
  10. dynamic: x86_64: Fall back to original patching strategy when target is not running #1750
  11. dynamic: Display enhanced dynamic stats #1751

This PR implements the actual dynamic patching. It has a naive approach, which is optimized in #1747.

The patching strategy is as follows:

  1. Trap the entry of the critical region (patching zone) so no thread can enter it
  2. Serialize execution so cross-modification is effective in all threads
  3. Move threads still in the patching region out of line (if needed)
  4. Insert trampoline jump (but leave the trap)
  5. Replace the trap with the jump opcode
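
The ordering of the byte writes in the five steps above can be sketched on an ordinary buffer. This is only an illustration, not the PR's code: `patch_site`, `serialize_cores` and `evacuate_threads` are hypothetical names, the two helpers are empty placeholders, and the `0xe8` opcode is an assumption based on the `CALL_INSN_SIZE` constant visible in the diff. Real patching would additionally need writable executable memory.

```c
#include <stdint.h>
#include <string.h>

#define TRAP_OPCODE 0xcc   /* int3 */
#define CALL_OPCODE 0xe8   /* assumption: 5-byte 'call rel32' to the trampoline */
#define NOP_OPCODE 0x90
#define CALL_INSN_SIZE 5

/* Hypothetical placeholders for steps 2 and 3; they do nothing in this sketch. */
static void serialize_cores(void) {}
static void evacuate_threads(void) {}

/* Patch 'orig_size' bytes at 'site' (orig_size >= CALL_INSN_SIZE). */
static void patch_site(uint8_t *site, size_t orig_size, int32_t rel)
{
	site[0] = TRAP_OPCODE;                       /* 1. trap the entry */
	serialize_cores();                           /* 2. make the trap visible everywhere */
	evacuate_threads();                          /* 3. move stragglers out of line */
	memcpy(&site[1], &rel, CALL_INSN_SIZE - 1);  /* 4. operand first, trap still in place */
	memset(&site[CALL_INSN_SIZE], NOP_OPCODE, orig_size - CALL_INSN_SIZE);
	site[0] = CALL_OPCODE;                       /* 5. single-byte swap of the opcode */
}
```

The point of the ordering is that the first byte is only ever a trap or a complete opcode, so a thread arriving at the entry never decodes a half-written instruction.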

This naive approach makes heavy use of signals, both to serialize execution and to redirect execution flow. The optimized version in #1747 needs only 1 or 2 signals per patching batch.

Related: #1698

clementguidi and others added 9 commits July 6, 2023 12:26
Glibc < 2.30 doesn't provide wrappers for 'gettid()' and 'tgkill()' so we
define them.

Signed-off-by: Clément Guidi <[email protected]>
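
For readers unfamiliar with the workaround, such wrappers are usually thin shims over `syscall(2)`. A minimal sketch, with hypothetical `compat_` names so it does not clash with newer glibc (the commit's actual names and guards are not shown here):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the calling thread. */
static pid_t compat_gettid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Send 'sig' to thread 'tid' of thread group (process) 'tgid'. */
static int compat_tgkill(pid_t tgid, pid_t tid, int sig)
{
	return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

Passing signal number 0 to `compat_tgkill` performs only the existence and permission check, which is a cheap way to probe whether a thread is still alive.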
Functions to setup real-time signals and broadcast signals to all
threads in an application. Useful for runtime synchronization
mechanisms.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
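
One way such helpers could look is sketched below. It assumes threads are enumerated through `/proc/self/task`, which may or may not be what the commit does; `setup_rt_signal`, `broadcast_signal`, `count_signal` and `signals_seen` are hypothetical names used only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t signals_seen;

/* Example handler: only counts deliveries. */
static void count_signal(int sig, siginfo_t *info, void *ctx)
{
	(void)sig; (void)info; (void)ctx;
	signals_seen++;
}

/* Install 'handler' for SIGRTMIN + offset; returns the signal number or -1. */
static int setup_rt_signal(int offset, void (*handler)(int, siginfo_t *, void *))
{
	struct sigaction act = { 0 };
	int sig = SIGRTMIN + offset;

	if (sig > SIGRTMAX)
		return -1;
	act.sa_sigaction = handler;
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&act.sa_mask);
	return sigaction(sig, &act, NULL) < 0 ? -1 : sig;
}

/* Send 'sig' to every thread of this process; returns how many were signaled, or -1. */
static int broadcast_signal(int sig)
{
	DIR *dir = opendir("/proc/self/task");
	struct dirent *ent;
	pid_t pid = getpid();
	int sent = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (!isdigit((unsigned char)ent->d_name[0]))
			continue; /* skip "." and ".." */
		if (syscall(SYS_tgkill, pid, (pid_t)atoi(ent->d_name), sig) == 0)
			sent++;
	}
	closedir(dir);
	return sent;
}
```

A thread that exits between the directory read and the `tgkill` call simply makes that call fail, so the sketch counts successful deliveries rather than directory entries.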
Skip the functions where uftrace already injected a call to a
trampoline.

Signed-off-by: Clément Guidi <[email protected]>
Refactor for more clarity.

Signed-off-by: Clément Guidi <[email protected]>
Refactor 'patch_code' so it can later be used at runtime.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
Check if instruction at a given address is ENDBR64.

Signed-off-by: Clément Guidi <[email protected]>
When patching a function at runtime, first insert an int3 trap so any
incoming thread is diverted from the patching region, to avoid executing
partially modified code.

The trap handler emulates a call to the trampoline, thus enabling the
instrumentation.

The trap is eventually removed in subsequent commits.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
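
Emulating a call from a trap handler amounts to doing by hand what the `call` instruction would have done: push the return address and redirect the instruction pointer. The sketch below shows only that arithmetic. `struct trap_frame` is a hypothetical stand-in for the two saved registers, which a real `SA_SIGINFO` handler on x86_64 Linux would reach through `uc_mcontext.gregs[REG_RIP]` and `gregs[REG_RSP]`; the return address is assumed to be the byte following a 5-byte call at the patched site.

```c
#include <stdint.h>

#define CALL_INSN_SIZE 5

/* Hypothetical view of the saved context: instruction and stack pointers. */
struct trap_frame {
	uintptr_t rip;
	uintptr_t rsp;
};

/* Make the interrupted thread resume as if it had executed 'call trampoline'. */
static void emulate_call(struct trap_frame *frame, uintptr_t trampoline)
{
	/* int3 is one byte long, so the saved rip points just past the trap. */
	uintptr_t trap_addr = frame->rip - 1;
	uintptr_t return_addr = trap_addr + CALL_INSN_SIZE;

	frame->rsp -= sizeof(uintptr_t);         /* push ... */
	*(uintptr_t *)frame->rsp = return_addr;  /* ... the return address */
	frame->rip = trampoline;                 /* jump to the trampoline */
}
```

When the handler returns, the kernel restores the modified context and the thread continues in the trampoline with a normal-looking call frame.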
When cross-modifying code, the software needs to ensure that all cores
will execute valid instructions at any time.

When modifications are not atomic, we issue a specific memory
barrier (or execute a serializing instruction on Linux < 4.16) to
serialize the execution across all cores. This flushes the different
caches, especially the processor pipelines that may have partially
fetched straddling instructions.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
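
The "specific memory barrier" is presumably the `membarrier(2)` core-serializing command introduced in Linux 4.16; that reading is an assumption, as are the helper names below. A minimal sketch of registering for and issuing that command:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no membarrier() wrapper, so go through syscall(2). */
static int membarrier_cmd(int cmd)
{
	return (int)syscall(SYS_membarrier, cmd, 0U, 0);
}

/*
 * Ask the kernel to run a core-serializing instruction on every CPU that is
 * executing a thread of this process. Returns 0 on success, -1 on error
 * (for example ENOSYS or EINVAL on kernels without SYNC_CORE support).
 */
static int sync_cores(void)
{
	static int registered;

	if (!registered) {
		/* The process must register its intent before the first use. */
		if (membarrier_cmd(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE) < 0)
			return -1;
		registered = 1;
	}
	return membarrier_cmd(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE);
}
```

A caller would treat a `-1` return as the cue to fall back to the older path the commit message mentions, where each thread executes a serializing instruction itself.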
When patching at runtime, no thread can enter the patching region due to
the trap that is inserted at the start of it.

But threads that entered the region before the trap is installed can
still be executing instructions in the region.

We broadcast a real-time signal instructing all threads to check their
instruction pointer, and execute out-of-line if they are in the patching
region.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
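
The check each thread performs in the signal handler can be sketched as a pure function on the saved instruction pointer. The names are hypothetical, and the sketch assumes the out-of-line copy keeps the same instruction offsets as the original, which need not hold if instructions are relocated or rewritten. A thread sitting exactly at the region start is left alone, since it will hit the trap on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one patching region and its out-of-line copy. */
struct patch_region {
	uintptr_t start;        /* first byte of the region (holds the trap) */
	size_t size;            /* number of bytes being overwritten */
	uintptr_t out_of_line;  /* copy of the original instructions */
};

/*
 * If '*rip' lies strictly inside the region, redirect it to the matching
 * offset in the out-of-line copy and return 1; otherwise return 0.
 */
static int move_out_of_line(uintptr_t *rip, const struct patch_region *r)
{
	uintptr_t offset = *rip - r->start; /* wraps to a huge value if rip < start */

	if (offset == 0 || offset >= r->size)
		return 0;
	*rip = r->out_of_line + offset;
	return 1;
}
```

In a real handler the value behind `rip` would be the saved instruction pointer in the `ucontext_t` passed as the third handler argument.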

ASSERT(sig == SIGTRAP);

__atomic_signal_fence(__ATOMIC_SEQ_CST);
Owner

Can you please add a comment why this fence is needed?

act.sa_sigaction = emulate_trampoline_call;
act.sa_flags = SA_SIGINFO;

if (sigaction(SIGTRAP, &act, NULL) < 0) {
Owner

What if it had a different handler already?

return -1;

return 0;
}
Owner

A blank line please.

#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if HAVE_MEMBARRIER
Owner

We usually use #ifdef.

*/

if (register_trap(origin_code_addr, (void *)mdi->trampoline) == -1)
return INSTRUMENT_FAILED;
((uint8_t *)origin_code_addr)[0] = 0xcc;

synchronize_all_cores();
Owner

Hmm.. are you calling it for each function? If so, I'm afraid it'd slow down the whole process.

@@ -698,8 +873,11 @@ static int patch_code(struct mcount_dynamic_info *mdi, struct mcount_disasm_info
memcpy(&((uint8_t *)origin_code_addr)[1], &trampoline_rel_addr, CALL_INSN_SIZE - 1);
memset(origin_code_addr + CALL_INSN_SIZE, 0x90, /* NOP */
info->orig_size - CALL_INSN_SIZE);
/* FIXME Need to sync cores? Store membarrier? */
Owner

I think it needs to flush I-caches. I doubt the store barrier is enough.
