Conversation
Bypass the PLT for external function calls, reducing call overhead.
|
Are there any semantic consequences of not using PLT stubs? |
From what I've read (I might be missing something here): with the PLT, LD_PRELOAD and similar mechanisms can intercept function calls at runtime. Without the PLT, symbol interposition may not work for those functions.
What's affected: ltrace (relies on PLT interception to trace library calls).
What still works fine: gdb (breakpoints, stepping, stack traces; actually cleaner without PLT frames).
Note on malloc interception: memory debuggers that intercept malloc/free are unaffected because those calls go to libc, not libykcapi. The -fno-plt flag only affects calls from yklua to libykcapi functions (__yk_promote*, etc.), not calls to system libraries.
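As a rough illustration of the indirection under discussion, here is a toy C model of the two call paths. It is a sketch, not the dynamic loader's actual implementation: promote_usize, got_lazy, got_eager, lazy_resolver, call_via_plt and call_no_plt are hypothetical names, and the model assumes the usual ELF behaviour where a PLT call resolves lazily on first use while an -fno-plt call loads an address that the loader has already filled in.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libykcapi function such as __yk_promote_usize. */
static size_t promote_usize(size_t v) { return v; }

typedef size_t (*promote_fn)(size_t);

/* Counts how often the model's lazy resolver runs. */
static int resolver_runs = 0;

static size_t lazy_resolver(size_t v);

/* Model GOT slots (hypothetical names). With a PLT and lazy binding the slot
 * initially leads to the resolver; with -fno-plt the slot is assumed to hold
 * the final address once the loader has processed relocations. */
static promote_fn got_lazy = lazy_resolver;
static promote_fn got_eager = promote_usize;

/* Models the first call through a PLT stub: resolve the symbol, patch the
 * slot, then forward the call to the real function. */
static size_t lazy_resolver(size_t v) {
    resolver_runs++;
    got_lazy = promote_usize;
    return got_lazy(v);
}

/* Models `call f@plt`: a call to a stub that jumps through the slot. */
static size_t call_via_plt(size_t v) { return got_lazy(v); }

/* Models the -fno-plt call site: a single indirect call through the slot. */
static size_t call_no_plt(size_t v) { return got_eager(v); }
```

In this model the first call_via_plt call runs the resolver once and later calls go straight to the target, while call_no_plt never runs it. The model only captures control flow; it says nothing about the actual cost of either path, which is what the perf numbers in the description measure.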
|
We intercept a couple of functions like (search for |
|
@vext01 Do we have tests that cover these? If not, how might Pavel tell if things still work? |
|
These two tests might cover this:
I think we wrap thread creation so that we can create a shadow stack for new threads. Maybe for destruction too(?). I'm not sure though -- I didn't implement this part of the system. What I'd do is make the wrapper functions crash and then check they still crash on your branch. |
|
I was assuming that if yk tests are passing with this yklua change then this is a safe change :) |
|
(and if the wrapper functions don't crash before this branch, then we don't have good test coverage) |
|
Ah. yklua doesn't use threads, so you will probably get away with this for now. Sorry, I thought this was a |
|
Also, my understanding is that PLT elimination (-fno-plt) only affects how runtime dynamic linking resolves external symbols in shared libraries. Since the --wrap redirection is already baked into the binary at link time, it should be safe. |
Hang on, are we saying this will break in the future? |
|
I was worried that if the lua interpreter ever introduces calls to |
|
@Pavel-Durov Can you double check that this really works? @vext01 Would this be exercised by the thread tests in |
|
The --wrap mechanism is completely orthogonal to the PLT: even if Lua introduces pthread_create calls in the future, they would still be wrapped. |
Not currently, because C tests are not linked with this PLT optimisation. |
Add the -fno-plt compiler flag to yklua's Makefile to eliminate PLT (Procedure Linkage Table) indirection for calls from yklua to libykcapi functions.

This change partially eliminates PLT overhead, with improvements for 3 benchmarks when running with the interpreter only (JIT disabled):
Summary
Key reductions:
__yk_promote_ptr@plt overhead: 1.2–1.5% → 0% (eliminated)
__yk_promote_usize@plt overhead: 0.8–1.5% → 0% (eliminated)

Note: __yk_idempotent_promote_i32@plt and __ykrt_control_point@plt stubs persist. I think we'll need to update the LLVM pass to optimise them (I have a branch for that).

Perf stats
Note: perf data collected with YKD_JITC=none

LuLPeg
Symbols: __yk_promote_ptr@plt, __yk_promote_usize@plt, __yk_idempotent_promote_i32@plt, __ykrt_control_point@plt

Richards
Symbols: __yk_promote_ptr@plt, __yk_promote_usize@plt, __yk_idempotent_promote_i32@plt, __ykrt_control_point@plt
Baseline total: 5.4K samples, This Change total: 5.3K samples

Havlak
Symbols: __yk_promote_ptr@plt, __yk_promote_usize@plt, __yk_idempotent_promote_i32@plt, __ykrt_control_point@plt