Allow users to mark blocking/unblocking points around direct system calls #44
The difficulty here is that system calls are often made via inline assembly (`int 0x80`), so LD_PRELOAD interception isn't sufficient. We'd probably need to use ptrace, or otherwise require the program being profiled to annotate these manually-implemented synchronization primitives, using something similar to how the COZ_PROGRESS macro calls into the libcoz runtime.
I put some hacky code here: https://gist.github.com/toddlipcon/761fa7f8bd9e91f8a8dd
I'm not excited about paying that cost, though. A long-term solution may be to use perf's trace events, which can count context switches and thread blocking, regardless of the cause of the blocking/unblocking event.
Another option might be for Coz to expose a mechanism that would let programmers indicate that certain functions correspond to blocking or unblocking points.
@emeryberger: Yeah, that's roughly what the patch above does.
@toddlipcon: Your implementation seems like it should be okay. Do you have a simple example where coz seems to do the wrong thing with your extra macros?
I got a chance to look at this again today (thanks for pinging this issue). Looking at the profile results, it looks like coz is just picking the same experiment over and over again, to the point that it basically never explores any interesting parts of the code. In particular, I'm profiling a benchmark program that looks like the following code:
The implementation of 'MakeRPCCall' essentially delegates work to another thread (a libev event loop) and then blocks on a mutex/condvar until the call comes back. The issue seems to be that, in the steady state, most of my threads are blocked in this stack trace waiting for a response, so the task-clock-based profile collection is extremely likely to pick this line of code for the experiment:
(rtest.proxy.cc:84 is the last line within my source code for sending an RPC). It seems almost as if the perf events are only getting collected from the "client" thread and not the "server" threads, which are in the same process. Any ideas?
I just noticed that if I don't pass '-s %%/src/kudu/%%', I get much better spreading of experiments across other files. Maybe something is wrong with the way experiment lines are being filtered.
When you omit the `-s` scope filter, coz considers samples anywhere in your code. If coz gets a sample that's not in the given source scope, it walks back up the stack to find the last callsite that is in scope (if any). I'm guessing your application's runtime is dominated by computation that is invoked (indirectly) from the callsite where your hotspot is.
The fix for #57 should resolve your second issue, once it's done. Could you submit the change in your gist as a pull request? |
Are the pre-block/post-block annotations needed for epoll too? Or is epoll understood properly, so this only applies to condvars and futexes? I have a complicated multiprocess system (processes spawned via forks of a main process that is always idle and uninteresting), and I'm trying to see whether coz will be a good fit for figuring out why some complicated RPC between components is slow.
My application uses futex to implement a spinlock. This is fairly common in high-performance server code (e.g., gperftools uses futexes for spinlocks, as do libraries like Facebook's folly).
For better accuracy, libcoz should intercept the futex syscall and treat it similarly to how condition variables are treated.