Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption Safety Fixes and Miscellaneous Improvements #76

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Mar 10, 2014

The included patches dramatically improve stability on my system when using the fbt provider to analyze ZFSOnLinux. So far, I have found no regressions. A few of these patches are a little rough, but I am opening this pull request for feedback.

ryao added 6 commits March 3, 2014 16:23
The changes to this file were never upstreamed and caused the compiler
to complain about type issues. Any changes made here should be
upstreamed.

Signed-off-by: Richard Yao <[email protected]>
We define strlcat ourselves, so we should include its function prototype
in strings.h

Note that we need a way to handle sane libcs like uclibc that provide
strlcat.

Signed-off-by: Richard Yao <[email protected]>
This does *NOT* play nicely with preemptible kernels. Since we are
running as root, this doesn't matter very much right now. Lets disable
it until it can be reworked.

Signed-off-by: Richard Yao <[email protected]>
ryao added 6 commits March 10, 2014 16:13
The use of custom code makes debugging difficult and there does not
appear any reason for it given that the native Linux functions work on
ARM, so lets get rid of it.

This resolves potential issues where Linux's memory allocation functions
fail to warn that we are in an interrupt context. Memory allocations in
such contexts are unsafe and do not happen on any other DTrace platform,
so it is a very good thing to ensure that we are warned when this
happens.

Signed-off-by: Richard Yao <[email protected]>
The custom locking infrastructure in dtrace4linux was designed to
operate in an interrupt context and consequently, many locks seem to have
been modified to incorporate a change in locking semantics where their
use creates an interrupt context on entering a lock and leaves it on
exiting a lock. This has multiple problems.

1. The new locking semantics clobber the interrupt masks whenever
locking fails to be done in reverse order. For instance,
dtrace_module_unloaded() locks variables in dtrace4linux this way:

dmutex_enter(&dtrace_provider_lock);
dmutex_enter(&mod_lock);
dmutex_enter(&dtrace_lock);

It then unlocks them this way:

dmutex_exit(&dtrace_provider_lock);
dmutex_exit(&mod_lock);
dmutex_exit(&dtrace_lock);

Since we have multiple locks with the new semantics,
dtrace_provider_lock will be the one that stores the original interrupt
mask while the others store interrupt masks that have interrupts
disabled. Since unlocking is not done in reverse, we exit this function
with interrupts disabled and consequently, bad things can happen. This
is not the only example of this problem and while the lock ordering
should be fixed, it does explain some problems specific to dtrace4linux
that do not occur on other dtrace ports.

2. The mutexes were replaced with semaphores, but we have no stack on
which to store multiple interrupt contexts, so they get clobbered then.

3. Other DTrace platforms, such as Illumos, expect these locks to have
mutex semantics that are unsafe inside interrupt contexts. There is no
apparent benefit to being different, but plenty of issues stem from
being different. The change in semantics risks of plenty of things that
should work in routines known to work fine on other platforms can and
will fail in non-obvious ways.

Signed-off-by: Richard Yao <[email protected]>
Comments suggest that the rationale for using custom code for
dtrace_xcall() was that it on_each_cpu did not interact well with the
custom locking. Since that code needs to be replaced with more
conventional Linux locks, the rationale for a custom xcall
implementation is gone. Furthermore, it is not clear that this interacts
well with on_each_cpu(), so lets drop it in favor of wrapping
on_each_cpu(). This avoids having to worry about strange interactions.

Signed-off-by: Richard Yao <[email protected]>
The locking and xcall changes make this unnecessary. It triggers a
warning on preemptible kernels, so lets get rid of it:

[53134.451543] BUG: using smp_processor_id() in preemptible [00000000] code: dtrace/19160
[53134.451559] caller is dtrace_ecb_destroy+0x523/0x6d0 [dtracedrv]
[53134.451563] CPU: 3 PID: 19160 Comm: dtrace Tainted: P           O 3.12.6 dtrace4linux#3
[53134.451564] Hardware name: Supermicro X9SRA/X9SRA-3/X9SRA/X9SRA-3, BIOS 3.0a 08/08/2013
[53134.451566]  0000000000000000 ffff880ec889bdb0 ffffffff813ce370 0000000000000003
[53134.451572]  ffff880ec889bdc8 ffffffff8120ce93 ffff880ed1daff60 ffff880ec889be20
[53134.451576]  ffffffffa1156b83 00000001c889bde8 ffffffffa117808d ffff880ec889bdf8
[53134.451579] Call Trace:
[53134.451586]  [<ffffffff813ce370>] dump_stack+0x4e/0x82
[53134.451592]  [<ffffffff8120ce93>] debug_smp_processor_id+0xd3/0xf0
[53134.451601]  [<ffffffffa1156b83>] dtrace_ecb_destroy+0x523/0x6d0 [dtracedrv]
[53134.451616]  [<ffffffffa117808d>] ? dtrace_xcall+0x2d/0x30 [dtracedrv]
[53134.451626]  [<ffffffffa115c8c7>] dtrace_state_destroy+0x167/0x860 [dtracedrv]
[53134.451636]  [<ffffffffa115e16c>] dtrace_close+0x8c/0x150 [dtracedrv]
[53134.451645]  [<ffffffffa11637f2>] dtracedrv_release+0x12/0x20 [dtracedrv]
[53134.451652]  [<ffffffff8114631a>] __fput+0xea/0x2c0
[53134.451655]  [<ffffffff81146529>] ____fput+0x9/0x10
[53134.451660]  [<ffffffff81063c8c>] task_work_run+0x9c/0xd0
[53134.451665]  [<ffffffff81002a0a>] do_notify_resume+0x8a/0xa0
[53134.451669]  [<ffffffff813dafea>] int_signal+0x12/0x17

Signed-off-by: Richard Yao <[email protected]>
It is unsafe to use mutex functions inside an interrupt context.

Signed-off-by: Richard Yao <[email protected]>
This creates a flood of warnings to dmesg on systems with preemptible
kernels. These warnings are not of huge importance, so lets disable them
until a different mechanism is available for our use. I believe that
doing this is safe in >99.9% of cases.

Signed-off-by: Richard Yao <[email protected]>
@dtrace4linux
Copy link
Owner

Hello Richard,

can you give me more detail on what you are doing here? Eg using native
linux features for dtrace_xcall prevents you from tracing the kernel cpu
calling functions. And you seem to have switched from dmutex to mutex - we
have to avoid relying on any kernel implementation else we can deadlock.

thanks

On 10 March 2014 19:16, Richard Yao [email protected] wrote:

The included patches dramatically improve stability on my system when
using the fbt provider and so far, I have found no regressions. Testing is
primarily done using fbt to analyze ZFSOnLinux. I am opening this pull

request for feedback.

You can merge this Pull Request by running

git pull https://github.com/ryao/dtrace refactoring

Or view, comment on, or merge it at:

#76
Commit Summary

  • Properly initialize par_mutex
  • Revert dt_grammar.y to Illumos version
  • Add strlcat to strings.h
  • Disable preemption when printing statements
  • Disable credential infrastructure
  • Disable preemption during tsc_max_delta calculation
  • Use Linux interrupt management functions
  • Use Linux Locking Infrastructure
  • Use Linux infrastructure for dtrace_xcall()
  • Remove lock_teardown
  • HACK: Disable preemption when getting time stamps
  • Convert dtrace_stack_mutex into a spin-lock

File Changes

Patch Links:

Reply to this email directly or view it on GitHubhttps://github.com//pull/76
.

@ryao
Copy link
Contributor Author

ryao commented Mar 10, 2014

@dtrace4linux If you look carefully, you will see that dmutex causes plenty of deadlocks. The interrupt flags are being clobbered and the code is not designed to work under such semantics. As for dtrace_xcall, it was a large amount of code and I did not see much reason for it. It is not clear to me that we gain much by being able to trace those functions, but it is clear to me that reasoning about correctness in this custom code is hard. I compared dtrace4linux with other ports and I did not see these things being done.

That being said, you do raise the valid point that dtrace4linux probably should be revised to stop permitting that to be traced via fbt as part of this. The same goes for revising it to stop permitting tracing of spinlocks (and possibly also mutexes) via fbt in favor of something more like lockstat.

The main purpose of this pull request was feedback. I am not aware of any significant regressions in comparison to what dtrace does on Illumos, but the improvements in terms of making capabilities safe on Linux like on Illumos are rather substantial in my testing. I think there is definite merit to these changes, although they do need more review.

Lastly, I bundled some less extreme fixes into this. It was not clear to me breaking this into separate pull requests had much value when I wanted review. Anyway, there are some more clear cut fixes like "Convert dtrace_stack_mutex into a spin-lock", "Add strlcat to strings.h" and the patches to disable kernel preemption in certain key regions. I made an effort to keep things broken into logically separate commits so it would not be a giant monolithic patch. Feel free to cherry-pick the safer things separately so we can whittle this down. :)

@dtrace4linux
Copy link
Owner

Thanks Richard. Your comments are inciteful.

Firstly, if dmutex is molesting the interrupt flags - thats very bad and
may explain some of the issues. Just reviewing code - there is
disable_ints, which is never set to true; i had added that for debugging -
so that code can come out.. I dont see an obvious other area where the ints
are molested (xcall_slave should not be touching the ints).

dtrace_xcall is necessarily "complex" - its based on the solaris code of
the same name, but implementationally different, because we are not running
in a solaris kernel, due to lack of splX support and the deep intimate
knowledge solaris is imparting into dtrace.

Yes - you can simplify the code vastly if you accept the tradeoffs, but
trying to use the smp_call_function kernel functions will deadlock under
extreme situations. The scenario is to be able to do 'dtrace -n fbt::::'
for extended periods of time without any deadlock or crash. (This was soak
tested for >24h on various kernels). Unfortunately the desire to support
the multitude of kernel versions means the code may be brittle, and I have
an issue with a Xen guest which proves the brittleness.

In your case, you have no need for this - so your changes are good and work
well for you, and/or you are not hitting the calls in a way which shows any
issue, so am interested in stats you have on hit rates (see
/proc/dtrace/stats for some of this).

The big issue is "reachability" - I have had to manually walk the
reachability of functions (and use toxic.c to blacklist, possibly
unnecessarily) some functions to avoid recursive dtrace faults.

I checked the Apple/FreeBSD ports - and the FreeBSD port (at least in the
9.x kernels) didnt prove conclusively how reliable the code was. Since
nobody complained, I assume the kernel infra is a good fit for dtrace -
better than Linux. On Darwin, there appears to be more of a reliance on the
mach kernel to "do the right thing".

I'll re-review your changes and certainly accept some of the utility
changes (eg the stack trace spin lock), but am loathe to blindly accept the
remaining changes as they would appear to undo most of the stability
mechanisms for other kernels or use cases. The other option is to come up
with a better mechanism for detecting recursive probes (the driver detects
those and auto-blacklists these, assuming we are not hitting the critical
interrupt paths), and see what can be done. it would be great to have a
100% reliable driver for the 99.9% of use cases, and leave the stability
issues for a power user or dtrace developer.

On 10 March 2014 21:25, Richard Yao [email protected] wrote:

@dtrace4linux https://github.com/dtrace4linux If you look carefully,
you will see that dmutex causes plenty of deadlocks. The interrupt flags
are being clobbered. As for dtrace_xcall, it was a large amount of code and
I did not see much reason for it. It is not clear to me that we gain much
by being able to trace those, but it is clear to me that reasoning about
correctness in this custom code is hard. I compared dtrace4linux with other
ports and I did not see these things being done.

I bundled some less extreme fixes into this. It was not clear to me
breaking this into separate pull requests had much value when I wanted
review. Anyway, there are some more clear cut fixes like "Convert
dtrace_stack_mutex into a spin-lock", "Add strlcat to strings.h" and the
patches to disable kernel preemption in certain key regions.

Reply to this email directly or view it on GitHubhttps://github.com//pull/76#issuecomment-37236423
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants