-
Notifications
You must be signed in to change notification settings - Fork 199
Crash command: bt #555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Crash command: bt #555
Conversation
90f240b to
ef6046f
Compare
|
Updated this to remove merge conflicts and reflect recent changes:
|
|
Rebased on main with the changes to support |
Signed-off-by: Stephen Brennan <[email protected]>
This register format is roughly the same as is used by crash. It will be useful for the bt command, and can probably be more generally useful. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Kernel stacks are quite complex. There can be many stacks nested from different CPU modes (NMI, IRQ, various exceptions, kernel threads, user threads, etc). Drgn is capable of unwinding through most of these transitions on x86_64, but even so, it doesn't provide information to the user about them. On other architectures, drgn doesn't necessarily have the ability to unwind through all the stacks, but that doesn't mean that we can't implement logic to find the next set of registers to continue unwinding. So, the stack helpers here will serve to enable kernel-specific code that (a) classifies the stack segments, and (b) allows architecture specific heuristics to detect when more stack segments exist, so we can get a complete stack even when the debuginfo doesn't allow us to unwind directly. Start with just the stack detection helpers for x86_64. Signed-off-by: Stephen Brennan <[email protected]>
Commands like bt need to provide options for specifying tasks: by a CPU list (similar to Cpuspec), by the current crash context, by explicitly providing pids or tasks, etc. Create a Taskspec object to represent this and reuse the Cpuspec code. Also add append_taskspec() to facilitate drgn code generation. Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
The command prints stack segments, registers, and it can be called with a CPU, task, PID, or for all CPUs. Not all crash options are supported: most notably missing are the options which print (and optionally annotate) the stack memory. These should be implemented together with the "rd" command due to their similarity. We do add two drgn-specific options. First is a "-d" option to format stack frames as drgn would, rather than as crash does. This is a nice way for drgn users to feel more comfortable, while still gaining the benefits of the header, stack segmentation, register dumps, etc. Second is a "-V" option to print local variable values. This is an option that crash users would certainly want to have, if it were available! Including it in this command could be a great carrot for advertising drgn to existing crash users. Beyond these two options, there are a few known differences in the output: 1. The register dumps are missing registers which are part of the "pt_regs" object, but not part of the architecture registers -- a good example would be "ORIG_RAX" on x86_64. 2. The stack memory adresses (shown in brackets []) are offset by one word from what crash shows, because drgn reports the stack pointer prior to the function return address was pushed to the stack. 3. Drgn reports inline functions. While we could strictly emulate crash and omit them, the information is useful enough that it is worth breaking the compatibility. In place of the stack address, we report the text "(inline)" so these are easy to see. 4. Filenames reported by StackFrame.source() are not absolute paths, while those reported by crash are. Thus, the filenames included in "bt -l" are much shorter in drgn. Closes osandov#538. Signed-off-by: Stephen Brennan <[email protected]>
|
This is once again rebased. In terms of updates:
I did need to resolve some minor conflicts, especially in the Do you think it would make sense to have the CLI check Edit: and this had a fully clean test run on all supported architectures. It is ready for review. |
osandov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get to reviewing the bulk of this before I head out for Thanksgiving, so I'll just share my comments on the first three commits. Feel free to wait until I look at the rest or address these now.
Re: DisplayStr, Using _repr_pretty_ would require us to emulate IPython's pretty-printer object. A class decorator that adds to some global tuple internal to drgn would work for NamedTuples:
@drgn.cli.display_str
class MyClass:
...Would that be better? You could also just call the decorator (drgn.cli.display_str(YourClass)), meaning you could register types that you don't necessarily have control over. The implementation would be slower, but we're talking about printing to a terminal at that point anyways.
| Print a CPU register dump, in a format similar to that of crash | ||
| :param regs: a dictionary of registers, named in a similar way to the | ||
| dictionary returned by :py:class:`drgn.StackTrace.registers`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this references a method, I think it could be :meth:, not :class::
| dictionary returned by :py:class:`drgn.StackTrace.registers`. | |
| dictionary returned by :meth:`drgn.StackTrace.registers()`. |
| :param regs: a dictionary of registers, named in a similar way to the | ||
| dictionary returned by :py:class:`drgn.StackTrace.registers`. | ||
| :param indent: the number of spaces to indent the output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we take the indentation string, similar to textwrap.indent(), rather than a number?
|
|
||
| def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None: | ||
| """ | ||
| Print a CPU register dump, in a format similar to that of crash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Print a CPU register dump, in a format similar to that of crash | |
| Print a CPU register dump, in a format similar to that of :manpage:`crash(8)`. |
| start = end | ||
|
|
||
|
|
||
| def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For CLI convenience:
| def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None: | |
| @takes_program_or_default | |
| def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None: |
Yeah, a decorator would work for the external use case. I like that it would work with namedtuples too! And for internal types, of course we can continue to "hard-code" it as we do now. |
osandov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome to see this coming together. I gave the helpers the most scrutiny, plus a few more comments throughout.
| source_info = " (%s:%d:%d)" % frame.source() | ||
| except LookupError: | ||
| pass | ||
| lines.append(f"#{i:{framew}d} {frame.name}{source_info}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "d" format code is right-justified by default, but it would be nice to be consistent with StackTrace.__str__(), which is left-justified.
|
|
||
| def __str__(self) -> str: | ||
| total_frames = sum(len(s.frames) for s in self.segments) | ||
| framew = 1 if total_frames < 10 else 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is off by one, since total_frames == 10 would be 0..9. FWIW, StackTrace.__str__() always uses a width of 2, even if the trace has less than 10 frames, so maybe you can just use 2 regardless.
| on_cpu: bool | ||
| """Whether the task is currently on CPU""" | ||
|
|
||
| segments: List[StackSegment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a weaker contract of what type this is and use Sequence. That will likely require refactoring the code that constructs this to build a temporary list and then create the LinuxKernelStack at the end (rather than creating the LinuxKernelStack with an empty list and appending to it).
| kind: StackKind | ||
| """The kind of stack associated with this segment""" | ||
|
|
||
| frames: List[StackFrame] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here re: List -> Sequence.
| """Stack frames that are part of the segment""" | ||
|
|
||
|
|
||
| class LinuxKernelStack(DisplayStr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a class docstring.
| return "\n".join(lines) | ||
|
|
||
| def __iter__(self) -> Iterator[StackFrame]: | ||
| return chain.from_iterable(map(iter, self.segments)) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works as intended (and the # type: ignore is masking that). iter(StackSegment) yields the kind and then the frames list. I think you want something like:
| return chain.from_iterable(map(iter, self.segments)) # type: ignore | |
| return chain.from_iterable(seg.frames for seg in self.segments) |
Alternatively, if you want iter(StackSegment) to iterate over the frames, then it will need a custom __iter__, at which point it probably can't be a NamedTuple.
| task: Object | ||
| """The task associated with the stack trace""" | ||
|
|
||
| cpu: int | ||
| """CPU the task currently, or most recently, executed on""" | ||
|
|
||
| on_cpu: bool | ||
| """Whether the task is currently on CPU""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the bt command need these? If not, how strongly do you want them? As usual, I'd like to keep the API surface as minimal as possible and omit them if there's not a specific use case in mind.
| return StackSegment(kind, frames) | ||
|
|
||
|
|
||
| def kernel_stack_trace(task: Object) -> LinuxKernelStack: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please support passing a PID, too. You can use the logic in Program_stack_trace() and drgn_object_stack_trace() as inspiration (it's hairy).
| self.add_from_import("drgn.helpers.linux.sched", "task_cpu") | ||
| self.append( | ||
| """\ | ||
| task = prog.crashed_thread().object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use self._append_crash_panic_context()?
| # Python 3.9 does not allow this to be in a mutually exclusive argument | ||
| # group (ValueError: mutually exclusive arguments must be optional). | ||
| # This is despite the fact that using nargs="*" means it IS optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can work around this by adding default=[]:
drgn/drgn/commands/_builtin/crash/_sys.py
Lines 607 to 609 in a0dc7d7
| # Work around https://github.com/python/cpython/issues/72795 | |
| # before Python 3.13. | |
| default=[], |
Thanks for the review on the aarch64 unwinding things! That unblocks the bt command, which is implemented in this PR. No rush on this as you know.
The main functionality is exposed via two helpers:
kernel_stack_trace()returns aLinuxKernelStack(). This behaves very similarly to aStackTrace- it can be iterated and indexed, and printed withstr(). However, it has additional "segments" that correspond to portions of the stack which are separated by interrupted frames. Each segment is associated with a stack "kind", which the helper can categorize.print_registers()prints the registers dictionary in a way similar to crash. It isn't exactly the same, because crash actually is looking at thestruct pt_regs, which frequently contains one or two more items that aren't part of the generic register set. (It seems like it would be possible to expose thestruct pt_regsfor an interruptedStackFrame. Is that something you'd like to see me include and work into this?)For the actual command, I've tried to stay close to the crash command, without being too attached to things. I documented a lot of the choices I made in #538, so I won't exhaustively enumerate everything.
Some notable features that I've either included or omitted:
-dand-Vflags are new, and not present in crash. I am still on the fence about whether the-dflag is really worthwhile. I am very confident that-Vwill be popular.-rflag is not yet implemented. It seems lower priority but not hard to do.-vto check for stack overflow evidence,-eand-Eto search for possible exception stack frames (I'm not sure how these are implemented though), and-t / -Tto display any text symbols found on the task stack memory. Also,-Rwill only print the stack trace if it contains a given function, which seems useful for something likeforeach bt -R dput, printing every stack currently executingdput. I think these are all great, but maybe worth putting into a separate "lower priority bt flags" task. Since at this point, there's actually a lot of crash commands implemented, andbtreally should be there now.