Crash command: bt #555

brenns10 · 2025-10-09T23:33:54Z

Thanks for the review on the aarch64 unwinding things! That unblocks the bt command, which is implemented in this PR. No rush on this as you know.

The main functionality is exposed via two helpers:

- kernel_stack_trace() returns a LinuxKernelStack. This behaves very similarly to a StackTrace: it can be iterated, indexed, and printed with str(). However, it also has "segments" that correspond to portions of the stack separated by interrupted frames. Each segment is associated with a stack "kind", which the helper can categorize.
- print_registers() prints the registers dictionary in a way similar to crash. It isn't exactly the same, because crash actually reads the struct pt_regs, which frequently contains one or two more items that aren't part of the generic register set. (It seems like it would be possible to expose the struct pt_regs for an interrupted StackFrame. Is that something you'd like me to include and work into this?)

For the actual command, I've tried to stay close to the crash command, without being too attached to things. I documented a lot of the choices I made in #538, so I won't exhaustively enumerate everything.

Some notable features that I've either included or omitted:

- The -d and -V flags are new and not present in crash. I am still on the fence about whether the -d flag is really worthwhile. I am very confident that -V will be popular.
- The -r flag is not yet implemented. It seems lower priority, but not hard to do.
- There are several flags that seem lower priority but still useful: -v to check for evidence of stack overflow, -e and -E to search for possible exception stack frames (I'm not sure how these are implemented, though), and -t / -T to display any text symbols found in the task's stack memory. Also, -R only prints the stack trace if it contains a given function, which seems useful for something like foreach bt -R dput, to print every stack currently executing dput. I think these are all great, but they may be worth putting into a separate "lower priority bt flags" task. At this point, a lot of crash commands are already implemented, and bt really should be among them.
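To make the -R idea concrete, here is a rough sketch of the filter it describes. It assumes each trace is reduced to a plain list of function names keyed by PID; traces_containing() is a hypothetical helper, not part of this PR, and a real version would check drgn StackFrame names instead.

```python
from typing import Dict, List


def traces_containing(
    traces: Dict[int, List[str]], func: str
) -> Dict[int, List[str]]:
    """Keep only the traces (keyed by PID) that include a frame named func.

    Hypothetical sketch of "bt -R <func>": frames are modeled as plain
    function-name strings rather than drgn StackFrame objects.
    """
    return {pid: frames for pid, frames in traces.items() if func in frames}


# Roughly what "foreach bt -R dput" would select.
example = {
    1: ["__schedule", "schedule", "dput"],
    2: ["__schedule", "schedule"],
}
print(traces_containing(example, "dput"))
```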

brenns10 · 2025-10-22T06:49:15Z

Updated this to remove merge conflicts and reflect recent changes:

- Use print_task_header() to... print the task header.
- I created a Taskspec object, similar to Cpuspec, which simplified the command and the code generation.

brenns10 · 2025-11-13T00:43:04Z

Rebased on main, with changes to support -f and -F based on _print_memory() from rd. vmtest is currently running on all architectures, but it looked good up to 6.10.

Signed-off-by: Stephen Brennan <[email protected]>

This register format is roughly the same as is used by crash. It will be useful for the bt command, and can probably be more generally useful.

Signed-off-by: Stephen Brennan <[email protected]>

Signed-off-by: Stephen Brennan <[email protected]>

Kernel stacks are quite complex. There can be many stacks nested from different CPU modes (NMI, IRQ, various exceptions, kernel threads, user threads, etc). drgn is capable of unwinding through most of these transitions on x86_64, but even so, it doesn't tell the user about them. On other architectures, drgn can't necessarily unwind through all the stacks, but that doesn't mean we can't implement logic to find the next set of registers to continue unwinding. So, the stack helpers here will enable kernel-specific code that (a) classifies the stack segments, and (b) allows architecture-specific heuristics to detect when more stack segments exist, so we can get a complete stack even when the debuginfo doesn't allow us to unwind directly.

Start with just the stack detection helpers for x86_64.

Signed-off-by: Stephen Brennan <[email protected]>
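Step (a), classifying segments, can be pictured as a range lookup: find which known stack region a stack pointer falls in. The sketch below is only an illustration under stated assumptions. StackKind is named in the PR, but the members here are hypothetical, and StackRegion and classify_stack_pointer() are made up; real region bounds would have to come from kernel data structures.

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class StackKind(Enum):
    # Hypothetical members; the PR's StackKind may define different kinds.
    TASK = "task"
    IRQ = "irq"
    NMI = "nmi"


class StackRegion(NamedTuple):
    kind: StackKind
    start: int  # lowest address of the stack (inclusive)
    end: int  # highest address of the stack (exclusive)


def classify_stack_pointer(
    sp: int, regions: List[StackRegion]
) -> Optional[StackKind]:
    """Return the kind of the region containing sp, or None if it is unknown."""
    for region in regions:
        if region.start <= sp < region.end:
            return region.kind
    return None


# Made-up addresses, purely for illustration.
regions = [
    StackRegion(StackKind.TASK, 0x1000, 0x5000),
    StackRegion(StackKind.IRQ, 0x8000, 0xC000),
]
print(classify_stack_pointer(0x1234, regions))
```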

Commands like bt need to provide options for specifying tasks: by a CPU list (similar to Cpuspec), by the current crash context, by explicitly providing pids or tasks, etc. Create a Taskspec object to represent this and reuse the Cpuspec code. Also add append_taskspec() to facilitate drgn code generation.

Signed-off-by: Stephen Brennan <[email protected]>

Signed-off-by: Stephen Brennan <[email protected]>

The command prints stack segments and registers, and it can be called with a CPU, a task, a PID, or for all CPUs. Not all crash options are supported: most notably missing are the options which print (and optionally annotate) the stack memory. These should be implemented together with the "rd" command due to their similarity.

We do add two drgn-specific options. First is a "-d" option to format stack frames as drgn would, rather than as crash does. This is a nice way for drgn users to feel at home, while still gaining the benefits of the header, stack segmentation, register dumps, etc. Second is a "-V" option to print local variable values. Crash users would certainly want this option if it were available! Including it in this command could be a great carrot for advertising drgn to existing crash users.

Beyond these two options, there are a few known differences in the output:

1. The register dumps are missing registers which are part of the "pt_regs" object but not part of the architecture registers; a good example is "ORIG_RAX" on x86_64.
2. The stack memory addresses (shown in brackets []) are offset by one word from what crash shows, because drgn reports the stack pointer from before the function return address was pushed to the stack.
3. drgn reports inline functions. While we could strictly emulate crash and omit them, the information is useful enough that it is worth breaking compatibility. In place of the stack address, we report the text "(inline)" so these are easy to see.
4. Filenames reported by StackFrame.source() are not absolute paths, while those reported by crash are. Thus, the filenames included in "bt -l" are much shorter in drgn.

Closes osandov#538.

Signed-off-by: Stephen Brennan <[email protected]>

brenns10 · 2025-11-20T22:47:22Z

This is once again rebased. In terms of updates:

- Added tests for the -f, -F, and -FF arguments.
- Fixed a very frustrating issue with s390x stack pointers being 160 bytes offset from the relevant stack data.
- Added the missing [module name] annotations for kernel module function names.

I did need to resolve some minor conflicts, especially in the cli.py file where we check return types to see whether to format them with str or repr. It occurs to me that my DisplayStr idea may not be the best way to implement this (e.g. it cannot be used with NamedTuples, which are really useful return types).

Do you think it would make sense to have the CLI check hasattr(value, "_repr_pretty_") and then use that to format return values, rather than checking against a hard-coded list of types, or my mix-in DisplayStr class? I know it's a Jupyter/IPython thing so there are already non-drgn types out there which implement it. But I think that there may be value in making it easier for non-drgn code to declare a pretty-printer that works in the CLI.

Edit: and this had a fully clean test run on all supported architectures. It is ready for review.

osandov

I didn't get to reviewing the bulk of this before I head out for Thanksgiving, so I'll just share my comments on the first three commits. Feel free to wait until I look at the rest or address these now.

Re: DisplayStr: using _repr_pretty_ would require us to emulate IPython's pretty-printer object. A class decorator that adds to some global tuple internal to drgn would work for NamedTuples:

@drgn.cli.display_str
class MyClass:
    ...

Would that be better? You could also just call the decorator (drgn.cli.display_str(YourClass)), meaning you could register types that you don't necessarily have control over. The implementation would be slower, but we're talking about printing to a terminal at that point anyways.

osandov · 2025-11-24T18:21:17Z

drgn/helpers/common/stack.py

+    Print a CPU register dump, in a format similar to that of crash
+
+    :param regs: a dictionary of registers, named in a similar way to the
+      dictionary returned by :py:class:`drgn.StackTrace.registers`.


Since this references a method, I think it could be :meth:, not :class::

Suggested change

dictionary returned by :py:class:`drgn.StackTrace.registers`.

dictionary returned by :meth:`drgn.StackTrace.registers()`.

osandov · 2025-11-24T18:22:08Z

drgn/helpers/common/stack.py

+
+    :param regs: a dictionary of registers, named in a similar way to the
+      dictionary returned by :py:class:`drgn.StackTrace.registers`.
+    :param indent: the number of spaces to indent the output


Can we take the indentation string, similar to textwrap.indent(), rather than a number?
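To show what a string-valued indent buys, here is a sketch that formats registers a few per line and then applies the prefix with textwrap.indent(). format_registers() is a hypothetical name, and the layout (uppercase names, 16 hex digits, three per line) is only an approximation of crash's output, not the helper's actual format.

```python
import textwrap
from typing import Dict


def format_registers(
    regs: Dict[str, int], indent: str = "    ", per_line: int = 3
) -> str:
    """Format registers roughly crash-style, prefixing each line with indent.

    Hypothetical sketch: taking a prefix string (as textwrap.indent() does)
    lets callers indent with spaces, tabs, or any marker they like.
    """
    items = [f"{name.upper()}: {value:016x}" for name, value in regs.items()]
    lines = [
        "  ".join(items[i : i + per_line]) for i in range(0, len(items), per_line)
    ]
    return textwrap.indent("\n".join(lines), indent)


print(format_registers({"rip": 1, "rsp": 2, "rbp": 3, "rax": 4}, indent="> "))
```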

osandov · 2025-11-24T18:23:35Z

drgn/helpers/common/stack.py

+
+def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None:
+    """
+    Print a CPU register dump, in a format similar to that of crash


Suggested change

Print a CPU register dump, in a format similar to that of crash

Print a CPU register dump, in a format similar to that of :manpage:`crash(8)`.

osandov · 2025-11-26T00:15:40Z

drgn/helpers/common/stack.py

        start = end
+
+
+def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None:


For CLI convenience:

Suggested change

def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None:

@takes_program_or_default

def print_registers(prog: Program, regs: Dict[str, int], indent: int = 4) -> None:

brenns10 · 2025-12-01T19:31:02Z

Re: DisplayStr ... Would that be better?

Yeah, a decorator would work for the external use case. I like that it would work with namedtuples too! And for internal types, of course we can continue to "hard-code" it as we do now.

osandov

It's awesome to see this coming together. I gave the helpers the most scrutiny, plus a few more comments throughout.

osandov · 2025-12-02T18:11:58Z

drgn/helpers/linux/stack.py

+                    source_info = " (%s:%d:%d)" % frame.source()
+                except LookupError:
+                    pass
+                lines.append(f"#{i:{framew}d} {frame.name}{source_info}")


The "d" format code is right-justified by default, but it would be nice to be consistent with StackTrace.__str__(), which is left-justified.
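The difference comes down to the alignment character in the format spec: "d" alone right-justifies within the field width, while "<d" left-justifies. A small illustration with made-up frame names:

```python
framew = 2
names = ["__schedule", "schedule", "do_nanosleep"]  # made-up frame names

# "d" alone right-justifies the number within the field width.
right = [f"#{i:{framew}d} {name}" for i, name in enumerate(names)]
# "<d" left-justifies, padding with spaces after the number instead.
left = [f"#{i:<{framew}d} {name}" for i, name in enumerate(names)]

print("\n".join(left))
```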

osandov · 2025-12-02T18:13:35Z

drgn/helpers/linux/stack.py

+
+    def __str__(self) -> str:
+        total_frames = sum(len(s.frames) for s in self.segments)
+        framew = 1 if total_frames < 10 else 2


I think this is off by one: total_frames == 10 means frame numbers 0..9, which only need a width of 1. FWIW, StackTrace.__str__() always uses a width of 2, even if the trace has fewer than 10 frames, so maybe you can just use 2 regardless.
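If a computed width were kept instead of a fixed width of 2, basing it on the largest index (total_frames - 1) avoids the off-by-one. A sketch comparing it with the expression under review; frame_number_width() and original_width() are hypothetical names:

```python
def frame_number_width(total_frames: int) -> int:
    """Digits needed for the largest frame index, total_frames - 1."""
    largest_index = max(total_frames - 1, 0)
    return len(str(largest_index))


def original_width(total_frames: int) -> int:
    """The expression under review, for comparison."""
    return 1 if total_frames < 10 else 2


for n in (9, 10, 11, 100, 101):
    print(n, original_width(n), frame_number_width(n))
```

Note that the original expression is also too narrow once a trace reaches 101 frames (index 100 needs 3 digits).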

osandov · 2025-12-02T18:16:02Z

drgn/helpers/linux/stack.py

+    on_cpu: bool
+    """Whether the task is currently on CPU"""
+
+    segments: List[StackSegment]


I'd prefer a weaker contract of what type this is and use Sequence. That will likely require refactoring the code that constructs this to build a temporary list and then create the LinuxKernelStack at the end (rather than creating the LinuxKernelStack with an empty list and appending to it).
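A sketch of the suggested refactor: accumulate into a temporary list, then construct the result once, so the public attribute can be annotated as Sequence (here it ends up as a tuple). Segment, KernelStack, and build_stack() are simplified, hypothetical stand-ins for StackSegment and LinuxKernelStack, with frames reduced to strings.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):  # hypothetical stand-in for StackSegment
    kind: str
    frames: Sequence[str]


class KernelStack(NamedTuple):  # hypothetical stand-in for LinuxKernelStack
    segments: Sequence[Segment]


def build_stack(raw: Iterable[Tuple[str, Iterable[str]]]) -> KernelStack:
    # Build into a private, mutable list while unwinding...
    segments: List[Segment] = []
    for kind, frames in raw:
        segments.append(Segment(kind, tuple(frames)))
    # ...then create the result once, exposing only an immutable Sequence.
    return KernelStack(tuple(segments))


stack = build_stack([("irq", ["a", "b"]), ("task", ["c"])])
print(stack.segments[1].frames)
```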

osandov · 2025-12-02T18:18:05Z

drgn/helpers/linux/stack.py

+    kind: StackKind
+    """The kind of stack associated with this segment"""
+
+    frames: List[StackFrame]


Same here re: List -> Sequence.

osandov · 2025-12-02T18:18:21Z

drgn/helpers/linux/stack.py

+    """Stack frames that are part of the segment"""
+
+
+class LinuxKernelStack(DisplayStr):


This needs a class docstring.

osandov · 2025-12-02T18:41:34Z

drgn/helpers/linux/stack.py

+        return "\n".join(lines)
+
+    def __iter__(self) -> Iterator[StackFrame]:
+        return chain.from_iterable(map(iter, self.segments))  # type: ignore


I don't think this works as intended (and the # type: ignore is masking that). iter(StackSegment) yields the kind and then the frames list. I think you want something like:

Suggested change

return chain.from_iterable(map(iter, self.segments)) # type: ignore

return chain.from_iterable(seg.frames for seg in self.segments)

Alternatively, if you want iter(StackSegment) to iterate over the frames, then it will need a custom __iter__, at which point it probably can't be a NamedTuple.
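The bug is easy to reproduce, because iterating a NamedTuple yields its fields in order. This demo simplifies the field types (a string instead of StackKind, strings instead of StackFrame objects):

```python
from itertools import chain
from typing import List, NamedTuple


class StackSegment(NamedTuple):
    kind: str  # simplified: the PR uses a StackKind here
    frames: List[str]  # simplified: the PR uses StackFrame objects


segments = [StackSegment("irq", ["a", "b"]), StackSegment("task", ["c"])]

# Buggy: iter() on a NamedTuple yields its fields (kind, then the frames list).
buggy = list(chain.from_iterable(map(iter, segments)))
# Fixed: flatten each segment's frames.
fixed = list(chain.from_iterable(seg.frames for seg in segments))

print(buggy)
print(fixed)
```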

osandov · 2025-12-02T18:44:37Z

drgn/helpers/linux/stack.py

+    task: Object
+    """The task associated with the stack trace"""
+
+    cpu: int
+    """CPU the task currently, or most recently, executed on"""
+
+    on_cpu: bool
+    """Whether the task is currently on CPU"""


Does the bt command need these? If not, how strongly do you want them? As usual, I'd like to keep the API surface as minimal as possible and omit them if there's not a specific use case in mind.

osandov · 2025-12-02T18:50:04Z

drgn/helpers/linux/stack.py

+    return StackSegment(kind, frames)
+
+
+def kernel_stack_trace(task: Object) -> LinuxKernelStack:


Please support passing a PID, too. You can use the logic in Program_stack_trace() and drgn_object_stack_trace() as inspiration (it's hairy).

osandov · 2025-12-02T18:55:33Z

drgn/commands/crash.py

+            self.add_from_import("drgn.helpers.linux.sched", "task_cpu")
+            self.append(
+                """\
+task = prog.crashed_thread().object


Can this use self._append_crash_panic_context()?

osandov · 2025-12-02T19:00:10Z

drgn/commands/_builtin/crash/_bt.py

+        # Python 3.9 does not allow this to be in a mutually exclusive argument
+        # group (ValueError: mutually exclusive arguments must be optional).
+        # This is despite the fact that using nargs="*" means it IS optional.


You can work around this by adding default=[]:

drgn/drgn/commands/_builtin/crash/_sys.py

Lines 607 to 609 in a0dc7d7

# Work around https://github.com/python/cpython/issues/72795

# before Python 3.13.

default=[],
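The workaround applies because, before Python 3.13, argparse marks a nargs="*" positional as required unless a default is given, and a mutually exclusive group rejects required arguments with ValueError. A self-contained sketch; the -a flag and the tasks positional are illustrative, not the actual bt argument definitions:

```python
import argparse

parser = argparse.ArgumentParser(prog="bt")
group = parser.add_mutually_exclusive_group()
group.add_argument("-a", action="store_true", help="show all CPUs")
# Without default=[], Python < 3.13 treats this positional as required, and
# adding it to the group raises "mutually exclusive arguments must be optional".
group.add_argument("tasks", nargs="*", default=[], help="PIDs or task addresses")

print(parser.parse_args(["-a"]))
print(parser.parse_args(["1", "2"]))
```

Passing both -a and a task (for example ["-a", "1"]) is rejected as a conflict, so the group still enforces exclusivity.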

brenns10 force-pushed the crash_bt branch from 1a39b42 to 4d05106, October 10, 2025 18:04

brenns10 force-pushed the crash_bt branch 2 times, most recently from 90f240b to ef6046f, October 22, 2025 06:48

brenns10 force-pushed the crash_bt branch from ef6046f to dc82021, November 13, 2025 00:40

brenns10 added 7 commits November 20, 2025 11:32

tests: linux_kernel: panic within IRQ on arm & ppc64

34913da

Signed-off-by: Stephen Brennan <[email protected]>

drgn.helpers.common: add print_registers() helper

c4e0ce4

This register format is roughly the same as is used by crash. It will be useful for the bt command, and can probably be more generally useful.

Signed-off-by: Stephen Brennan <[email protected]>

cli: Add a mixin class to display types with str()

70fd5f1

Signed-off-by: Stephen Brennan <[email protected]>

rd: add indent to _print_memory to support bt

3a384f5

Signed-off-by: Stephen Brennan <[email protected]>

brenns10 force-pushed the crash_bt branch from dc82021 to 0f98b72, November 20, 2025 22:39

osandov reviewed Nov 26, 2025

osandov requested changes Dec 2, 2025
