
Conversation

@richl9 (Contributor) commented Aug 6, 2025

Pushed a new branch for the runqueue helper with the earlier feedback applied. Just wanted to get a quick review to see if any structural changes are needed before I continue. :) One thing: maybe we want to move parse_range_list somewhere else?

@osandov (Owner) left a comment

Thanks, I tried to clarify what @brenns10 and I had in mind and left a few more comments here and there. Let me know if it still doesn't make sense.

Also, I think you meant to say "command" instead of "helper" in your commit and PR titles.

Comment on lines 335 to 346
def parse_cpus_arg(cpus_arg: str, max_cpu: int) -> Set[int]:
"""
Parse argument to -c for cpu restriction (e.g. '1,3,5-8').
:param cpus_arg: str
:param max_cpu: int
:returns: a set of specified cpus
"""
cpus = parse_range_list(cpus_arg)
cpus = {c for c in cpus if c < max_cpu}

return cpus
Owner

Since your last PR, I added drgn.commands.crash.parse_cpuspec(). You can use that instead of adding this function. See drgn/commands/_builtin/crash/_eval.py for an example.
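A minimal sketch of the intended replacement, based on how it ends up being called later in this PR (the wrapper name selected_cpus is made up, and the exact return type of .cpus() is an assumption):

    from typing import Set

    from drgn import Program
    from drgn.commands.crash import parse_cpuspec


    def selected_cpus(prog: Program, spec: str) -> Set[int]:
        # Resolve a crash-style CPU spec (e.g. "1,3,5-8", or "a" for all CPUs,
        # matching the -c default later in this PR) against the running kernel.
        return set(parse_cpuspec(spec).cpus(prog))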

Comment on lines 349 to 341
def timestamp_str(ns: int) -> str:
"""Convert nanoseconds to 'days HH:MM:SS.mmm' string."""
ms_total = ns // 1000000
secs_total, ms = divmod(ms_total, 1000)
mins_total, secs = divmod(secs_total, 60)
hours_total, mins = divmod(mins_total, 60)
days, hours = divmod(hours_total, 24)

return f"{days} {hours:02}:{mins:02}:{secs:02}.{ms:03}"
Owner

This format seems specific to the runq command from what I can tell, so it can just go in drgn/commands/_builtin/crash/runq.py.

"""
# Try common fields in order; not all will exist on all kernels
rq_ts = 0
for name in ("clock", "most_recent_timestamp", "timestamp_last_tick"):
Owner

From what I can tell, rq->clock was added all the way back in torvalds/linux@6aa645e in Linux 2.6.23. What are the other timestamps needed for?

Contributor Author

Just cleaned them.

return rq_ts


def get_rt_runq(runqueue: Object) -> Dict[str, Any]:
Owner

drgn helpers tend to be generators rather than returning large data structures. E.g., a helper that yields a struct task_struct * object for each realtime task in a runqueue (or whatever variation of that is needed for the command) would be more appropriate.
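For example, a rough sketch of such a generator (the name follows the naming discussed later in this review, and the struct rt_prio_array layout assumed here, queue[] lists of sched_rt_entity linked via run_list, should be double-checked against the kernels being targeted):

    from typing import Iterator

    from drgn import Object, container_of
    from drgn.helpers.linux.list import list_for_each_entry


    def rq_for_each_rt_task(runqueue: Object) -> Iterator[Object]:
        """
        Iterate over the real-time tasks queued on a runqueue.

        :param runqueue: ``struct rq *``
        :return: Iterator of ``struct task_struct *`` objects.
        """
        # rt.active is the RT priority array; each queue[] entry is a list of
        # struct sched_rt_entity linked through run_list.
        for queue in runqueue.rt.active.queue:
            for entity in list_for_each_entry(
                "struct sched_rt_entity", queue.address_of_(), "run_list"
            ):
                yield container_of(entity, "struct task_struct", "rt")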

return {"prio_array": prio_array, "tasks": results}


def get_cfs_runq(runqueue: Object, task_group: bool = False) -> Dict[str, Any]:
Owner

Same here, some sort of generator would be preferred.

return {"cfs_root": cfs_root, "tasks": results}


def run_queue(prog: Program, args: argparse.Namespace) -> List[Dict[str, Any]]:
Owner

This doesn't make much sense as a helper. It's basically the implementation of _crash_cmd_runq just modified to spit out a dictionary.

It's expected that a complex command like runq will have complex logic for iterating over whatever is needed and printing it, so that part can go back inside of _crash_cmd_runq. The point of @brenns10's comment on the previous PR was that any non-trivial inspection of the internals of a subsystem should go in small, self-contained helper functions. For example, it's a great idea to add helpers for iterating over tasks in a runqueue, and it's totally fine to call those directly from the command function.
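As a rough illustration of that split (the command signature follows the other command functions in this PR; the helper names and the output format are only placeholders, not the final implementation):

    import argparse

    from drgn import Program
    from drgn.commands.crash import parse_cpuspec
    from drgn.helpers.linux.percpu import per_cpu


    def _crash_cmd_runq(prog: Program, args: argparse.Namespace) -> None:
        # Flag handling and output formatting stay in the command; the helpers
        # only expose scheduler internals (which tasks are queued where).
        for cpu in parse_cpuspec(args.cpus).cpus(prog):
            rq = per_cpu(prog["runqueues"], cpu)
            print(f"CPU {cpu} RUNQUEUE: {rq.address_:x}")
            # rq_for_each_rt_task() is the hypothetical helper sketched earlier.
            for task in rq_for_each_rt_task(rq):
                comm = task.comm.string_().decode()
                print(f'  PID: {task.pid.value_():<6d} COMMAND: "{comm}"')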

Contributor Author

That makes sense. I kept the get_rq_per_cpu helper, though, just to get the per-CPU runqueue, since I think it might qualify as "kernel internal inspection". Let me know what you think.



def parse_range_list(s):
def parse_range_list(s) -> Set[int]:
Owner

You can drop this since it won't be needed anymore once you use parse_cpuspec() instead.

@richl9 force-pushed the richard/crash/runq branch 3 times, most recently from 350eafa to 1e42d58 on August 7, 2025 at 18:50
@richl9 changed the title from "drgn: add runq helper" to "drgn: add runq command" on Aug 7, 2025
@richl9 (Contributor Author) commented Aug 7, 2025

Hi @osandov, thanks for the comments. Any more big change suggestions before I wrap up the details?

@osandov (Owner) left a comment

Thanks, this structure is much better. I commented on a couple more structural tweaks and left some more comments on the helpers themselves.

return f"{days} {hours:02}:{mins:02}:{secs:02}.{ms:03}"


def dump_run_queue(prog: Program, args: argparse.Namespace) -> None:
Owner

No need for this to be a separate function, just put it straight in _crash_cmd_runq.

from drgn.helpers.linux.percpu import per_cpu


def get_rt_runq_tasks(runqueue: Object) -> Iterator[Object]:
Owner

Bikeshedding naming, the kernel doesn't seem to use the term runq very much. It's either rq or runqueue. Iterator helpers also tend to be named with for_each, although it's not a strict rule.

How about rq_for_each_rt_task(), runqueue_rt_tasks(), or some variation of those?

"""
Get RT runqueue tasks in rt scheduler.
:param runqueue: Object
Owner

This should say what type is expected:

Suggested change
:param runqueue: Object
:param runqueue: ``struct rq *``


def get_rt_runq_tasks(runqueue: Object) -> Iterator[Object]:
"""
Get RT runqueue tasks in rt scheduler.
Owner

I'd spell out "real-time" in the docstring here for clarity.

Get RT runqueue tasks in rt scheduler.
:param runqueue: Object
:return: Iterator of ``struct task_struct``
Owner

This actually yields pointers, right?

Suggested change
:return: Iterator of ``struct task_struct``
:return: Iterator of ``struct task_struct *``

@osandov (Owner) commented Nov 14, 2025

This still needs to be updated to "Iterator of struct task_struct * objects."

Comment on lines 39 to 40
:param runqueue: Object
:return: Iterator of (``struct task_struct``, int) tuples
Owner

Same comment about correcting the types used here.

:param runqueue: Object
:return: Iterator of (``struct task_struct``, int) tuples
"""
runq = runqueue.address_of_()
Owner

This shouldn't be necessary, just use runqueue directly.

Comment on lines 46 to 47
if t == runqueue.curr:
continue
Owner

Why is this check necessary? It seems more appropriate for the caller to ignore the current task if that's what it wants. If you get rid of the check, then this whole function body can be replaced by:

    return list_for_each_entry(
        "struct task_struct", runqueue.cfs_tasks.address_of_(), "se.group_node"
    )

yield container_of(t, "struct task_struct", "rt")


def get_rq_per_cpu(prog: Program, cpus: List[int] = []) -> Iterator[Tuple[int, Object]]:
Owner

I still think this one should go in drgn/commands/_builtin/crash/runq.py instead of as a public helper.

):
if t == runqueue.curr:
continue
yield container_of(t, "struct task_struct", "rt")
Owner

This container_of() is for real-time, which doesn't seem right for this function.

@richl9 force-pushed the richard/crash/runq branch from 1e42d58 to 6e6338c on October 22, 2025 at 22:09
Signed-off-by: Richard Li <[email protected]>
@richl9 force-pushed the richard/crash/runq branch from 6e6338c to b04e9c6 on October 22, 2025 at 22:13
@osandov (Owner) left a comment

Okay, I finally got back to this. In addition to the inline comments, a few high-level points:

  1. I don't think there will be enough runqueue-specific stuff to merit its own module, so let's move the new helpers to drgn.helpers.linux.sched.
  2. The --drgn option is the main motivation for porting these, so please add that.
  3. This needs some basic test cases. See tests/linux_kernel/crash_commands for examples.

Comment on lines +57 to +62
argument("-t", action="store_true", dest="show_timestamps"),
argument("-T", action="store_true", dest="show_lag"),
argument("-m", action="store_true", dest="pretty_runtime"),
argument("-g", action="store_true", dest="group"),
),
argument("-c", type=str, default="a", dest="cpus"),
Owner

These all need help strings.
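For example (the help text here is invented, and this assumes argument() forwards argparse keyword arguments such as help=):

    argument(
        "-t",
        action="store_true",
        dest="show_timestamps",
        help="show the per-runqueue and per-task timestamps",
    ),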

]
table.append(row)
else:
print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")
Owner

Crash doesn't include the 0x prefix. Also, a shorter way to do .address_of_().value_() is .address_.

Suggested change
print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")
print(f"CPU {cpu} RUNQUEUE: {runqueue.address_:x}")

The same hex() -> ":x" format thing applies in a few more places, too.

runq_clocks: Dict[int, int] = {}
cpus = parse_cpuspec(args.cpus).cpus(prog)
for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)):
curr_task = runqueue.curr[0].address_of_()
Owner

What is the [0].address_of_() for? Instead, I think we want to read it:

Suggested change
curr_task = runqueue.curr[0].address_of_()
curr_task = runqueue.curr.read_()

cpus = parse_cpuspec(args.cpus).cpus(prog)
for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)):
curr_task = runqueue.curr[0].address_of_()
curr_task_addr = runqueue.curr.value_()
Owner

Avoid re-reading runqueue.curr:

Suggested change
curr_task_addr = runqueue.curr.value_()
curr_task_addr = curr_task.value_()

(You could also remove this and use curr_task.value_() everywhere, but either way is fine.)

else:
print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")
print(
f' CURRENT:     PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'
Owner

The extra spaces after "CURRENT:" make it look like there's supposed to be a value there but it's empty. Crash also doesn't include them.

Suggested change
f' CURRENT:     PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'
f' CURRENT: PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'

Comment on lines +126 to +140
if args.group:
root_task_group_addr = prog["root_task_group"].address_of_().value_()
if rt_tasks:
print(
f" ROOT_TASK_GROUP: {hex(root_task_group_addr)} RT_RQ: {hex(runqueue.rt.address_of_().value_())}"
)
if cfs_tasks:
print(
f" ROOT_TASK_GROUP: {hex(root_task_group_addr)} CFS_RQ: {hex(runqueue.cfs.address_of_().value_())}"
)
if cfs_tasks or rt_tasks:
print(
" " * 4,
f'[{prio:3d}] PID: {pid:<6d} TASK: {hex(curr_task_addr)} COMMAND: "{comm}" [CURRENT]',
)
Owner

It looks like the crash implementation actually walks up the task group hierarchy, which this doesn't appear to do. Maybe let's just drop -g for now so we can land the more basic stuff, first.


# Show lag (skip formatting if not last CPU)
if args.show_lag:
runq_clocks[cpu] = task_rq(curr_task).clock.value_()
Owner

It's kind of roundabout to go from runqueue -> current task -> runqueue. This also does unnecessary work at the beginning of the loop over CPUs. I think it'd make more sense to make the show_lag case a special case at the beginning of the function rather than doing it in this main loop. Something like:

    cpus = parse_cpuspec(args.cpus).cpus(prog)

    if args.show_lag:
        runq_clocks = {cpu: cpu_rq(cpu).clock.value_() for cpu in cpus}
        max_clock = max(runq_clocks.values())
        lags = [(cpu, max_clock - runq_clock) for cpu, runq_clock in runq_clocks.items()]
        lags.sort(key=operator.itemgetter(1))
        for cpu, lag in lags:
            print(f"  CPU {cpu}: {lag / 1e9:.2f} secs")
        return

    for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)):
        ...

continue

if table_format:
row = [
@osandov (Owner) commented Nov 14, 2025

The format of these options is fairly different from the crash output. Was that an intentional choice? That's fine if there's a reason for it, but it'd be nice to imitate crash's format if not.

Get RT runqueue tasks in rt scheduler.
:param runqueue: Object
:return: Iterator of ``struct task_struct``
@osandov (Owner) commented Nov 14, 2025

This still needs to be updated to "Iterator of struct task_struct * objects."

Get CFS runqueue tasks in cfs scheduler.
:param runqueue: ``struct rq *``
:return: Iterator of (``struct task_struct``, int) tuples
Owner

This seems out of date.

Suggested change
:return: Iterator of (``struct task_struct``, int) tuples
:return: Iterator of ``struct task_struct *`` objects.
