drgn: add runq command #533

richl9 · 2025-08-06T19:04:39Z

Pushed a new branch for the runqueue helper with the earlier feedback applied. Just wanted to get a quick review to see if any structural changes are needed before I continue:) One thing is that maybe we want to move parse_range_list to somewhere else?

osandov

Thanks, I tried to clear up what @brenns10 and I had in mind and had a few more comments here and there. Let me know if that still doesn't make sense.

Also, I think you meant to say "command" instead of "helper" in your commit and PR titles.

osandov · 2025-08-06T19:34:00Z

drgn/helpers/common/format.py

+def parse_cpus_arg(cpus_arg: str, max_cpu: int) -> Set[int]:
+    """
+    Parse argument to -c for cpu restriction (e.g. '1,3,5-8').
+
+    :param cpus_arg: str
+    :param  max_cpu: int
+    :returns: a set of specified cpus
+    """
+    cpus = parse_range_list(cpus_arg)
+    cpus = {c for c in cpus if c < max_cpu}
+
+    return cpus


Since your last PR, I added drgn.commands.crash.parse_cpuspec(). You can use that instead of adding this function. See drgn/commands/_builtin/crash/_eval.py for an example.

osandov · 2025-08-06T19:38:52Z

drgn/helpers/common/format.py

+def timestamp_str(ns: int) -> str:
+    """Convert nanoseconds to 'days HH:MM:SS.mmm' string."""
+    ms_total = ns // 1000000
+    secs_total, ms = divmod(ms_total, 1000)
+    mins_total, secs = divmod(secs_total, 60)
+    hours_total, mins = divmod(mins_total, 60)
+    days, hours = divmod(hours_total, 24)
+
+    return f"{days} {hours:02}:{mins:02}:{secs:02}.{ms:03}"


This format seems specific to the runq command from what I can tell, so it can just go in drgn/commands/_builtin/crash/runq.py.

osandov · 2025-08-06T19:43:57Z

drgn/helpers/linux/runqueue.py

+    """
+    # Try common fields in order; not all will exist on all kernels
+    rq_ts = 0
+    for name in ("clock", "most_recent_timestamp", "timestamp_last_tick"):


From what I can tell, rq->clock was added all the way back in torvalds/linux@6aa645e in Linux 2.6.23. What are the other timestamps needed for?

Just cleaned them.

osandov · 2025-08-06T19:51:21Z

drgn/helpers/linux/runqueue.py

+    return rq_ts
+
+
+def get_rt_runq(runqueue: Object) -> Dict[str, Any]:


drgn helpers tend to be generators rather than returning large data structures. E.g., a helper that yields a struct task_struct * object for each realtime task in a runqueue (or whatever variation of that is needed for the command) would be more appropriate.

osandov · 2025-08-06T19:51:54Z

drgn/helpers/linux/runqueue.py

+    return {"prio_array": prio_array, "tasks": results}
+
+
+def get_cfs_runq(runqueue: Object, task_group: bool = False) -> Dict[str, Any]:


Same here, some sort of generator would be preferred.

osandov · 2025-08-06T20:01:19Z

drgn/helpers/linux/runqueue.py

+    return {"cfs_root": cfs_root, "tasks": results}
+
+
+def run_queue(prog: Program, args: argparse.Namespace) -> List[Dict[str, Any]]:


This doesn't make much sense as a helper. It's basically the implementation of _crash_cmd_runq just modified to spit out a dictionary.

It's expected that a complex command like runq will have complex logic for iterating over whatever is needed and printing it, so that part can go back inside of _crash_cmd_runq. The point of @brenns10's comment on the previous PR was that any non-trivial inspection of the internals of a subsystem should go in small, self-contained helper functions. For example, it's a great idea to add helpers for iterating over tasks in a runqueue, and it's totally fine to call those directly from the command function.

That makes sense. I kept get_rq_per_cpu helper though just to get per cpu runqueue since I think it may be qualified as a "kernel internal inspection". Let me know what do you think.

osandov · 2025-08-06T20:01:36Z

tests/linux_kernel/__init__.py



-def parse_range_list(s):
+def parse_range_list(s) -> Set[int]:


You can drop this since it won't be needed anymore once you use parse_cpuspec() instead.

richl9 · 2025-08-07T18:53:54Z

HI @osandov , thanks for the comments. Anymore big change suggestions before I wrap up the details?

osandov

Thanks, this structure is much better. I commented a couple more tweaks for the structure and some more comments on the helpers themselves.

osandov · 2025-08-14T06:36:10Z

drgn/commands/_builtin/crash/runq.py

+    return f"{days} {hours:02}:{mins:02}:{secs:02}.{ms:03}"
+
+
+def dump_run_queue(prog: Program, args: argparse.Namespace) -> None:


No need for this to be a separate function, just put it straight in _crash_cmd_runq.

osandov · 2025-08-14T06:44:39Z

drgn/helpers/linux/runqueue.py

+from drgn.helpers.linux.percpu import per_cpu
+
+
+def get_rt_runq_tasks(runqueue: Object) -> Iterator[Object]:


Bikeshedding naming, the kernel doesn't seem to use the term runq very much. It's either rq or runqueue. Iterator helpers also tend to be named with for_each, although it's not a strict rule.

How about rq_for_each_rt_task(), runqueue_rt_tasks(), or some variation of those?

osandov · 2025-08-14T06:45:29Z

drgn/helpers/linux/runqueue.py

+    """
+    Get RT runqueue tasks in rt scheduler.
+
+    :param runqueue: Object


This should say what type is expected:

Suggested change

:param runqueue: Object

:param runqueue: ``struct rq *``

osandov · 2025-08-14T06:46:09Z

drgn/helpers/linux/runqueue.py

+
+def get_rt_runq_tasks(runqueue: Object) -> Iterator[Object]:
+    """
+    Get RT runqueue tasks in rt scheduler.


I'd spell out "real-time" in the docstring here for clarity.

osandov · 2025-08-14T06:46:38Z

drgn/helpers/linux/runqueue.py

+    Get RT runqueue tasks in rt scheduler.
+
+    :param runqueue: Object
+    :return: Iterator of ``struct task_struct``


This actually yields pointers, right?

Suggested change

:return: Iterator of ``struct task_struct``

:return: Iterator of ``struct task_struct *``

This still needs to be updated to "Iterator of struct task_struct * objects."

osandov · 2025-08-14T06:59:01Z

drgn/helpers/linux/runqueue.py

+    :param runqueue: Object
+    :return: Iterator of (``struct task_struct``, int) tuples


Same comment about correcting the types used here.

osandov · 2025-08-14T06:59:22Z

drgn/helpers/linux/runqueue.py

+    :param runqueue: Object
+    :return: Iterator of (``struct task_struct``, int) tuples
+    """
+    runq = runqueue.address_of_()


This shouldn't be necessary, just use runqueue directly.

osandov · 2025-08-14T07:01:49Z

drgn/helpers/linux/runqueue.py

+        if t == runqueue.curr:
+            continue


Why is this check necessary? It seems more appropriate for the caller to ignore the current task if that's what it wants. If you get rid of the check, then this whole function body can be replaced by:

return list_for_each_entry( "struct task_struct", runqueue.cfs_tasks.address_of_(), "se.group_node" )

osandov · 2025-08-14T07:03:23Z

drgn/helpers/linux/runqueue.py

+        yield container_of(t, "struct task_struct", "rt")
+
+
+def get_rq_per_cpu(prog: Program, cpus: List[int] = []) -> Iterator[Tuple[int, Object]]:


I still think this one should go in drgn/commands/_builtin/crash/runq.py instead of as a public helper.

osandov · 2025-08-14T07:05:11Z

drgn/helpers/linux/runqueue.py

+    ):
+        if t == runqueue.curr:
+            continue
+        yield container_of(t, "struct task_struct", "rt")


This container_of() is for real-time, which doesn't seem right for this function.

Signed-off-by: Richard Li <[email protected]>

osandov

Okay finally got back to this. In addition to the inline comments, a few high-level comments:

I don't think there will be enough runqueue-specific stuff to merit its own module, so let's move the new helpers to drgn.helpers.linux.sched.
The --drgn option is the main motivation for porting these, so please add that.
This needs some basic test cases. See tests/linux_kernel/crash_commands for examples.

osandov · 2025-11-14T19:49:59Z

drgn/commands/_builtin/crash/_runq.py

+            argument("-t", action="store_true", dest="show_timestamps"),
+            argument("-T", action="store_true", dest="show_lag"),
+            argument("-m", action="store_true", dest="pretty_runtime"),
+            argument("-g", action="store_true", dest="group"),
+        ),
+        argument("-c", type=str, default="a", dest="cpus"),


These all need help strings.

osandov · 2025-11-14T19:50:58Z

drgn/commands/_builtin/crash/_runq.py

+                ]
+            table.append(row)
+        else:
+            print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")


Crash doesn't include the 0x prefix. Also, a shorter way to do .address_of_().value_() is .address_.

Suggested change

print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")

print(f"CPU {cpu} RUNQUEUE: {runqueue.address_:x}")

The same hex() -> ":x" format thing applies in a few more places, too.

osandov · 2025-11-14T19:52:55Z

drgn/commands/_builtin/crash/_runq.py

+    runq_clocks: Dict[int, int] = {}
+    cpus = parse_cpuspec(args.cpus).cpus(prog)
+    for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)):
+        curr_task = runqueue.curr[0].address_of_()


What is the [0].address_of_() for? Instead, I think we want to read it:

Suggested change

curr_task = runqueue.curr[0].address_of_()

curr_task = runqueue.curr.read_()

osandov · 2025-11-14T19:57:05Z

drgn/commands/_builtin/crash/_runq.py

+    cpus = parse_cpuspec(args.cpus).cpus(prog)
+    for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)):
+        curr_task = runqueue.curr[0].address_of_()
+        curr_task_addr = runqueue.curr.value_()


Avoid re-reading runqueue.curr:

Suggested change

curr_task_addr = runqueue.curr.value_()

curr_task_addr = curr_task.value_()

(You could also remove this and use curr_task.value_() everywhere, but either way is fine.)

osandov · 2025-11-14T20:48:01Z

drgn/commands/_builtin/crash/_runq.py

+        else:
+            print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")
+            print(
+                f'  CURRENT:   PID: {pid:<6d}  TASK: {hex(curr_task_addr)}  PRIO: {prio}  COMMAND: "{comm}"'


The extra spaces after "CURRENT:" make it look like there's supposed to be a value there but it's empty. Crash also doesn't include them.

Suggested change

f' CURRENT: PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'

f' CURRENT: PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'

osandov · 2025-11-14T20:59:27Z

drgn/commands/_builtin/crash/_runq.py

+            if args.group:
+                root_task_group_addr = prog["root_task_group"].address_of_().value_()
+                if rt_tasks:
+                    print(
+                        f"  ROOT_TASK_GROUP: {hex(root_task_group_addr)}  RT_RQ: {hex(runqueue.rt.address_of_().value_())}"
+                    )
+                if cfs_tasks:
+                    print(
+                        f"  ROOT_TASK_GROUP: {hex(root_task_group_addr)}  CFS_RQ: {hex(runqueue.cfs.address_of_().value_())}"
+                    )
+                if cfs_tasks or rt_tasks:
+                    print(
+                        " " * 4,
+                        f'[{prio:3d}] PID: {pid:<6d} TASK: {hex(curr_task_addr)}  COMMAND: "{comm}" [CURRENT]',
+                    )


It looks like the crash implementation actually walks up the task group hierarchy, which this doesn't appear to do. Maybe let's just drop -g for now so we can land the more basic stuff, first.

osandov · 2025-11-14T21:09:18Z

drgn/commands/_builtin/crash/_runq.py

+
+        # Show lag (skip formatting if not last CPU)
+        if args.show_lag:
+            runq_clocks[cpu] = task_rq(curr_task).clock.value_()


It's kind of roundabout to go from runqueue -> current task -> runqueue. This also does unnecessary work at the beginning of the loop over CPUs. I think it'd make more sense to make the show_lag case a special case at the beginning of the function rather than doing it in this main loop. Something like:

cpus = parse_cpuspec(args.cpus).cpus(prog) if args.show_lag: runq_clocks = {cpu: cpu_rq(cpu).clock.value_() for cpu in cpus} max_clock = max(runq_clocks.values()) lags = [(cpu, max_clock - runq_clock) for cpu, runq_clock in runq_clocks.items()] lags.sort(key=operator.itemgetter(1)) for cpu, lag in lags: print(f" CPU {cpu}: {lag / 1e9:.2f} secs") return for i, (cpu, runqueue) in enumerate(get_rq_per_cpu(prog, cpus)): ...

osandov · 2025-11-14T21:11:05Z

drgn/commands/_builtin/crash/_runq.py

+                continue
+
+        if table_format:
+            row = [


The format of these options is fairly different from the crash output. Was that an intentional choice? That's okay if there's an intentional reason, but it'd be nice to imitate crash's format if not.

osandov · 2025-11-14T21:11:55Z

drgn/helpers/linux/runqueue.py

+    Get RT runqueue tasks in rt scheduler.
+
+    :param runqueue: Object
+    :return: Iterator of ``struct task_struct``


This still needs to be updated to "Iterator of struct task_struct * objects."

osandov · 2025-11-14T21:12:23Z

drgn/helpers/linux/runqueue.py

+    Get CFS runqueue tasks in cfs scheduler.
+
+    :param runqueue: ``struct rq *``
+    :return: Iterator of (``struct task_struct``, int) tuples


This seems out of date.

Suggested change

:return: Iterator of (``struct task_struct``, int) tuples

:return: Iterator of ``struct task_struct *`` objects.

osandov reviewed Aug 6, 2025

View reviewed changes

richl9 force-pushed the richard/crash/runq branch 3 times, most recently from 350eafa to 1e42d58 Compare August 7, 2025 18:50

richl9 changed the title ~~drgn: add runq helper~~ drgn: add runq command Aug 7, 2025

osandov reviewed Aug 14, 2025

View reviewed changes

richl9 force-pushed the richard/crash/runq branch from 1e42d58 to 6e6338c Compare October 22, 2025 22:09

drgn: add runq command

b04e9c6

Signed-off-by: Richard Li <[email protected]>

richl9 force-pushed the richard/crash/runq branch from 6e6338c to b04e9c6 Compare October 22, 2025 22:13

osandov requested changes Nov 14, 2025

View reviewed changes

		return rq_ts


		def get_rt_runq(runqueue: Object) -> Dict[str, Any]:

		return {"prio_array": prio_array, "tasks": results}


		def get_cfs_runq(runqueue: Object, task_group: bool = False) -> Dict[str, Any]:

		return {"cfs_root": cfs_root, "tasks": results}


		def run_queue(prog: Program, args: argparse.Namespace) -> List[Dict[str, Any]]:



		def parse_range_list(s):
		def parse_range_list(s) -> Set[int]:

		return f"{days} {hours:02}:{mins:02}:{secs:02}.{ms:03}"


		def dump_run_queue(prog: Program, args: argparse.Namespace) -> None:

		from drgn.helpers.linux.percpu import per_cpu


		def get_rt_runq_tasks(runqueue: Object) -> Iterator[Object]:

	:return: Iterator of ``struct task_struct``
	:return: Iterator of ``struct task_struct *``

		:param runqueue: Object
		:return: Iterator of (``struct task_struct``, int) tuples

		yield container_of(t, "struct task_struct", "rt")


		def get_rq_per_cpu(prog: Program, cpus: List[int] = []) -> Iterator[Tuple[int, Object]]:

	print(f"CPU {cpu} RUNQUEUE: {hex(runqueue.address_of_().value_())}")
	print(f"CPU {cpu} RUNQUEUE: {runqueue.address_:x}")

	curr_task = runqueue.curr[0].address_of_()
	curr_task = runqueue.curr.read_()

	curr_task_addr = runqueue.curr.value_()
	curr_task_addr = curr_task.value_()

	f' CURRENT: PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'
	f' CURRENT: PID: {pid:<6d} TASK: {hex(curr_task_addr)} PRIO: {prio} COMMAND: "{comm}"'

drgn: add runq command #533

Are you sure you want to change the base?

drgn: add runq command #533

Conversation

richl9 commented Aug 6, 2025

Uh oh!

osandov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richl9 commented Aug 7, 2025

Uh oh!

osandov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

osandov Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

osandov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

osandov Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

osandov Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

osandov Nov 14, 2025 •

edited

Loading

osandov Nov 14, 2025 •

edited

Loading

osandov Nov 14, 2025 •

edited

Loading