api: replace clock_t with high-res type #623

Merged: rjarry merged 1 commit into DPDK:main from rjarry:clock_ns on May 25, 2026
@rjarry rjarry commented May 25, 2026

Collaborator

The clock_t type is implementation-specific. For example, its resolution is microseconds on Linux and milliseconds on Windows.

Replace all uses of clock_t and its accompanying function, gr_clock_us(), with a new nanosecond resolution clock type, gr_clock_ns_t, and its accompanying function, gr_clock_ns().

Note about resolution and size of the new clock type:

A clock with nanosecond resolution can be used for purposes not possible with microsecond resolution, e.g. shaping and pacing. For reference, the duration of a minimum size (84 byte) packet on a 100 Gbit/s Ethernet link is 6.72 ns.

The size of the new type is 64 bit, the same size as the type it replaces (on supported implementations). Changing the resolution to nanoseconds thus does not impact the size of data structures where it is used.
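The 6.72 ns figure above can be checked with a few lines of arithmetic. The sketch below is illustrative only: `MIN_WIRE_BYTES` and `wire_time_ps()` are hypothetical names that are not part of this PR, and the result is expressed in picoseconds to stay in integer math.

```c
#include <stdint.h>

/* Bytes a minimum-size Ethernet frame occupies on the wire:
 * 64 (frame) + 8 (preamble and start-of-frame delimiter) + 12 (inter-packet gap). */
#define MIN_WIRE_BYTES 84

/* Hypothetical helper: serialization time, in picoseconds, of `bytes`
 * on a link running at `gbps` Gbit/s. One bit lasts 1000 / gbps ps. */
static inline uint64_t wire_time_ps(uint64_t bytes, uint64_t gbps) {
	return bytes * 8 * 1000 / gbps;
}
```

For 84 bytes at 100 Gbit/s this gives 6720 ps, i.e. 6.72 ns, which is well below what a microsecond clock can resolve.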

Note about signedness of the new clock type:

I considered making gr_clock_ns_t unsigned, but that would require extra care wherever it is used, e.g. when calculating an age, to prevent wraparound in race conditions. E.g.:

age = (gr_clock_ns() - fdb->last_seen) / GR_NS_PER_S;

If the current thread reads the clock using gr_clock_ns(), and another thread races to set fdb->last_seen afterwards, the result of the subtraction is negative. Because C integer division truncates toward zero, dividing this small negative value by GR_NS_PER_S makes the age zero. If gr_clock_ns_t were unsigned, the subtraction would instead wrap around to a very large unsigned number, and dividing that number by GR_NS_PER_S would give a large unsigned age, not zero.

Obviously, this could be fixed by type casting gr_clock_ns_t values to signed int64_t everywhere they are used with subtraction. But such a requirement increases the risk of bugs. So I decided to make it signed, like time_t.
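The difference between the two choices can be reproduced in isolation. This is a minimal sketch, not code from the PR: `age_signed()` and `age_unsigned()` are hypothetical helpers, and `NS_PER_S` stands in for `GR_NS_PER_S`.

```c
#include <stdint.h>

#define NS_PER_S 1000000000LL

/* Age in seconds with a signed clock type (the choice made in this PR).
 * A slightly negative difference truncates toward zero. */
static inline int64_t age_signed(int64_t now, int64_t last_seen) {
	return (now - last_seen) / NS_PER_S;
}

/* Same computation with a hypothetical unsigned clock type.
 * A "negative" difference wraps around modulo 2^64 before the division. */
static inline uint64_t age_unsigned(uint64_t now, uint64_t last_seen) {
	return (now - last_seen) / (uint64_t)NS_PER_S;
}
```

With `now = 1000` and `last_seen = 1500` (the racing thread stored a later timestamp), `age_signed()` returns 0 while `age_unsigned()` returns 18446744073, roughly 584 years in seconds.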

API clock type migration to nanosecond resolution

Core API changes

  • Added typedef: typedef int64_t gr_clock_ns_t in api/gr_clock.h
  • Added conversion constant: #define GR_NS_PER_S (gr_clock_ns_t)1000000000LL
  • Added helper: static inline gr_clock_ns_t gr_clock_ns(void), which returns elapsed nanoseconds read from CLOCK_MONOTONIC_RAW
  • Removed helper: gr_clock_us() (microsecond helper removed)
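Put together, the additions listed above might look roughly like the following. The typedef, constant, and function signature come from the summary; the function body is an assumption (a plain `clock_gettime()` call), and the extra parentheses around the constant are added here for safety and are not in the summary. The actual api/gr_clock.h may differ.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* CLOCK_MONOTONIC_RAW is Linux-specific */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef int64_t gr_clock_ns_t;

#define GR_NS_PER_S ((gr_clock_ns_t)1000000000LL)

/* Assumed implementation: read the raw monotonic clock and flatten
 * the (seconds, nanoseconds) pair into a single signed 64-bit count. */
static inline gr_clock_ns_t gr_clock_ns(void) {
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
	assert(ret == 0);
	(void)ret; /* silence unused warning when NDEBUG is defined */
	return (gr_clock_ns_t)ts.tv_sec * GR_NS_PER_S + ts.tv_nsec;
}
```

A signed 64-bit nanosecond counter covers about 292 years, so overflow is not a practical concern for a monotonic clock that typically starts near zero at boot.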

Type and field migrations (clock_t → gr_clock_ns_t)

  • infra/control/nexthop.h: last_reply, last_request
  • infra/control/bond.h: next_tx, last_rx
  • l2/api/gr_l2.h: gr_fdb_entry.last_seen
  • l2/control/fdb.c & l2/cli/fdb.c: last_seen storage and AGE computation updated to nanoseconds/GR_NS_PER_S
  • policy/api/gr_conntrack.h: gr_conntrack.last_update
  • policy/control/conntrack.h: struct conn last_update (_Atomic)
  • policy/control/conntrack.c & policy/cli/conntrack.c: last_update store and age computations use gr_clock_ns() and GR_NS_PER_S
  • ip/api/gr_ip4.h: gr_ip4_icmp_recv_resp.response_time
  • ip/api/gr_ip6.h: gr_ip6_icmp_recv_resp.response_time
  • ip/control/icmp.c, ip/datapath/icmp_local_send.c, ip/datapath/icmp_input.c: ICMP timestamp storage, payload size, and callback timestamps use gr_clock_ns_t/gr_clock_ns()
  • ip6/control/icmp6.c, ip6/datapath/icmp6_local_send.c, ip6/datapath/icmp6_input.c, ip6/datapath/ndp_ns_output.c: ICMPv6/NDP timestamp storage, payload sizing, and callback timestamps use gr_clock_ns_t/gr_clock_ns()
  • ip/control/nexthop.c & ip6/control/nexthop.c: l3->last_reply updated with gr_clock_ns()
  • ip/datapath/arp_output_request.c & ip6/datapath/ndp_ns_output.c: last_request updated with gr_clock_ns()
  • infra/control/lacp.c: member->last_rx, member->next_tx and local now/timeout variables switched to gr_clock_ns_t; timeout calculations multiply LACP_*_TIMEOUT by GR_NS_PER_S
  • cli/main.c: srandom(...) seed switched to gr_clock_ns()
  • smoke/fib_inject.c: timing measurement switched to gr_clock_ns()

Timing computation pattern

  • Timestamps are recorded with gr_clock_ns().
  • Age/timeout calculations convert nanoseconds to seconds by dividing by GR_NS_PER_S.
  • Scheduling computes future times with now + (timeout_seconds * GR_NS_PER_S).
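The three bullets above can be sketched as small helpers. The helper names (`age_seconds`, `deadline_after`, `deadline_reached`) are hypothetical and do not appear in the PR; they take `now` as a parameter so the arithmetic is easy to follow and to test. In real code `now` would come from gr_clock_ns().

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t gr_clock_ns_t;
#define GR_NS_PER_S ((gr_clock_ns_t)1000000000LL)

/* Age/timeout: convert a nanosecond difference to whole seconds. */
static inline int64_t age_seconds(gr_clock_ns_t now, gr_clock_ns_t last_seen) {
	return (now - last_seen) / GR_NS_PER_S;
}

/* Scheduling: compute a future time from a timeout given in seconds. */
static inline gr_clock_ns_t deadline_after(gr_clock_ns_t now, int64_t timeout_seconds) {
	return now + timeout_seconds * GR_NS_PER_S;
}

/* Expiry check against a previously computed deadline. */
static inline bool deadline_reached(gr_clock_ns_t now, gr_clock_ns_t deadline) {
	return now >= deadline;
}
```

Keeping stored timestamps in nanoseconds and converting to seconds only at the point of display or comparison avoids losing resolution in the stored values.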

Rationale (from the commit message)

  • Higher resolution (nanoseconds) targets precise timing needs (e.g., packet transmission durations at 100 Gbit/s).
  • gr_clock_ns_t is signed int64_t to avoid unsigned-wraparound issues during subtraction and to preserve existing 64-bit width so data-structure sizes remain unchanged.

Other notes

  • Removed local <time.h> includes where no longer needed after switching to gr_clock.h.
  • Apart from struct field type updates, no exported function signatures changed; most changes are internal storage, payload sizing, and arithmetic converting to/from GR_NS_PER_S.

coderabbitai Bot commented May 25, 2026


Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This PR replaces microsecond-based timing with a nanosecond clock API (gr_clock_ns_t, GR_NS_PER_S, gr_clock_ns()) and updates all uses and stored timestamps to that type. Struct fields (bond/LACP, nexthop, ICMP v4/v6 responses, L2 FDB, conntrack) were changed to gr_clock_ns_t. All timing arithmetic—aging, timeouts, ICMP payload timestamps, periodic scheduling, and CLI/display calculations—now use gr_clock_ns() and convert to seconds via GR_NS_PER_S.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4a55c0c-141f-4bbb-bf8f-66fb4ee6e298

📥 Commits

Reviewing files that changed from the base of the PR and between ee3b837 and 11d55aa.

📒 Files selected for processing (26)
  • api/gr_clock.h
  • cli/main.c
  • modules/infra/control/bond.h
  • modules/infra/control/l3_nexthop.c
  • modules/infra/control/lacp.c
  • modules/infra/control/nexthop.h
  • modules/ip/api/gr_ip4.h
  • modules/ip/control/icmp.c
  • modules/ip/control/nexthop.c
  • modules/ip/datapath/arp_output_request.c
  • modules/ip/datapath/icmp_input.c
  • modules/ip/datapath/icmp_local_send.c
  • modules/ip6/api/gr_ip6.h
  • modules/ip6/control/icmp6.c
  • modules/ip6/control/nexthop.c
  • modules/ip6/datapath/icmp6_input.c
  • modules/ip6/datapath/icmp6_local_send.c
  • modules/ip6/datapath/ndp_ns_output.c
  • modules/l2/api/gr_l2.h
  • modules/l2/cli/fdb.c
  • modules/l2/control/fdb.c
  • modules/policy/api/gr_conntrack.h
  • modules/policy/cli/conntrack.c
  • modules/policy/control/conntrack.c
  • modules/policy/control/conntrack.h
  • smoke/fib_inject.c

 		gr_table_cell(table, 7, "%u", ntohs(conn->fwd_flow.src_id));
 		gr_table_cell(table, 8, "%u", ntohs(conn->fwd_flow.dst_id));
-		gr_table_cell(table, 9, "%lu", (now - conn->last_update) / 1000000);
+		gr_table_cell(table, 9, "%ld", (now - conn->last_update) / GR_NS_PER_S);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clamp negative elapsed time in LAST_UPDATE display.

Line 58 can print negative age when conn->last_update is newer than the pre-captured now (concurrent updates). Clamp to zero before formatting.

Proposed fix
-		gr_table_cell(table, 9, "%ld", (now - conn->last_update) / GR_NS_PER_S);
+		gr_clock_ns_t elapsed = now - conn->last_update;
+		if (elapsed < 0)
+			elapsed = 0;
+		gr_table_cell(table, 9, "%ld", elapsed / GR_NS_PER_S);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/policy/cli/conntrack.c` at line 58, clamp the computed elapsed time
before formatting: compute a signed elapsed variable as (now -
conn->last_update), set it to 0 if it is negative, then pass (elapsed /
GR_NS_PER_S) to gr_table_cell instead of the raw (now - conn->last_update), so
that LAST_UPDATE never displays negative seconds.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modules/ip/control/icmp.c (1)

176-177: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not free queued mbufs in icmp_fini; this can double-free shared pool-owned packets.

The fini drain should release queue items only, not mbufs, per this module’s ownership contract.

Proposed fix
 static void icmp_fini(struct event_base *) {
 	if (pool != NULL) {
 		struct icmp_queue_item *i, *tmp;
 		STAILQ_FOREACH_SAFE (i, &icmp_queue, next, tmp)
-			icmp_queue_pop(i, true);
+			icmp_queue_pop(i, false);
 		rte_mempool_free(pool);
 		pool = NULL;
 	}
 }

Based on learnings: "In modules that manage ICMP packet queues (e.g., icmp.c and icmp6.c), do not manually free mbufs from icmp_queue in fini functions if those mbufs are owned by other pktmbuf pools that are freed elsewhere."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/ip/control/icmp.c` around lines 176 - 177, The fini path is freeing
mbufs from icmp_queue which can double-free pool-owned packets; in icmp_fini
replace the draining loop that calls STAILQ_FOREACH_SAFE and icmp_queue_pop(i,
true) with a call that only releases queue entries without freeing the
underlying mbuf (e.g., call icmp_queue_pop(i, false) or use an existing/pop-only
helper), so the queue items are removed but their mbufs are not freed here;
ensure the change touches the icmp_fini loop and any call sites referencing
icmp_queue_pop to preserve the module's ownership contract.
modules/ip6/control/icmp6.c (1)

164-166: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

icmp_fini should not free queued mbufs; this risks double free under shared mbuf ownership.

Drain the queue items without freeing mbufs in fini.

Proposed fix
 static void icmp_fini(struct event_base *) {
 	if (pool != NULL) {
 		struct icmp_queue_item *i, *tmp;
 		STAILQ_FOREACH_SAFE (i, &icmp_queue, next, tmp)
-			icmp6_queue_pop(i, true);
+			icmp6_queue_pop(i, false);
 		rte_mempool_free(pool);
 		pool = NULL;
 	}
 }

Based on learnings: "In modules that manage ICMP packet queues (e.g., icmp.c and icmp6.c), do not manually free mbufs from icmp_queue in fini functions if those mbufs are owned by other pktmbuf pools that are freed elsewhere."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/ip6/control/icmp6.c` around lines 164 - 166, In icmp_fini, stop
freeing mbufs owned elsewhere when draining icmp_queue: instead of calling
icmp6_queue_pop(i, true) (which frees mbufs), remove/unwrap queue entries
without freeing their mbufs (e.g., call icmp6_queue_pop(i, false) or use a
removal routine that only unlinks entries) so the mbufs remain owned by their
pktmbuf pools; leave the rte_mempool_free(pool) call as-is for the pool cleanup.
Ensure you only unlink entries from icmp_queue and do not call any mbuf free
paths in icmp6_queue_pop or related helpers during fini.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modules/infra/control/lacp.c`:
- Around line 176-178: The addition binds tighter than the conditional operator,
so now ends up in the condition instead of being added to the chosen timeout;
change the assignment to add now to the entire conditional result by
parenthesizing the ternary expression, i.e., compute member->next_tx as now +
((member->remote.state & LACP_STATE_FAST) ? LACP_SHORT_TIMEOUT * GR_NS_PER_S :
LACP_LONG_TIMEOUT * GR_NS_PER_S).
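The precedence issue described in the LACP comment can be reproduced with a small standalone sketch. The constants and function names below are placeholders chosen for illustration, not the values or code from lacp.c; `next_tx_as_parsed()` spells out how the compiler groups an expression of the form `now + cond ? a : b`.

```c
#include <stdint.h>

typedef int64_t gr_clock_ns_t;
#define GR_NS_PER_S ((gr_clock_ns_t)1000000000LL)

/* Placeholder values for illustration only. */
#define STATE_FAST 0x02
#define SHORT_TIMEOUT 1
#define LONG_TIMEOUT 30

/* How `now + (state & STATE_FAST) ? a : b` is grouped: `+` binds
 * tighter than `?:`, so `now` becomes part of the condition. */
static inline gr_clock_ns_t next_tx_as_parsed(gr_clock_ns_t now, uint8_t state) {
	return (now + (state & STATE_FAST)) ? SHORT_TIMEOUT * GR_NS_PER_S
					    : LONG_TIMEOUT * GR_NS_PER_S;
}

/* Intended behaviour: pick the timeout first, then add it to `now`. */
static inline gr_clock_ns_t next_tx_grouped(gr_clock_ns_t now, uint8_t state) {
	return now + ((state & STATE_FAST) ? SHORT_TIMEOUT * GR_NS_PER_S
					   : LONG_TIMEOUT * GR_NS_PER_S);
}
```

With any nonzero `now`, the first form always yields the bare short timeout, a value in the past relative to `now`, regardless of the state bit.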

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 175e0d13-a66f-4f1b-94cb-cd34122c209a

📥 Commits

Reviewing files that changed from the base of the PR and between 11d55aa and ee8daf4.

📒 Files selected for processing (26): the same 26 files as in the first review run
✅ Files skipped from review due to trivial changes (1)
  • cli/main.c

Comment thread modules/infra/control/lacp.c Outdated
The clock_t type is implementation specific. E.g. on Linux, it is
microseconds, and on Windows it is milliseconds.

Replace all uses of clock_t and its accompanying function, gr_clock_us(),
with a new nanosecond resolution clock type, gr_clock_ns_t, and its
accompanying function, gr_clock_ns().

Note about resolution and size of the new clock type:

A clock with nanosecond resolution can be used for purposes not possible
with microsecond resolution, e.g. shaping and pacing. For reference, the
duration of a minimum size (84 byte) packet on a 100 Gbit/s Ethernet
link is 6.72 ns.

The size of the new type is 64 bit, the same size as the type it
replaces (on supported implementations). Changing the resolution to
nanoseconds thus does not impact the size of data structures where it is
used.

Note about signedness of the new clock type:

I considered making the gr_clock_ns_t unsigned, but that would require
additional considerations in code where it is used, e.g. for calculating
age, to prevent wraparound in race conditions. E.g.:

	age = (gr_clock_ns() - fdb->last_seen) / GR_NS_PER_S;

If the current thread reads the clock using gr_clock_ns(), and another
thread races to set fdb->last_seen afterwards, the result of the
subtraction is negative. The division by GR_NS_PER_S makes the age zero.
If gr_clock_ns_t was unsigned, the negative result of the subtraction
would be a very large unsigned number. Dividing this very large number
by GR_NS_PER_S would be a large unsigned number, not zero.

Obviously, this could be fixed by type casting gr_clock_ns_t values to
signed int64_t everywhere they are used with subtraction. But such
a requirement increases the risk of bugs. So I decided to make it
signed, like time_t.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Robin Jarry <rjarry@redhat.com>
@rjarry rjarry merged commit fe0c7f9 into DPDK:main May 25, 2026
12 of 14 checks passed
@rjarry rjarry deleted the clock_ns branch May 25, 2026 17:09