-
Notifications
You must be signed in to change notification settings - Fork 32
mctpd: replace sd-event timers #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mctpd: replace sd-event timers #123
Conversation
|
This PR is still missing some crucial pieces:
|
| /* | ||
| * Error handling policy: | ||
| * | ||
| * 1. Any resource management error prior to Treclaim is handled by | ||
| * rescheduling the poll query, unless it is scheduling the poll | ||
| * query itself that fails. | ||
| * | ||
| * 2. If scheduling the poll query fails then the endpoint is removed. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this comment is for when sd_event_source_* fails. It does not applies anymore, so I removed it.
|
From this rough draft, I managed to cut down test time from 20~ seconds to around ~8 seconds. In theory, if every timeout is controlled by That is what reflected in |
|
I think there should be a way to update the deadline for the global timer. Maybe instead of registering the timer callback as the API, a |
1dfa72c to
37586d0
Compare
|
Instead of a custom callback, I changed the interface to mirror what I actually have not tested this on a real mctp system yet, so the real Still, I think this is roughly what I wanted to see, so I will set the PR to Ready for review. |
src/mctpd.c
Outdated
| int global_time_callback(sd_event_source *source, int timerfd, uint revents, | ||
| void *userdata) | ||
| { | ||
| struct ctx *ctx = userdata; | ||
| uint64_t n_expireds; | ||
| size_t i; | ||
|
|
||
| mctp_ops.timerfd.read(timerfd, &n_expireds); | ||
| mctp_ops.timerfd.gettime(timerfd, &ctx->now); | ||
|
|
||
| for (i = 0; i < ctx->num_peers; i++) { | ||
| struct peer *p = ctx->peers[i]; | ||
| if (p->recovery.deadline <= ctx->now) { | ||
| peer_endpoint_recover(p); | ||
| } | ||
| } | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Now that we use a timerfd-like interface, do we need to do this anymore?
We should be able to just keep all the peer->recovery.source, and now do I/O sources instead of time sources. This option should keep the diff fairly minimal, but I am not sure about now coalescing timers into one global I/O timer (which sd-event does internally AFAIK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually went ahead and did it anyway :^) Looks much clearer IMO, we can do this change later if we want.
14f61d5 to
bae46e3
Compare
|
For the record, Previous CI runs: To: |
bae46e3 to
3fa2dd3
Compare
|
I also went ahead and remove the |
|
I can't say I'm super keen on using timerfds. Constraining complexity in the production code takes priority over test implementations, for me. I'll do some experimentation and see if we can mock with the standard sd_event time handling. |
I see, that makes sense. I think I can change the API to that of |
3fa2dd3 to
fa81ea0
Compare
|
Maybe this is what you meant? The current issue is that because |
e8fe015 to
3d78f8f
Compare
|
Terribly sorry for the noise, but in the end, I could not even get it to work 💀 My extremely ugly (Also, is this a bug in the new commit to upload the log artifact? Since I see the upload log step is skipped) |
8b180b8 to
fbea32b
Compare
|
Aha, it was a memory leak when creating the event source in floating mode. CI should be fixed now, hopefully... |
fbea32b to
9213b7b
Compare
9213b7b to
0cd7990
Compare
|
I think this looks better, but would like to see if we can decouple the test suite timer from mctpd's own concept of time. This would mean replacing the trio autojump timer with one that we could explicitly advance in the test cases. The challenge there is syncronisation between the advance and the registration of a new timer though... |
I did not include it inside this PR, but I am pretty sure that we got that for free though. I believe we can just do @pytest.fixture
def manual_clock():
"""
Default clock with manual advance, and no autojump
"""
return trio.testing.MockClock()In tests that use it, you can manually add that fixture to tests and... manual_clock.tick(10)to advance 10 seconds, and all Do you have any tests in mind? I can try to add that to this MR as a proof that it should work. |
|
OK, that looks pretty neat. I did not have anything specific, just that we may need that level of control at some point. I would like to investigate decoupling the trio clock from mctpd's clock, but that could come as a later change. |
|
What's the general plan here? The final HAVE_LIBSYSTEMD change looks like a temporary fix - or is that something you're planning to include as-is? |
|
Ouch sorry, totally forget about that, I intended to ask if there is any better way to achieve that (building |
| executable('mctp', | ||
| sources: ['src/mctp.c'] + netlink_sources + util_sources + ops_sources, | ||
| install: true, | ||
| c_args: ['-DHAVE_LIBSYSTEMD=0'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly feel hesitant here. It feels like I am doing something that configure_file supposed to solve. Maybe a separate config.h for each target?
I'll take a look. Maybe we can add some logic to override for tests only. |
In tests, replace all sd_event timer usages with trio.testing.MockClock backed I/O sources. Because we still have real timeouts (D-Bus call to mctpd subprocess, mctpd SO_RCVTIMEO wait, ...), autojump_threshold (how much to wait before skipping Trio timeouts) is set to a reasonable value to take that into account. Signed-off-by: Khang D Nguyen <[email protected]>
0cd7990 to
5f38f94
Compare
|
Just in case, to reduce back and forth, I squashed the change into one commit. This is my final take |
|
Any thoughts on something like this: jk-ozlabs@e691647 ? |
The name clarification looks good to me. I have no comments on I assume I don't need to pick the commits into this PR because you will merge your branch along with my commit, but let me know if I should do so! |
|
Yep, all good there. Thank you for the contribution! |
In tests, replace all sd_event timer usages with trio.testing.MockClock
backed I/O sources.
Because we still have real timeouts (D-Bus call to mctpd subprocess,
mctpd SO_RCVTIMEO wait, ...), autojump_threshold (how much to wait
before skipping Trio timeouts) is set to a reasonable value to take that
into account.