-
Notifications
You must be signed in to change notification settings - Fork 395
Better explain logcontext in run_in_background(...)
, run_as_background_process(...)
, and the sentinel
logcontext
#18900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d3f4391
8de6a86
af1770d
064ca82
108e893
3cf60d7
21f8fcc
53c5ce7
8b1be4e
9cd37e0
2194f6f
0c4fa2a
3d7684e
5794e8f
0fb7460
d35a9ff
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Better explain how we manage the logcontext in `run_in_background(...)` and `run_as_background_process(...)`. | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -227,7 +227,16 @@ class ContextRequest: | |
|
||
|
||
class _Sentinel: | ||
"""Sentinel to represent the root context""" | ||
""" | ||
Sentinel to represent the root context | ||
|
||
This should only be used for tasks outside of Synapse like when we yield control | ||
back to the Twisted reactor (event loop) so we don't leak the current logging | ||
context to other tasks that are scheduled next in the event loop. | ||
|
||
Nothing from the Synapse homeserver should be logged with the sentinel context. i.e. | ||
we should always know which server the logs are coming from. | ||
Comment on lines
+237
to
+238
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is aspirational since we currently go against this in a few places. This is being addressed in #18870 |
||
""" | ||
|
||
__slots__ = ["previous_context", "finished", "request", "tag"] | ||
|
||
|
@@ -616,9 +625,17 @@ def filter(self, record: logging.LogRecord) -> Literal[True]: | |
|
||
|
||
class PreserveLoggingContext: | ||
"""Context manager which replaces the logging context | ||
""" | ||
Context manager which replaces the logging context | ||
|
||
The previous logging context is restored on exit. | ||
|
||
The previous logging context is restored on exit.""" | ||
`make_deferred_yieldable` is pretty equivalent to using `with | ||
PreserveLoggingContext():` (using the default sentinel context), i.e. it clears the | ||
logcontext before awaiting (and so before execution passes back to the reactor) and | ||
restores the old context once the awaitable completes (execution passes from the | ||
reactor back to the code). | ||
""" | ||
|
||
__slots__ = ["_old_context", "_new_context"] | ||
|
||
|
@@ -784,6 +801,14 @@ def run_in_background( | |
return from the function, and that the sentinel context is set once the | ||
deferred returned by the function completes. | ||
MadLittleMods marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Is this distinction actually true? It seems like both can used to wait for a result. Both need to be used with
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The difference is:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the caller, they interact the same (as far as I can tell):
The only difference is internal as you pointed out:
But I think this already answers the question in this thread that the off the cuff comment on the distinction wasn't accurate ⏩ |
||
To explain how the log contexts work here: | ||
- When this function is called, the current context is stored ("original"), we kick | ||
off the background task, and we restore that original context before returning | ||
- When the background task finishes, we don't want to leak our context into the | ||
reactor which would erroneously get attached to the next operation picked up by | ||
the event loop. We add a callback to the deferred which will clear the logging | ||
context after it finishes and yields control back to the reactor. | ||
|
||
Useful for wrapping functions that return a deferred or coroutine, which you don't | ||
yield or await on (for instance because you want to pass it to | ||
deferred.gatherResults()). | ||
|
@@ -795,8 +820,13 @@ def run_in_background( | |
`f` doesn't raise any deferred exceptions, otherwise a scary-looking | ||
CRITICAL error about an unhandled error will be logged without much | ||
indication about where it came from. | ||
|
||
Returns: | ||
Deferred which returns the result of func, or `None` if func raises. | ||
Note that the returned Deferred does not follow the synapse logcontext | ||
rules. | ||
""" | ||
current = current_context() | ||
calling_context = current_context() | ||
try: | ||
res = f(*args, **kwargs) | ||
except Exception: | ||
|
@@ -806,6 +836,9 @@ def run_in_background( | |
|
||
# `res` may be a coroutine, `Deferred`, some other kind of awaitable, or a plain | ||
# value. Convert it to a `Deferred`. | ||
# | ||
# Wrapping the value in a deferred has the side effect of executing the coroutine, | ||
# if it is one. If it's already a deferred, then we can just use that. | ||
d: "defer.Deferred[R]" | ||
if isinstance(res, typing.Coroutine): | ||
# Wrap the coroutine in a `Deferred`. | ||
|
@@ -820,28 +853,32 @@ def run_in_background( | |
# `res` is a plain value. Wrap it in a `Deferred`. | ||
d = defer.succeed(res) | ||
|
||
# The deferred has already completed | ||
if d.called and not d.paused: | ||
# The function should have maintained the logcontext, so we can | ||
# optimise out the messing about | ||
return d | ||
|
||
# The function may have reset the context before returning, so | ||
# we need to restore it now. | ||
ctx = set_current_context(current) | ||
|
||
# The original context will be restored when the deferred | ||
# completes, but there is nothing waiting for it, so it will | ||
# get leaked into the reactor or some other function which | ||
# wasn't expecting it. We therefore need to reset the context | ||
# here. | ||
# The function may have reset the context before returning, so we need to restore it | ||
# now. | ||
# | ||
# Our goal is to have the caller logcontext unchanged after firing off the | ||
# background task and returning. | ||
set_current_context(calling_context) | ||
|
||
# The original logcontext will be restored when the deferred completes, but | ||
# there is nothing waiting for it, so it will get leaked into the reactor (which | ||
# would then get picked up by the next thing the reactor does). We therefore | ||
# need to reset the logcontext here (set the `sentinel` logcontext) before | ||
# yielding control back to the reactor. | ||
# | ||
# (If this feels asymmetric, consider it this way: we are | ||
# effectively forking a new thread of execution. We are | ||
# probably currently within a ``with LoggingContext()`` block, | ||
# which is supposed to have a single entry and exit point. But | ||
# by spawning off another deferred, we are effectively | ||
# adding a new exit point.) | ||
d.addBoth(_set_context_cb, ctx) | ||
d.addBoth(_set_context_cb, SENTINEL_CONTEXT) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can you come up with a reason why we were previously doing the trying to restore some other context? This was introduced in matrix-org/synapse#3170 without an explanation around this specific piece. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmhmhmhmhmhm, I wonder if there is some awfulness with deferred chaining where the execution doesn't actually go back to the reactor but back to some other context. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have chained deferred tests that still pass from that PR 🤷 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To give this an update with more hindsight; this logic was redundant with the early return and it is safe to remove this complexity ✅ It seems like this actually has to do with completed vs incomplete deferreds... To explain how things previously worked without the early-return shortcut: With the normal case of incomplete awaitable, we store the With the other case where we see a completed awaitable, we store the But this also means that our early-return shortcut is no longer just an optimization and is necessary to act correctly in the completed awaitable case as we want to return with the But because we made the same change to |
||
return d | ||
|
||
|
||
|
@@ -859,53 +896,86 @@ def run_coroutine_in_background( | |
coroutine directly rather than a function. We can do this because coroutines | ||
do not run until called, and so calling an async function without awaiting | ||
cannot change the log contexts. | ||
|
||
This is an ergonomic helper so we can do this: | ||
```python | ||
run_coroutine_in_background(func1(arg1)) | ||
``` | ||
Rather than having to do this: | ||
```python | ||
run_in_background(lambda: func1(arg1)) | ||
``` | ||
""" | ||
calling_context = current_context() | ||
|
||
current = current_context() | ||
# Wrap the coroutine in a deferred, which will have the side effect of executing the | ||
# coroutine in the background. | ||
d = defer.ensureDeferred(coroutine) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re: This was introduced in #17884 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it makes it a bit more ergonomic to use the await gather_optional_coroutines(
func1(arg1), func2(arg2), func3(arg3),
) which otherwise you'd need to do something like: await gather_optional(
lambda: func1(arg1), lambda: func2(arg2), lambda: func3(arg3) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, ergonomic helper 👍 Updated the docstring to explain this ✅ |
||
# The function may have reset the context before returning, so | ||
# we need to restore it now. | ||
ctx = set_current_context(current) | ||
|
||
# The original context will be restored when the deferred | ||
# completes, but there is nothing waiting for it, so it will | ||
# get leaked into the reactor or some other function which | ||
# wasn't expecting it. We therefore need to reset the context | ||
# here. | ||
# The function may have reset the context before returning, so we need to restore it | ||
# now. | ||
# | ||
# Our goal is to have the caller logcontext unchanged after firing off the | ||
# background task and returning. | ||
set_current_context(calling_context) | ||
|
||
# The original logcontext will be restored when the deferred completes, but | ||
# there is nothing waiting for it, so it will get leaked into the reactor (which | ||
# would then get picked up by the next thing the reactor does). We therefore | ||
# need to reset the logcontext here (set the `sentinel` logcontext) before | ||
# yielding control back to the reactor. | ||
# | ||
# (If this feels asymmetric, consider it this way: we are | ||
# effectively forking a new thread of execution. We are | ||
# probably currently within a ``with LoggingContext()`` block, | ||
# which is supposed to have a single entry and exit point. But | ||
# by spawning off another deferred, we are effectively | ||
# adding a new exit point.) | ||
d.addBoth(_set_context_cb, ctx) | ||
d.addBoth(_set_context_cb, SENTINEL_CONTEXT) | ||
return d | ||
|
||
|
||
T = TypeVar("T") | ||
|
||
|
||
def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]": | ||
MadLittleMods marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Given a deferred, make it follow the Synapse logcontext rules: | ||
|
||
If the deferred has completed, essentially does nothing (just returns another | ||
completed deferred with the result/failure). | ||
|
||
If the deferred has not yet completed, resets the logcontext before | ||
returning a deferred. Then, when the deferred completes, restores the | ||
current logcontext before running callbacks/errbacks. | ||
|
||
(This is more-or-less the opposite operation to run_in_background.) | ||
""" | ||
Given a deferred, make it follow the Synapse logcontext rules: | ||
|
||
- If the deferred has completed, essentially does nothing (just returns another | ||
completed deferred with the result/failure). | ||
- If the deferred has not yet completed, resets the logcontext before returning a | ||
incomplete deferred. Then, when the deferred completes, restores the current | ||
logcontext before running callbacks/errbacks. | ||
|
||
This means the resultant deferred can be awaited without leaking the current | ||
logcontext to the reactor (which would then get erroneously picked up by the next | ||
thing the reactor does), and also means that the logcontext is preserved when the | ||
deferred completes. | ||
|
||
(This is more-or-less the opposite operation to run_in_background in terms of how it | ||
handles log contexts.) | ||
|
||
Pretty much equivalent to using `with PreserveLoggingContext():`, i.e. it clears the | ||
logcontext before awaiting (and so before execution passes back to the reactor) and | ||
restores the old context once the awaitable completes (execution passes from the | ||
reactor back to the code). | ||
""" | ||
# The deferred has already completed | ||
if deferred.called and not deferred.paused: | ||
# it looks like this deferred is ready to run any callbacks we give it | ||
# immediately. We may as well optimise out the logcontext faffery. | ||
return deferred | ||
|
||
# ok, we can't be sure that a yield won't block, so let's reset the | ||
# logcontext, and add a callback to the deferred to restore it. | ||
# Our goal is to have the caller logcontext unchanged after they yield/await the | ||
# returned deferred. | ||
# | ||
# When the caller yield/await's the returned deferred, it may yield | ||
# control back to the reactor. To avoid leaking the current logcontext to the | ||
# reactor (which would then get erroneously picked up by the next thing the reactor | ||
# does) while the deferred runs in the reactor event loop, we reset the logcontext | ||
# and add a callback to the deferred to restore it so the caller's logcontext is | ||
# active when the deferred completes. | ||
Comment on lines
+973
to
+978
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this more accurate than the previous explanation? The "
I feel like this may be irrelevant to what I think is actually happening (as explained now) 🤔 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that is related to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel like that detail might not even matter compared to this other reason why we would set That detail might still be good to also include here but I have no confidence even with this scenario explanation and feels like the "why" half is missing which is what I would want filled in. |
||
prev_context = set_current_context(SENTINEL_CONTEXT) | ||
deferred.addBoth(_set_context_cb, prev_context) | ||
return deferred | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -228,6 +228,11 @@ def run_as_background_process( | |
clock.looping_call and friends (or for firing-and-forgetting in the middle of a | ||
normal synapse async function). | ||
|
||
Because the returned Deferred does not follow the synapse logcontext rules, awaiting | ||
the result of this function will result in the log context being cleared (bad). In | ||
order to properly await the result of this function and maintain the current log | ||
context, use `make_deferred_yieldable`. | ||
|
||
Args: | ||
desc: a description for this background process type | ||
server_name: The homeserver name that this background process is being run for | ||
|
@@ -280,6 +285,18 @@ async def run() -> Optional[R]: | |
name=desc, **{SERVER_NAME_LABEL: server_name} | ||
).dec() | ||
|
||
# To explain how the log contexts work here: | ||
# - When this function is called, the current context is stored (using | ||
# `PreserveLoggingContext`), we kick off the background task, and we restore the | ||
# original context before returning (also part of `PreserveLoggingContext`). | ||
# - When the background task finishes, we don't want to leak our background context | ||
# into the reactor which would erroneously get attached to the next operation | ||
# picked up by the event loop. We use `PreserveLoggingContext` to set the | ||
# `sentinel` context and means the new `BackgroundProcessLoggingContext` will | ||
# remember the `sentinel` context as its previous context to return to when it | ||
# exits and yields control back to the reactor. | ||
# | ||
# TODO: Why can't we simplify to using `return run_in_background(run)`? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to explore this further but I haven't come up with an answer yet. Currently, some tests fail when we try this -> #18900 (comment) It probably directly relates to #18900 (comment) I think this PR is a good step forward and I'd rather leave the note in until we either comment "why" or simplify. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think its likely due to |
||
with PreserveLoggingContext(): | ||
# Note that we return a Deferred here so that it can be used in a | ||
# looping_call and other places that expect a Deferred. | ||
|
Uh oh!
There was an error while loading. Please reload this page.