Skip to content

Commit 9cc4001

Browse files
Better explain logcontext in run_in_background(...) and run_as_background_process(...) (#18900)
Also adds a section in the docs explaining the `sentinel` logcontext. Spawning from #18870 ### Testing strategy 1. Run Synapse normally and with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Execute some requests 1. Shutdown the server 1. Look for any bad log entries in your homeserver logs: - `Expected logging context sentinel but found main` - `Expected logging context main was lost` - `Expected previous context` - `utime went backwards!`/`stime went backwards!` - `Called stop on logcontext POST-0 without recording a start rusage` - `Background process re-entered without a proc` Twisted trial tests: 1. Run full Twisted trial test suite. 1. Check the logs for `Test starting with non-sentinel logging context ...`
1 parent c68c5dd commit 9cc4001

File tree

4 files changed

+147
-37
lines changed

4 files changed

+147
-37
lines changed

changelog.d/18900.misc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Better explain how we manage the logcontext in `run_in_background(...)` and `run_as_background_process(...)`.

docs/log_contexts.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,28 @@ def do_request_handling():
5959
logger.debug("phew")
6060
```
6161

62+
### The `sentinel` context
63+
64+
The default logcontext is `synapse.logging.context.SENTINEL_CONTEXT`, which is an empty
65+
sentinel value to represent the root logcontext. This is what is used when there is no
66+
other logcontext set. The phrase "clear/reset the logcontext" means to set the current
67+
logcontext to the `sentinel` logcontext.
68+
69+
No CPU/database usage metrics are recorded against the `sentinel` logcontext.
70+
71+
Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
72+
logcontext as we want to know which server the logs came from. In practice, this is not
73+
always the case yet especially outside of request handling.
74+
75+
Global things outside of Synapse (e.g. Twisted reactor code) should run in the
76+
`sentinel` logcontext. It's only when it calls into application code that a logcontext
77+
gets activated. This means the reactor should be started in the `sentinel` logcontext,
78+
and any time an awaitable yields control back to the reactor, it should reset the
79+
logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
80+
current logcontext to the reactor (which would then get picked up and associated with
81+
the next thing the reactor does).
82+
83+
6284
## Using logcontexts with awaitables
6385

6486
Awaitables break the linear flow of code so that there is no longer a single entry point

synapse/logging/context.py

Lines changed: 107 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -227,7 +227,16 @@ class ContextRequest:
227227

228228

229229
class _Sentinel:
230-
"""Sentinel to represent the root context"""
230+
"""
231+
Sentinel to represent the root context
232+
233+
This should only be used for tasks outside of Synapse like when we yield control
234+
back to the Twisted reactor (event loop) so we don't leak the current logging
235+
context to other tasks that are scheduled next in the event loop.
236+
237+
Nothing from the Synapse homeserver should be logged with the sentinel context. i.e.
238+
we should always know which server the logs are coming from.
239+
"""
231240

232241
__slots__ = ["previous_context", "finished", "request", "tag"]
233242

@@ -616,9 +625,17 @@ def filter(self, record: logging.LogRecord) -> Literal[True]:
616625

617626

618627
class PreserveLoggingContext:
619-
"""Context manager which replaces the logging context
628+
"""
629+
Context manager which replaces the logging context
630+
631+
The previous logging context is restored on exit.
620632
621-
The previous logging context is restored on exit."""
633+
`make_deferred_yieldable` is pretty equivalent to using `with
634+
PreserveLoggingContext():` (using the default sentinel context), i.e. it clears the
635+
logcontext before awaiting (and so before execution passes back to the reactor) and
636+
restores the old context once the awaitable completes (execution passes from the
637+
reactor back to the code).
638+
"""
622639

623640
__slots__ = ["_old_context", "_new_context"]
624641

@@ -784,6 +801,14 @@ def run_in_background(
784801
return from the function, and that the sentinel context is set once the
785802
deferred returned by the function completes.
786803
804+
To explain how the log contexts work here:
805+
- When this function is called, the current context is stored ("original"), we kick
806+
off the background task, and we restore that original context before returning
807+
- When the background task finishes, we don't want to leak our context into the
808+
reactor which would erroneously get attached to the next operation picked up by
809+
the event loop. We add a callback to the deferred which will clear the logging
810+
context after it finishes and yields control back to the reactor.
811+
787812
Useful for wrapping functions that return a deferred or coroutine, which you don't
788813
yield or await on (for instance because you want to pass it to
789814
deferred.gatherResults()).
@@ -795,8 +820,13 @@ def run_in_background(
795820
`f` doesn't raise any deferred exceptions, otherwise a scary-looking
796821
CRITICAL error about an unhandled error will be logged without much
797822
indication about where it came from.
823+
824+
Returns:
825+
Deferred which returns the result of func, or `None` if func raises.
826+
Note that the returned Deferred does not follow the synapse logcontext
827+
rules.
798828
"""
799-
current = current_context()
829+
calling_context = current_context()
800830
try:
801831
res = f(*args, **kwargs)
802832
except Exception:
@@ -806,6 +836,9 @@ def run_in_background(
806836

807837
# `res` may be a coroutine, `Deferred`, some other kind of awaitable, or a plain
808838
# value. Convert it to a `Deferred`.
839+
#
840+
# Wrapping the value in a deferred has the side effect of executing the coroutine,
841+
# if it is one. If it's already a deferred, then we can just use that.
809842
d: "defer.Deferred[R]"
810843
if isinstance(res, typing.Coroutine):
811844
# Wrap the coroutine in a `Deferred`.
@@ -820,28 +853,32 @@ def run_in_background(
820853
# `res` is a plain value. Wrap it in a `Deferred`.
821854
d = defer.succeed(res)
822855

856+
# The deferred has already completed
823857
if d.called and not d.paused:
824858
# The function should have maintained the logcontext, so we can
825859
# optimise out the messing about
826860
return d
827861

828-
# The function may have reset the context before returning, so
829-
# we need to restore it now.
830-
ctx = set_current_context(current)
831-
832-
# The original context will be restored when the deferred
833-
# completes, but there is nothing waiting for it, so it will
834-
# get leaked into the reactor or some other function which
835-
# wasn't expecting it. We therefore need to reset the context
836-
# here.
862+
# The function may have reset the context before returning, so we need to restore it
863+
# now.
864+
#
865+
# Our goal is to have the caller logcontext unchanged after firing off the
866+
# background task and returning.
867+
set_current_context(calling_context)
868+
869+
# The original logcontext will be restored when the deferred completes, but
870+
# there is nothing waiting for it, so it will get leaked into the reactor (which
871+
# would then get picked up by the next thing the reactor does). We therefore
872+
# need to reset the logcontext here (set the `sentinel` logcontext) before
873+
# yielding control back to the reactor.
837874
#
838875
# (If this feels asymmetric, consider it this way: we are
839876
# effectively forking a new thread of execution. We are
840877
# probably currently within a ``with LoggingContext()`` block,
841878
# which is supposed to have a single entry and exit point. But
842879
# by spawning off another deferred, we are effectively
843880
# adding a new exit point.)
844-
d.addBoth(_set_context_cb, ctx)
881+
d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
845882
return d
846883

847884

@@ -859,53 +896,86 @@ def run_coroutine_in_background(
859896
coroutine directly rather than a function. We can do this because coroutines
860897
do not run until called, and so calling an async function without awaiting
861898
cannot change the log contexts.
899+
900+
This is an ergonomic helper so we can do this:
901+
```python
902+
run_coroutine_in_background(func1(arg1))
903+
```
904+
Rather than having to do this:
905+
```python
906+
run_in_background(lambda: func1(arg1))
907+
```
862908
"""
909+
calling_context = current_context()
863910

864-
current = current_context()
911+
# Wrap the coroutine in a deferred, which will have the side effect of executing the
912+
# coroutine in the background.
865913
d = defer.ensureDeferred(coroutine)
866914

867-
# The function may have reset the context before returning, so
868-
# we need to restore it now.
869-
ctx = set_current_context(current)
870-
871-
# The original context will be restored when the deferred
872-
# completes, but there is nothing waiting for it, so it will
873-
# get leaked into the reactor or some other function which
874-
# wasn't expecting it. We therefore need to reset the context
875-
# here.
915+
# The function may have reset the context before returning, so we need to restore it
916+
# now.
917+
#
918+
# Our goal is to have the caller logcontext unchanged after firing off the
919+
# background task and returning.
920+
set_current_context(calling_context)
921+
922+
# The original logcontext will be restored when the deferred completes, but
923+
# there is nothing waiting for it, so it will get leaked into the reactor (which
924+
# would then get picked up by the next thing the reactor does). We therefore
925+
# need to reset the logcontext here (set the `sentinel` logcontext) before
926+
# yielding control back to the reactor.
876927
#
877928
# (If this feels asymmetric, consider it this way: we are
878929
# effectively forking a new thread of execution. We are
879930
# probably currently within a ``with LoggingContext()`` block,
880931
# which is supposed to have a single entry and exit point. But
881932
# by spawning off another deferred, we are effectively
882933
# adding a new exit point.)
883-
d.addBoth(_set_context_cb, ctx)
934+
d.addBoth(_set_context_cb, SENTINEL_CONTEXT)
884935
return d
885936

886937

887938
T = TypeVar("T")
888939

889940

890941
def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]":
891-
"""Given a deferred, make it follow the Synapse logcontext rules:
892-
893-
If the deferred has completed, essentially does nothing (just returns another
894-
completed deferred with the result/failure).
895-
896-
If the deferred has not yet completed, resets the logcontext before
897-
returning a deferred. Then, when the deferred completes, restores the
898-
current logcontext before running callbacks/errbacks.
899-
900-
(This is more-or-less the opposite operation to run_in_background.)
901942
"""
943+
Given a deferred, make it follow the Synapse logcontext rules:
944+
945+
- If the deferred has completed, essentially does nothing (just returns another
946+
completed deferred with the result/failure).
947+
- If the deferred has not yet completed, resets the logcontext before returning a
948+
incomplete deferred. Then, when the deferred completes, restores the current
949+
logcontext before running callbacks/errbacks.
950+
951+
This means the resultant deferred can be awaited without leaking the current
952+
logcontext to the reactor (which would then get erroneously picked up by the next
953+
thing the reactor does), and also means that the logcontext is preserved when the
954+
deferred completes.
955+
956+
(This is more-or-less the opposite operation to run_in_background in terms of how it
957+
handles log contexts.)
958+
959+
Pretty much equivalent to using `with PreserveLoggingContext():`, i.e. it clears the
960+
logcontext before awaiting (and so before execution passes back to the reactor) and
961+
restores the old context once the awaitable completes (execution passes from the
962+
reactor back to the code).
963+
"""
964+
# The deferred has already completed
902965
if deferred.called and not deferred.paused:
903966
# it looks like this deferred is ready to run any callbacks we give it
904967
# immediately. We may as well optimise out the logcontext faffery.
905968
return deferred
906969

907-
# ok, we can't be sure that a yield won't block, so let's reset the
908-
# logcontext, and add a callback to the deferred to restore it.
970+
# Our goal is to have the caller logcontext unchanged after they yield/await the
971+
# returned deferred.
972+
#
973+
# When the caller yield/await's the returned deferred, it may yield
974+
# control back to the reactor. To avoid leaking the current logcontext to the
975+
# reactor (which would then get erroneously picked up by the next thing the reactor
976+
# does) while the deferred runs in the reactor event loop, we reset the logcontext
977+
# and add a callback to the deferred to restore it so the caller's logcontext is
978+
# active when the deferred completes.
909979
prev_context = set_current_context(SENTINEL_CONTEXT)
910980
deferred.addBoth(_set_context_cb, prev_context)
911981
return deferred

synapse/metrics/background_process_metrics.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -228,6 +228,11 @@ def run_as_background_process(
228228
clock.looping_call and friends (or for firing-and-forgetting in the middle of a
229229
normal synapse async function).
230230
231+
Because the returned Deferred does not follow the synapse logcontext rules, awaiting
232+
the result of this function will result in the log context being cleared (bad). In
233+
order to properly await the result of this function and maintain the current log
234+
context, use `make_deferred_yieldable`.
235+
231236
Args:
232237
desc: a description for this background process type
233238
server_name: The homeserver name that this background process is being run for
@@ -280,6 +285,18 @@ async def run() -> Optional[R]:
280285
name=desc, **{SERVER_NAME_LABEL: server_name}
281286
).dec()
282287

288+
# To explain how the log contexts work here:
289+
# - When this function is called, the current context is stored (using
290+
# `PreserveLoggingContext`), we kick off the background task, and we restore the
291+
# original context before returning (also part of `PreserveLoggingContext`).
292+
# - When the background task finishes, we don't want to leak our background context
293+
# into the reactor which would erroneously get attached to the next operation
294+
# picked up by the event loop. We use `PreserveLoggingContext` to set the
295+
# `sentinel` context and means the new `BackgroundProcessLoggingContext` will
296+
# remember the `sentinel` context as its previous context to return to when it
297+
# exits and yields control back to the reactor.
298+
#
299+
# TODO: Why can't we simplify to using `return run_in_background(run)`?
283300
with PreserveLoggingContext():
284301
# Note that we return a Deferred here so that it can be used in a
285302
# looping_call and other places that expect a Deferred.

0 commit comments

Comments
 (0)