-
Notifications
You must be signed in to change notification settings - Fork 392
Fix server_name
in logging context for multiple Synapse instances in one process
#18868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix server_name
in logging context for multiple Synapse instances in one process
#18868
Conversation
This allows us to get access to `server_name` before which we may want to use in the `with LoggingContext("main"):` call early on. This also allows us more flexibility to parse config however we want and setup a Synapse homeserver. Like what we do in Synapse Pro for Small Hosts.
There are no instances where we don't provide it either
server_name
to loggingserver_name
in logging context for multiple Synapse instances in one process
We will need this in #18868 but not for now
#18870) Remove `sentinel` logcontext where we log in `setup`, `start`, and exit. Instead of having one giant PR that removes all places we use `sentinel` logcontext, I've decided to tackle this more piece-meal. This PR covers the parts if you just startup Synapse and exit it with no requests or activity going on in between. Part of #18905 (Remove `sentinel` logcontext where we log in Synapse) Prerequisite for #18868. Logging with the `sentinel` logcontext means we won't know which server the log came from. ### Why https://github.com/element-hq/synapse/blob/9cc400177822805e2a08d4d934daad6f3bc2a4df/docs/log_contexts.md#L71-L81 (docs updated in #18900) ### Testing strategy 1. Run Synapse normally and with `daemonize: true`: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Execute some requests 1. Shutdown the server 1. Look for any bad log entries in your homeserver logs: - `Expected logging context sentinel but found main` - `Expected logging context main was lost` - `Expected previous context` - `utime went backwards!`/`stime went backwards!` - `Called stop on logcontext POST-0 without recording a start rusage` 1. Look for any logs coming from the `sentinel` context With these changes, you should only see the following logs (not from Synapse) using the `sentinel` context if you start up Synapse and exit: `homeserver.log` ``` 2025-09-10 14:45:39,924 - asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector 2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - Received SIGINT, shutting down. 2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - (TCP Port 9322 Closed) 2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 8008 Closed) 2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 9093 Closed) 2025-09-10 14:45:40,564 - twisted - 281 - INFO - sentinel - Main loop terminated. ```
This allows us to get access to `server_name` so we can use it when creating the `LoggingContext("main")` in the future (pre-requisite for #18868). This also allows us more flexibility to parse config however we want and setup a Synapse homeserver. Like what we do in [Synapse Pro for Small Hosts](https://github.com/element-hq/synapse-small-hosts). Split out from #18868
Conflicts: synapse/_scripts/synapse_port_db.py synapse/app/admin_cmd.py synapse/app/appservice.py synapse/app/client_reader.py synapse/app/event_creator.py synapse/app/federation_reader.py synapse/app/federation_sender.py synapse/app/frontend_proxy.py synapse/app/generic_worker.py synapse/app/homeserver.py synapse/app/media_repository.py synapse/app/pusher.py synapse/app/synchrotron.py synapse/app/user_dir.py synapse/logging/context.py tests/handlers/test_federation.py tests/util/test_logcontext.py
(downstream hasn't been updated yet)
Fix #12841 (originally matrix-org/synapse#12841)
def __init__(self, request: str = ""): | ||
self._default_request = request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We previously recommended adding this to your logging config and became no longer necessary after matrix-org/synapse#8051 although was just redundant and harmless if you still did.
filters:
context:
(): synapse.logging.context.LoggingContextFilter
request: ""
If we remove this __init__
constructor, we then start to see exceptions like #18868 (comment) so we should consider just reverting and putting it back in place for now.
Since it's been 5 years since we started adding LoggingContextFilter
automatically for people, we could consider making a breaking change but it would be best to do this in a separate PR to track the decision easier.
I only tried to remove it because it seemed useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored LoggingContextFilter.__init__
constructor for now ⏩
I've also added all of the history context in a comment
fabf85e
to
44fa84f
Compare
@@ -0,0 +1 @@ | |||
Fix `server_name` in logging context for multiple Synapse instances in one process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progressed outside of this PR: We have removed enough sentinel
logcontext usage in Synapse for this to be useful (tracked by #18905).
With this PR, we will be able to distinguish which server sent the logs wherever we're not using sentinel
logcontext. And if we find any more sentinel
logcontext usage, we can be fix it up piecemeal like normal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looked like a scary PR numbers-wise but was actually pretty approachable since it was almost entirely just straightforward plumbing; a real gold star for structuring your PR sequence to make this possible, so thanks!
def __init__(self, request: str = ""): | ||
def __init__( | ||
self, | ||
# `request` is here for backwards compatibility since we previously recommended |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice comment, thanks :)
clock = Clock(cast(ISynapseThreadlessReactor, reactor)) | ||
clock = Clock( | ||
cast(ISynapseThreadlessReactor, reactor), | ||
server_name="synapse_module_running_from_unknown_server", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to track which server this is running on? Such information would be useful if a particular module is misbehaving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we could do a pattern like the following today because we expose ModuleApi.server_name()
, I'd prefer not to make a further mess.
Modified DirectServeJsonResource
class DirectServeJsonResource(_AsyncResource):
"""A resource that will call `self._async_on_<METHOD>` on new requests,
formatting responses and errors as JSON.
"""
def __init__(
self,
canonical_json: bool = False,
extract_context: bool = False,
# Clock is optional as this class is exposed to the module API.
clock: Optional[Clock] = None,
# This is only necessary for Module API users who don't pass in a `clock`.
server_name: Optional[str] = None,
):
"""
Args:
canonical_json: TODO
extract_context: TODO
clock: This is expected to be passed in by any Synapse code.
Only optional for the Module API.
server_name: The homeserver name (this should be `ModuleApi.server_name()`).
Only used for the Module API if `clock` is not passed in.
"""
if clock is None:
clock = Clock(
cast(ISynapseThreadlessReactor, reactor),
server_name=server_name
if server_name is not None
else "synapse_module_running_from_unknown_server",
)
else:
assert server_name is None, (
"No need to pass in `server_name` if clock is set"
)
super().__init__(clock, extract_context)
self.canonical_json = canonical_json
Usage:
class SomeResource(DirectServeJsonResource):
def __init__(
self,
module_api: ModuleApi,
):
super().__init__(server_name=module_api.server_name())
class SomeSynapseModule:
"""
Synapse module that TODO
"""
def __init__(
self, config: MyModuleConfig, module_api: ModuleApi
):
# Keep a reference to the config and Module API
self._module_api = module_api
# "Modules **must** register their web resources in their `__init__` method."
# (https://github.com/element-hq/synapse/blob/081f6ad50fa0ea87c348778e8be40517da25c698/docs/modules/writing_a_module.md#L69)
self._module_api.register_web_resource(
SomeResource(
self._module_api
),
)
The ideal solution would be using the hs.get_clock()
and passing in the Clock
directly. See #18868 (comment) for why the homeserver clock is ideal. We currently don't expose Clock
to module API consumers and maybe we shouldn't as it's best for them to continue using the dedicated ModuleApi
equivalents.
Overall, this requires much more thought. And perhaps handing out less raw interfaces like DirectServeJsonResource
in the module API as it makes doing the right thing like this practically impossible without making breaking changes.
This hasn't been necessary for 5 years since matrix-org/synapse#8051 because we automatically configure this for you within Synapse itself. Spawning from seeing this fail after we tried to change the `LoggingContextFilter` constructor in element-hq/synapse#18868 (comment). Although this served as a decent canary of what people may have historically configured, I've now added the relevant context to that part of the code as part of element-hq/synapse#18868
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanations. Agreed that this should be a lesson for future module API expansion.
This PR LGTM!
real_clock = Clock(cast(ISynapseThreadlessReactor, reactor)) | ||
real_clock = Clock( | ||
cast(ISynapseThreadlessReactor, reactor), server_name=server_name | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the resolved conversation in the Linearizer
section, wouldn't that apply here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're talking about #18868 (comment)
Yes, ideally we'd use the homeserver Clock
👍
We already had access to server_name
here and clock
wasn't being passed in at all like the Linearizer
had.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Thanks 🙏
Conflicts: tests/util/test_async_helpers.py
Thanks for the review @reivilibre, @anoadragon453, and @jason-famedly 🐉 |
Background
As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48
Prior art
Previously, we exposed
server_name
by providing a static loggingMetadataFilter
that injected the values:synapse/synapse/config/logger.py
Line 216 in 205d9e4
While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant.
What does this PR do?
We remove the
MetadataFilter
and replace it by tracking theserver_name
in theLoggingContext
and expose it with our existingLoggingContextFilter
that we already use to expose information about therequest
.This means that the
server_name
value follows wherever we log as expected even when we have multiple Synapse instances running in the same process.A note on logcontext
Anywhere, Synapse mistakenly uses the
sentinel
logcontext to log something, we won't know which server sent the log. We've been fixing upsentinel
logcontext usage as tracked by #18905Any further
sentinel
logcontext usage we find in the future can be fixed piecemeal as normal.synapse/docs/log_contexts.md
Lines 71 to 81 in d2a966f
Testing strategy
%(server_name)s
in the formatpoetry run synapse_homeserver --config-path homeserver.yaml
curl http://localhost:8008/_matrix/client/versions
, etc)server_name
in the logs as expected.unknown_server_from_sentinel_context
is expected for thesentinel
logcontext (things outside of Synapse).Dev notes
DirectServeJsonResource
#18600Linearizer
uses a different reactor than the homeserver during tests matrix-org/synapse#12841LoggingContextFilter
) Auto set logging filter matrix-org/synapse#8051Todo
sentinel
logcontext usage (unknown_server_from_sentinel_context
), Removesentinel
logcontext where we log in Synapse #18905Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.