Skip to content

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Aug 25, 2025

Background

As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process.

"Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48

Prior art

Previously, we exposed server_name by providing a static logging MetadataFilter that injected the values:

log_metadata_filter = MetadataFilter({"server_name": config.server.server_name})

While this can work fine for the normal case of one Synapse instance per Python process, this configures things globally and isn't compatible when we try to start multiple Synapse instances because each subsequent tenant will overwrite the previous tenant.

What does this PR do?

We remove the MetadataFilter and replace it by tracking the server_name in the LoggingContext and expose it with our existing LoggingContextFilter that we already use to expose information about the request.

This means that the server_name value follows wherever we log as expected even when we have multiple Synapse instances running in the same process.

A note on logcontext

Anywhere, Synapse mistakenly uses the sentinel logcontext to log something, we won't know which server sent the log. We've been fixing up sentinel logcontext usage as tracked by #18905

Any further sentinel logcontext usage we find in the future can be fixed piecemeal as normal.

Ideally, nothing from the Synapse homeserver would be logged against the `sentinel`
logcontext as we want to know which server the logs came from. In practice, this is not
always the case yet especially outside of request handling.
Global things outside of Synapse (e.g. Twisted reactor code) should run in the
`sentinel` logcontext. It's only when it calls into application code that a logcontext
gets activated. This means the reactor should be started in the `sentinel` logcontext,
and any time an awaitable yields control back to the reactor, it should reset the
logcontext to be the `sentinel` logcontext. This is important to avoid leaking the
current logcontext to the reactor (which would then get picked up and associated with
the next thing the reactor does).

Testing strategy

  1. Adjust your logging config to include %(server_name)s in the format
    formatters:
        precise:
            format: '%(asctime)s - %(server_name)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'
  2. Start Synapse: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Make some requests (curl http://localhost:8008/_matrix/client/versions, etc)
  4. Open the homeserver logs and notice the server_name in the logs as expected. unknown_server_from_sentinel_context is expected for the sentinel logcontext (things outside of Synapse).

Dev notes



Todo

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

This allows us to get access to `server_name` before
which we may want to use in the `with LoggingContext("main"):`
call early on.

This also allows us more flexibility to parse config however
we want and setup a Synapse homeserver. Like what we do
in Synapse Pro for Small Hosts.
There are no instances where we don't provide it either
@github-actions github-actions bot deployed to PR Documentation Preview August 25, 2025 22:47 Active
@MadLittleMods MadLittleMods changed the title Better expose server_name to logging Fix server_name in logging context for multiple Synapse instances in one process Aug 25, 2025
MadLittleMods added a commit that referenced this pull request Sep 10, 2025
We will need this in #18868
but not for now
MadLittleMods added a commit that referenced this pull request Sep 16, 2025
#18870)

Remove `sentinel` logcontext where we log in `setup`, `start`, and exit.

Instead of having one giant PR that removes all places we use `sentinel`
logcontext, I've decided to tackle this more piece-meal. This PR covers
the parts if you just startup Synapse and exit it with no requests or
activity going on in between.

Part of #18905 (Remove
`sentinel` logcontext where we log in Synapse)

Prerequisite for #18868.
Logging with the `sentinel` logcontext means we won't know which server
the log came from.



### Why


https://github.com/element-hq/synapse/blob/9cc400177822805e2a08d4d934daad6f3bc2a4df/docs/log_contexts.md#L71-L81

(docs updated in #18900)


### Testing strategy

1. Run Synapse normally and with `daemonize: true`: `poetry run
synapse_homeserver --config-path homeserver.yaml`
 1. Execute some requests
 1. Shutdown the server
 1. Look for any bad log entries in your homeserver logs:
    - `Expected logging context sentinel but found main`
    - `Expected logging context main was lost`
    - `Expected previous context`
    - `utime went backwards!`/`stime went backwards!`
- `Called stop on logcontext POST-0 without recording a start rusage`
 1. Look for any logs coming from the `sentinel` context


With these changes, you should only see the following logs (not from
Synapse) using the `sentinel` context if you start up Synapse and exit:

`homeserver.log`
```
2025-09-10 14:45:39,924 - asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector

2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - Received SIGINT, shutting down.

2025-09-10 14:45:40,562 - twisted - 281 - INFO - sentinel - (TCP Port 9322 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 8008 Closed)
2025-09-10 14:45:40,563 - twisted - 281 - INFO - sentinel - (TCP Port 9093 Closed)
2025-09-10 14:45:40,564 - twisted - 281 - INFO - sentinel - Main loop terminated.
```
MadLittleMods added a commit that referenced this pull request Sep 17, 2025
MadLittleMods added a commit that referenced this pull request Sep 22, 2025
This allows us to get access to `server_name` so we can use it when
creating the `LoggingContext("main")` in the future (pre-requisite for
#18868).

This also allows us more flexibility to parse config however we want and
setup a Synapse homeserver. Like what we do in [Synapse Pro for Small
Hosts](https://github.com/element-hq/synapse-small-hosts).

Split out from #18868
Conflicts:
	synapse/_scripts/synapse_port_db.py
	synapse/app/admin_cmd.py
	synapse/app/appservice.py
	synapse/app/client_reader.py
	synapse/app/event_creator.py
	synapse/app/federation_reader.py
	synapse/app/federation_sender.py
	synapse/app/frontend_proxy.py
	synapse/app/generic_worker.py
	synapse/app/homeserver.py
	synapse/app/media_repository.py
	synapse/app/pusher.py
	synapse/app/synchrotron.py
	synapse/app/user_dir.py
	synapse/logging/context.py
	tests/handlers/test_federation.py
	tests/util/test_logcontext.py
(downstream hasn't been updated yet)
@github-actions github-actions bot deployed to PR Documentation Preview September 23, 2025 03:40 Active
@github-actions github-actions bot deployed to PR Documentation Preview September 24, 2025 21:57 Active
Comment on lines 591 to 592
def __init__(self, request: str = ""):
self._default_request = request
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We previously recommended adding this to your logging config and became no longer necessary after matrix-org/synapse#8051 although was just redundant and harmless if you still did.

filters:
  context:
    (): synapse.logging.context.LoggingContextFilter
    request: ""

If we remove this __init__ constructor, we then start to see exceptions like #18868 (comment) so we should consider just reverting and putting it back in place for now.

Since it's been 5 years since we started adding LoggingContextFilter automatically for people, we could consider making a breaking change but it would be best to do this in a separate PR to track the decision easier.

I only tried to remove it because it seemed useless.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored LoggingContextFilter.__init__ constructor for now ⏩

I've also added all of the history context in a comment

@MadLittleMods MadLittleMods force-pushed the madlittlemods/per-tenant-logging-server-name branch from fabf85e to 44fa84f Compare September 25, 2025 22:23
@@ -0,0 +1 @@
Fix `server_name` in logging context for multiple Synapse instances in one process.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progressed outside of this PR: We have removed enough sentinel logcontext usage in Synapse for this to be useful (tracked by #18905).

With this PR, we will be able to distinguish which server sent the logs wherever we're not using sentinel logcontext. And if we find any more sentinel logcontext usage, we can be fix it up piecemeal like normal.

@MadLittleMods MadLittleMods marked this pull request as ready for review September 25, 2025 23:41
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 25, 2025 23:41
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looked like a scary PR numbers-wise but was actually pretty approachable since it was almost entirely just straightforward plumbing; a real gold star for structuring your PR sequence to make this possible, so thanks!

def __init__(self, request: str = ""):
def __init__(
self,
# `request` is here for backwards compatibility since we previously recommended
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice comment, thanks :)

clock = Clock(cast(ISynapseThreadlessReactor, reactor))
clock = Clock(
cast(ISynapseThreadlessReactor, reactor),
server_name="synapse_module_running_from_unknown_server",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to track which server this is running on? Such information would be useful if a particular module is misbehaving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we could do a pattern like the following today because we expose ModuleApi.server_name(), I'd prefer not to make a further mess.

Modified DirectServeJsonResource
class DirectServeJsonResource(_AsyncResource):
    """A resource that will call `self._async_on_<METHOD>` on new requests,
    formatting responses and errors as JSON.
    """

    def __init__(
        self,
        canonical_json: bool = False,
        extract_context: bool = False,
        # Clock is optional as this class is exposed to the module API.
        clock: Optional[Clock] = None,
        # This is only necessary for Module API users who don't pass in a `clock`.
        server_name: Optional[str] = None,
    ):
        """
        Args:
            canonical_json: TODO
            extract_context: TODO
            clock: This is expected to be passed in by any Synapse code.
                Only optional for the Module API.
            server_name: The homeserver name (this should be `ModuleApi.server_name()`).
                Only used for the Module API if `clock` is not passed in.
        """

        if clock is None:
            clock = Clock(
                cast(ISynapseThreadlessReactor, reactor),
                server_name=server_name
                if server_name is not None
                else "synapse_module_running_from_unknown_server",
            )
        else:
            assert server_name is None, (
                "No need to pass in `server_name` if clock is set"
            )

        super().__init__(clock, extract_context)
        self.canonical_json = canonical_json

Usage:

class SomeResource(DirectServeJsonResource):
    def __init__(
        self,
        module_api: ModuleApi,
    ):
        super().__init__(server_name=module_api.server_name())


class SomeSynapseModule:
    """
    Synapse module that TODO
    """

    def __init__(
        self, config: MyModuleConfig, module_api: ModuleApi
    ):
        # Keep a reference to the config and Module API
        self._module_api = module_api

        # "Modules **must** register their web resources in their `__init__` method."
        # (https://github.com/element-hq/synapse/blob/081f6ad50fa0ea87c348778e8be40517da25c698/docs/modules/writing_a_module.md#L69)
        self._module_api.register_web_resource(
            SomeResource(
                self._module_api
            ),
        )

The ideal solution would be using the hs.get_clock() and passing in the Clock directly. See #18868 (comment) for why the homeserver clock is ideal. We currently don't expose Clock to module API consumers and maybe we shouldn't as it's best for them to continue using the dedicated ModuleApi equivalents.

Overall, this requires much more thought. And perhaps handing out less raw interfaces like DirectServeJsonResource in the module API as it makes doing the right thing like this practically impossible without making breaking changes.

MadLittleMods added a commit to matrix-org/sytest that referenced this pull request Sep 26, 2025
This hasn't been necessary for 5 years since
matrix-org/synapse#8051
because we automatically configure this for you within Synapse itself.

Spawning from seeing this fail after we tried to change the `LoggingContextFilter` constructor in element-hq/synapse#18868 (comment). Although this served as a decent canary of what people may have historically configured, I've now added the relevant context to that part of the code as part of element-hq/synapse#18868
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanations. Agreed that this should be a lesson for future module API expansion.

This PR LGTM!

real_clock = Clock(cast(ISynapseThreadlessReactor, reactor))
real_clock = Clock(
cast(ISynapseThreadlessReactor, reactor), server_name=server_name
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the resolved conversation in the Linearizer section, wouldn't that apply here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're talking about #18868 (comment)

Yes, ideally we'd use the homeserver Clock 👍

We already had access to server_name here and clock wasn't being passed in at all like the Linearizer had.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks 🙏

@MadLittleMods MadLittleMods merged commit 5143f93 into develop Sep 26, 2025
75 of 78 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/per-tenant-logging-server-name branch September 26, 2025 22:10
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @reivilibre, @anoadragon453, and @jason-famedly 🐉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants