Skip to content

Spurious "failed to start writer service" warnings from ddtrace.internal.writer.writer under concurrent first-write #17871

@dgolombek

Description

@dgolombek

Summary

AgentWriter._write_with_client logs failed to start writer service at WARNING (and is exposed at ERROR by some agent log shippers) whenever two threads/async tasks race to perform the first write(). The warning is misleading: the writer is running by the time the message is emitted, so no traces are dropped. The noise is significant in any process that fans out concurrent instrumented I/O on a cold tracer.

Version

  • ddtrace==4.7.1 (warning is also present on main today, ddtrace/internal/writer/writer.py:914-916)
  • Python 3.13
  • Linux, EC2

Code path

ddtrace/internal/writer/writer.py (4.7.1, line 1119–1126):

if self._sync_mode is False:
    # Start the Writer on first write.
    try:
        if self.status != service.ServiceStatus.RUNNING:
            self.start()
    except service.ServiceStatusError:
        _safelog(log.warning, "failed to start writer service")

Service.start() (ddtrace/internal/service.py:48-60) acquires a lock and raises ServiceStatusError only when status is already RUNNING. The check at line 1122 is outside the lock, so this is a TOCTOU race: thread A sees STOPPED and calls start(), thread B sees STOPPED, calls start() after thread A has flipped status, gets ServiceStatusError, logs the warning. The writer is healthy.

Reproduction

Any process that issues many concurrent instrumented HTTP calls on first write reproduces this. Minimal:

import asyncio, httpx, ddtrace
ddtrace.patch_all()

async def main():
    async with httpx.AsyncClient() as c:
        await asyncio.gather(*(c.get("https://example.com") for _ in range(50)))

asyncio.run(main())

Observed impact

In our environment over a 7-day window, failed to start writer service was logged 671 times across five services. Each occurrence has a paired ERROR-status sibling from our log shipper. The pattern concentrates on services with concurrent fan-out.

The warning appears to be a known long-standing artifact:

What we'd like

A way to either (a) suppress this specific log without filtering all of ddtrace.internal.writer.writer, or (b) eliminate the race so the warning doesn't fire when traces are not actually lost. Happy to leave the design to maintainers — opening this primarily so other affected users have something to link.

Workaround

Setting _DD_INTERNAL_WRITER_WRITER_LOG_LEVEL=ERROR (the configurable logger levels added in #13562) suppresses the warning at source.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions