Summary
AgentWriter._write_with_client logs failed to start writer service at WARNING (and is exposed at ERROR by some agent log shippers) whenever two threads/async tasks race to perform the first write(). The warning is misleading: the writer is running by the time the message is emitted, so no traces are dropped. The noise is significant in any process that fans out concurrent instrumented I/O on a cold tracer.
Version
ddtrace==4.7.1 (warning is also present on main today, ddtrace/internal/writer/writer.py:914-916)
- Python 3.13
- Linux, EC2
Code path
ddtrace/internal/writer/writer.py (4.7.1, line 1119–1126):
if self._sync_mode is False:
# Start the Writer on first write.
try:
if self.status != service.ServiceStatus.RUNNING:
self.start()
except service.ServiceStatusError:
_safelog(log.warning, "failed to start writer service")
Service.start() (ddtrace/internal/service.py:48-60) acquires a lock and raises ServiceStatusError only when status is already RUNNING. The check at line 1122 is outside the lock, so this is a TOCTOU race: thread A sees STOPPED and calls start(), thread B sees STOPPED, calls start() after thread A has flipped status, gets ServiceStatusError, logs the warning. The writer is healthy.
Reproduction
Any process that issues many concurrent instrumented HTTP calls on first write reproduces this. Minimal:
import asyncio, httpx, ddtrace
ddtrace.patch_all()
async def main():
async with httpx.AsyncClient() as c:
await asyncio.gather(*(c.get("https://example.com") for _ in range(50)))
asyncio.run(main())
Observed impact
In our environment over a 7-day window, failed to start writer service was logged 671 times across five services. Each occurrence has a paired ERROR-status sibling from our log shipper. The pattern concentrates on services with concurrent fan-out.
The warning appears to be a known long-standing artifact:
What we'd like
A way to either (a) suppress this specific log without filtering all of ddtrace.internal.writer.writer, or (b) eliminate the race so the warning doesn't fire when traces are not actually lost. Happy to leave the design to maintainers — opening this primarily so other affected users have something to link.
Workaround
Setting _DD_INTERNAL_WRITER_WRITER_LOG_LEVEL=ERROR (the configurable logger levels added in #13562) suppresses the warning at source.
Summary
AgentWriter._write_with_clientlogsfailed to start writer serviceat WARNING (and is exposed at ERROR by some agent log shippers) whenever two threads/async tasks race to perform the firstwrite(). The warning is misleading: the writer is running by the time the message is emitted, so no traces are dropped. The noise is significant in any process that fans out concurrent instrumented I/O on a cold tracer.Version
ddtrace==4.7.1(warning is also present onmaintoday,ddtrace/internal/writer/writer.py:914-916)Code path
ddtrace/internal/writer/writer.py(4.7.1, line 1119–1126):Service.start()(ddtrace/internal/service.py:48-60) acquires a lock and raisesServiceStatusErroronly when status is alreadyRUNNING. The check at line 1122 is outside the lock, so this is a TOCTOU race: thread A seesSTOPPEDand callsstart(), thread B seesSTOPPED, callsstart()after thread A has flipped status, getsServiceStatusError, logs the warning. The writer is healthy.Reproduction
Any process that issues many concurrent instrumented HTTP calls on first write reproduces this. Minimal:
Observed impact
In our environment over a 7-day window,
failed to start writer servicewas logged 671 times across five services. Each occurrence has a paired ERROR-status sibling from our log shipper. The pattern concentrates on services with concurrent fan-out.The warning appears to be a known long-standing artifact:
ServiceStatusErrorinstances showing up in CPU profiles — performance, not correctness.What we'd like
A way to either (a) suppress this specific log without filtering all of
ddtrace.internal.writer.writer, or (b) eliminate the race so the warning doesn't fire when traces are not actually lost. Happy to leave the design to maintainers — opening this primarily so other affected users have something to link.Workaround
Setting
_DD_INTERNAL_WRITER_WRITER_LOG_LEVEL=ERROR(the configurable logger levels added in #13562) suppresses the warning at source.