Logs *can be* tech debt - traces *should* replace? #2
Replies: 1 comment
-
It's not a topic I've thought that deeply about, but here goes anyway...
Logs serve various purposes, and when it comes to diagnostics there's overlap with tracing. But like everything in IT, there are trade-offs here. One of the main causes of tickets in most estates is disks filling up with logs because the logs are badly managed. The people responsible are terrified that tracing will just fill those disks up quicker. Of course, these days we shouldn't be worrying about individual disks on individual servers when we have infinite object storage in the sky, but infinite storage can cost $∞.
Agreed that log formats have been a disaster since forever. Thoughtful people have invested countless hours coming up with sophisticated standard schemas for stuff, and idiots in a hurry have no comprehension of that work. That's almost the definition of a whole chunk of the tech debt that's out there, with more being thrown down as I type.
Having lived through the early days of Security Information and Event Management (SIEM), I've watched how logs can be abused, and how there's an entire cottage industry out there stitching a picture of what's happening in the world back together out of disparate logs from all over the place. It's a mess. Since we've had open standards for messaging for some substantial time now (MQTT, AMQP, NATS etc.), there are clearly better ways, and yet... we still see logs being used time and again as a poor person's messaging system.
As we watch Colonial Pipeline finally focus some minds in Washington on issues of resilience (almost 4 years after NotPetya nearly wiped out four $MultiBn companies in an afternoon), I'm once again hopeful that there's going to be a push for improvement; but if I learned anything from last time, it's 'don't hold your breath'.
-
Curious what you think about logs. I think they have tech debt trapped in them and I don't see a need for them in the world of distributed tracing -- and we should move to that world.
I've always thought CISOs should require their developers and their vendors to emit traces from the code they run. Example: when you see a log entry of "david logged in at 3pm successfully", you don't actually know all the function calls your SSO provider is making, or whether the app is using a third-party authentication mechanism, etc.
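To make the login example concrete, here is a minimal sketch of the difference between the single log line and a trace of the same request. Everything in it is an assumption for illustration: the in-memory `SPANS` list, the `span` helper, and the span names (`app.login`, `sso.redirect`, `sso.verify_assertion`, `idp.check_mfa`) are hypothetical and do not describe any real SSO provider or tracing library.

```python
import contextlib
import time
import uuid

# Hypothetical in-memory tracer, for illustration only. A real deployment
# would use a tracing library and export spans to a collector.
SPANS = []    # finished spans, appended in completion order
_stack = []   # spans currently open, innermost last


@contextlib.contextmanager
def span(name, trace_id):
    """Record one unit of work and link it to the enclosing span."""
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": _stack[-1]["span_id"] if _stack else None,
        "name": name,
        "start": time.monotonic(),
    }
    _stack.append(record)
    try:
        yield record
    finally:
        _stack.pop()
        record["duration_s"] = time.monotonic() - record["start"]
        SPANS.append(record)


def login(user):
    """Simulate a login; the step names are invented for this sketch."""
    trace_id = uuid.uuid4().hex
    with span("app.login", trace_id):
        with span("sso.redirect", trace_id):
            pass  # placeholder for the redirect to the SSO provider
        with span("sso.verify_assertion", trace_id):
            with span("idp.check_mfa", trace_id):
                pass  # placeholder for a second authentication factor
    # This is all that a traditional log line would retain.
    return f"{user} logged in successfully"


if __name__ == "__main__":
    print(login("david"))
    by_id = {s["span_id"]: s for s in SPANS}
    for s in SPANS:
        parent = by_id[s["parent_id"]]["name"] if s["parent_id"] else "(root)"
        print(f'{s["name"]} <- {parent}')
```

The log line records only the outcome, while the spans keep the call structure: which step ran inside which, and how long each took, all tied together by one trace ID.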
Other than failing services or reputational impact, where's the motivation to go back and improve the logs, or to start getting more detail on how software is behaving so that use cases beyond debugging are better supported?