-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to specify custom rules to set event.outcome
#482
Comments
Our tracing specs, both for spans and transactions, already encourage (but not mandate) agents to expose such an API. The Java agent, for example, offers the
This is right, but since you can do that wherever you want in your code, most likely you have the relevant info (e.g. status code), right? |
Thanks @eyalkoren This and other use cases can be solved by allowing callback functions to be executed at different stages of a |
OK, I read it as if a public API for setting the outcome is a valid alternative that you may not be aware of. |
I just created #489 and then stumbled upon this which sounds like exactly the same. I was recently talking to @felixbarny, @axw and @Mpdreamz about the ability to customize the logic for setting My proposal is to have a config option for configuring this. This could be in the form of a regex that matches against |
@michaelhyatt several agents have the concept of filters or processors which sounds like that may solve the use case. Could you describe some of the use cases for that? Why is the built-in logic not good enough in some cases? What are some concrete custom rules you'd want to add? @sqren while closely related, the use case you're mentioning is a little different in that it's not about adding arbitrarily complex rules. Instead, it's about whether or not client-side errors should be added to the error rate calculation. A config for HTTP status codes is a bit too limiting, I think because it wouldn't work for protocols like gRPC, for example. But we could have a boolean flag that determines whether client-side errors should be considered a success or failure. |
I think this would be a great start. |
I guess my PoV is really rooted in seeing a lot of the implementations that show error rates as blanks despite having errors. So, I am thinking about fixing the way we capture |
The empty error rate chart could mean that you're using an old version of the agent that doesn't set the outcome yet. I assume these services are talking over gRPC? More generally, in some cases, it can be a valid situation that the agent captures errors but that the transaction error rate is 0. In other words, even if an error is captured within a transaction, the whole transaction may still be successful. If the server just logs an error and takes compensating actions, the transaction is not necessarily a failure. For example, if you're requesting a product page in an online shop and the call to the recommendation service fails, the page may still render fine with a 200. |
@felixbarny yes, some of them are gRPC and the agents can be a bit old. |
Assuming that updating the agents fix the issue, would you still need custom rules to set the outcome? Do you have specific use cases in mind? Do the processor/filter APIs and the set outcome APIs fulfill your needs? If not, why and what would we need to change in your view? |
Hi folks! @sqren dropped me a link to this because Dream Machine is also interested in having additional response codes be marked as failures, and he and Casper were so kind as to hop on a Zoom with me recently to watch me stumble around APM, teach me a few things, and collect feedback/observations, and this was something that came up. I think if I wanted to do this on my own, I'd be looking at writing up some new Express middleware function that I could find a way get inserted near the end of the request chain (after the work is done, but before APM sends the transaction data) so that I could examine the response and decide to manually override the event outcome. I could see Soren's regex concept being a great bandage to get some form of this capability out there, but I think if you can let the user define their own evaluation function that APM coordinates running at an appropriate timing during the request chain, then it gets really nice. I could see it working by optionally defining a The power of letting the user have a function instead of just defining it by codes, is that the developer can now consider the status code AND something else about the state of the response in deciding success or failure. That said, for now my use case feels strictly based on the code value, but I could imagine having a more complex decision tree at some point. |
Would you need the web framework's response object for that or are the properties that APM collects about the response enough?
For you current use case, how would you like the outcome to be mapped? Is it just that 4xx should count as an error for the server-side transaction (not just 5xx)? One of the reasons why we didn't map 4xx to I'm not trying to dismiss your use case, just trying to understand it better, explain the reasoning of the current state, and see if the existing APIs are already covering your use case 🙂 |
Is your feature request related to a problem? Please describe.
A transaction representing a service invocation success or failure can't always be defined by a return status. Sometimes, HTTP response code 2xx is not a sufficient indication of success and there is a need to explore additional attributes to be able to confidently set
event.outcome
correctly. The same applies for designated a transaction as a failure.Describe the solution you'd like
As a service developer, I want to be able to define more complex rules to set
event.outcome
to success or failure based on the additional information either available as transaction attributes, or through an API.Describe alternatives you've considered
Using public APIs is possible, but negates the value of autodetection completely.
Additional context
Without this ability, the error rate indication in the APM UI is not showing the correct picture of failures.
The text was updated successfully, but these errors were encountered: