Include causal status details in higher-level statuses #12409
Conversation
When an operation fails and we want to produce a new status at a higher level, we commonly turn the first status into an exception and attach it to the new exception. We should instead prefer to keep as much information as possible in the status description itself, since the cause is less reliably logged or propagated.

I do expect that long-term we'll want to expose an API in grpc-api for this, but for the moment let's keep it internal. In particular, we'd have to figure out its name. I could also believe we might want different formatting, which becomes a clearer discussion once we can see the usages.

I'm pretty certain there are other places that could benefit from this utility, as I remember really wishing I had these functions a month or two ago. But these are the places I could immediately find.

OutlierDetectionLoadBalancerConfig had its status code changed from INTERNAL to UNAVAILABLE because the value comes from outside, so a failure there isn't a gRPC bug. I didn't change the xds policies in the same way because it's murkier there: the configuration for those is largely generated within xds itself.
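For concreteness, a minimal standalone sketch of what such a helper could look like. The PR's actual helper is internal to GrpcUtil; the signature, name, and formatting below are illustrative assumptions, not the implementation:

```java
import io.grpc.Status;

final class StatusDetails {
  private StatusDetails() {}

  // Illustrative only: fold the causal Status's code and description into the
  // new Status's description instead of attaching it via withCause(), so the
  // detail survives even when the cause is dropped during logging/propagation.
  static Status statusWithDetails(Status.Code code, String description, Status cause) {
    StringBuilder detail = new StringBuilder(description);
    detail.append(" (").append(cause.getCode());
    if (cause.getDescription() != null) {
      detail.append(": ").append(cause.getDescription());
    }
    detail.append(")");
    return Status.fromCode(code).withDescription(detail.toString());
  }
}
```

With a helper along these lines, the call sites in the diffs below return a single Status that carries the child error's details, rather than a Status with a wrapped RuntimeException as its cause.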
.withCause(childConfig.getError().asRuntimeException()));
return ConfigOrError.fromError(GrpcUtil.statusWithDetails(
    Status.Code.UNAVAILABLE,
    "Failed to parse child in outlier_detection_experimental",
I notice that we're no longer including the rawConfig in the error description. I agree that this makes the error message much cleaner, but do you think we might be losing valuable context for debugging? Perhaps we could log the rawConfig at a FINE level, or do you think the information in this error is always sufficient?
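For concreteness, a sketch of what the suggested FINE-level logging could look like (not part of this PR; the class name, logger, and message wording are placeholders):

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

final class RawConfigLogging {
  private static final Logger logger = Logger.getLogger(RawConfigLogging.class.getName());

  // Placeholder sketch: keep the returned error description short, but still
  // record the raw config at FINE so it is available when debugging.
  static void logRawConfig(Map<String, ?> rawConfig) {
    if (logger.isLoggable(Level.FINE)) {
      logger.log(Level.FINE,
          "Failed to parse outlier_detection_experimental config: {0}", rawConfig);
    }
  }
}
```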
Per go/java-logging-best-practices#logging-levels-and-avoiding-spam
.withCause(childConfig.getError().asRuntimeException()));
return ConfigOrError.fromError(GrpcUtil.statusWithDetails(
    Status.Code.INTERNAL,
    "Failed to parse child policy in wrr_locality LB policy",
Same as above comment for rawConfig.
LGTM overall, except for the concern about rawConfig.