@ejona86 ejona86 commented Oct 7, 2025

When an operation fails and we want to produce a new status at a higher level, we commonly turn the first status into an exception and attach it as the cause of the new status. We should instead prefer to keep as much information as possible in the status description itself, because the cause is not as reliably logged or propagated.

I do expect long-term we'll want to expose an API in grpc-api for this, but for the moment let's keep it internal. In particular, we'd have to figure out its name. I could also believe we might want different formatting, which becomes a clearer discussion when we can see the usages.

I'm pretty certain there are some other places that could benefit from this utility, as I remember really wishing I had these functions a month or two ago. But these are the places I could immediately find.

OutlierDetectionLoadBalancerConfig had its status code changed from INTERNAL to UNAVAILABLE because the value comes externally, and so isn't a gRPC bug or such. I didn't change the xds policies in the same way because it's murkier as the configuration for those is largely generated within xds itself.
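To make the idea in the description concrete, here is a minimal, self-contained sketch of folding a lower-level status into a higher-level description. It is not the implementation of `GrpcUtil.statusWithDetails`: `SimpleStatus` is a hypothetical stand-in for `io.grpc.Status` so the example runs without gRPC on the classpath, and the `message: CODE: description` format is an assumption, since the description itself notes the formatting is still open for discussion.

```java
import java.util.Objects;

/** Illustrative sketch only; none of these names are part of gRPC. */
public final class StatusDetailsSketch {

  /** Hypothetical stand-in for a status: a code name plus an optional description. */
  public static final class SimpleStatus {
    private final String code;
    private final String description;

    public SimpleStatus(String code, String description) {
      this.code = Objects.requireNonNull(code, "code");
      this.description = description; // may be null, like a status without details
    }

    public String getCode() {
      return code;
    }

    public String getDescription() {
      return description;
    }
  }

  /**
   * Builds a higher-level status whose description carries the lower-level
   * status's code and description, rather than attaching it as a cause.
   * The output format here is an assumption made for illustration.
   */
  public static SimpleStatus withDetails(String code, String message, SimpleStatus lower) {
    Objects.requireNonNull(lower, "lower");
    StringBuilder sb = new StringBuilder(message).append(": ").append(lower.getCode());
    if (lower.getDescription() != null) {
      sb.append(": ").append(lower.getDescription());
    }
    return new SimpleStatus(code, sb.toString());
  }

  public static void main(String[] args) {
    SimpleStatus child = new SimpleStatus("INVALID_ARGUMENT", "unknown field in child policy");
    SimpleStatus parent = withDetails(
        "UNAVAILABLE", "Failed to parse child in outlier_detection_experimental", child);
    System.out.println(parent.getCode() + ": " + parent.getDescription());
  }
}
```

The point of the sketch is that the child's details survive anywhere the description is logged or sent over the wire, which would not necessarily be true of a cause.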

.withCause(childConfig.getError().asRuntimeException()));
return ConfigOrError.fromError(GrpcUtil.statusWithDetails(
Status.Code.UNAVAILABLE,
"Failed to parse child in outlier_detection_experimental",

I notice that we're no longer including the rawConfig in the error description. I agree that this makes the error message much cleaner, but do you think we might be losing valuable context for debugging? Perhaps we could log the rawConfig at a FINE level, or do you think the information in this error is always sufficient?
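For reference, the FINE-level suggestion could look roughly like the sketch below, using only `java.util.logging`. The class name, method name, and log message are hypothetical and are not taken from the PR; the returned string just reuses the description from the diff.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative sketch only; not code from the PR. */
public final class RawConfigLoggingSketch {
  private static final Logger logger =
      Logger.getLogger(RawConfigLoggingSketch.class.getName());

  /**
   * Logs the raw config at FINE for debugging and returns the short message
   * that would go into the status description.
   */
  public static String describeParseFailure(Map<String, ?> rawConfig) {
    // Guard so the log record is only built when FINE is actually enabled.
    if (logger.isLoggable(Level.FINE)) {
      logger.log(Level.FINE, "Failed to parse child policy; rawConfig={0}", rawConfig);
    }
    return "Failed to parse child in outlier_detection_experimental";
  }

  public static void main(String[] args) {
    System.out.println(describeParseFailure(Map.of("childPolicy", "round_robin")));
  }
}
```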


Per go/java-logging-best-practices#logging-levels-and-avoiding-spam

.withCause(childConfig.getError().asRuntimeException()));
return ConfigOrError.fromError(GrpcUtil.statusWithDetails(
Status.Code.INTERNAL,
"Failed to parse child policy in wrr_locality LB policy",

Same as above comment for rawConfig.


@shivaspeaks shivaspeaks left a comment


LGTM overall except the concern on rawConfig.
