-
-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(eap): Add a GetTraces endpoint #6671
base: master
Are you sure you want to change the base?
Conversation
def _validate_order_by(in_msg: GetTracesRequest) -> None: | ||
order_by_cols = set([ob.column.name for ob in in_msg.order_by]) | ||
selected_columns = set([c.name for c in in_msg.columns]) | ||
if not order_by_cols.issubset(selected_columns): | ||
raise BadSnubaRPCRequestException( | ||
f"Ordered by columns {order_by_cols} not selected: {selected_columns}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention here to allow ordering by any of the selected columns? The 2 main ones we'd like to have will be by timestamp (newest/oldest trace) and by duration (shortest/longest trace) though this second one is harder to define clearly as the trace duration is as it's becoming closer to a session depending on the platform.
I'm not opposed to just supporting ordering by timestamp first and we can add more later once we more clearly define them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the intention. We can have a default sort to timestamp and you'll see later if you use custom sorts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
return GetTracesResponse(meta=response_meta) | ||
|
||
# Get metadata for those traces. | ||
traces = self._get_metadata_for_traces(request=in_msg, trace_ids=trace_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you first query for trace_ids, and then do another query for the attributes, rather than doing them together in 1 query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance and because we can't compute all attributes in one query. It's much faster to select a list of trace IDS and timestamps matching the conditions and then to compute the appropriate attributes on the list.
Right now, the UI will query all the attributes at once, and I can't compute FILTERED_ITEM_COUNT
and TOTAL_ITEM_COUNT
in one query since, in order to count the appropriate items matching some conditions, I can't pass them in the WHERE
clause, I need to use a countIf
without a WHERE
.
We could optimize later on by not doing 2 queries depending on the attributes that are requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nitpicks but looks good to me overall.
|
||
_ATTRIBUTES: dict[ | ||
TraceAttribute.Key.ValueType, | ||
tuple[str, AttributeKey.Type.ValueType], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: a named tuple/data class/typed dict would be a little cleaner here so you don't have to access it by index
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
This is a possible implementation of the new FindTraces endpoint.
getsentry/sentry-protos#71
Closes https://github.com/getsentry/eap-planning/issues/122.