Bad performance when querying with trace_id only #21

@jiekun

Description

Is your feature request related to a problem? Please describe

VictoriaTraces organizes data by streams. When a query specifies only a trace_id, it has to:

  1. Search ALL streams.
  2. For each stream, scan the data from newest to oldest in fixed steps (say, step=3h).
  3. Stop as soon as it finds the first result.

In the worst case (the trace_id belongs to the "oldest" trace), VictoriaTraces has to scan all the data (full time range * all streams), hence the poor performance.
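
To make the cost concrete, here is a rough back-of-the-envelope model in Python. It is not VictoriaTraces code: the function name `windows_scanned`, the 7-day retention and the 100 streams are assumptions for illustration, and it simplifies the scan to "probe every stream for each time window, newest window first".

```python
from datetime import datetime, timedelta, timezone

STEP = timedelta(hours=3)  # step size taken from the example above


def windows_scanned(now, retention_start, span_time, num_streams, step=STEP):
    """Count (stream, window) probes made before the trace is found.

    Simplified model (assumption): windows are walked from newest to
    oldest, and every stream is probed once per window.
    """
    scanned = 0
    window_end = now
    while window_end > retention_start:
        window_start = max(window_end - step, retention_start)
        scanned += num_streams  # every stream is probed for this window
        if window_start <= span_time < window_end:
            return scanned  # first hit: stop scanning
        window_end = window_start
    return scanned  # trace not found within retention


now = datetime(2025, 1, 8, tzinfo=timezone.utc)
retention_start = now - timedelta(days=7)  # hypothetical retention

# A trace from one hour ago is found in the first window.
recent = windows_scanned(now, retention_start, now - timedelta(hours=1), 100)
# The oldest possible trace forces a scan of every window in every stream.
oldest = windows_scanned(now, retention_start, retention_start, 100)
print(recent, oldest)  # 100 5600
```

With these made-up numbers the oldest trace costs 56 times more probes than a recent one, and the gap grows linearly with both retention and stream count.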

Describe the solution you'd like

PoC of the following:

  • Create a helper stream trace_id_idx when ingesting data.
    • This stream contains 2 (or 3) fields: trace_id and time (or start_time and end_time).
    • Insert a row into this helper stream when ingesting the root span (the span whose parent_span_id is empty).
  • Query the helper stream for the time (or time range) and use it as the time filter when querying by trace_id only.
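
A minimal in-memory sketch of the idea, in Python rather than the project's own code. The names `Span`, `TraceIDIndex` and `query_by_trace_id` are hypothetical, a dict stands in for the trace_id_idx stream, and the fallback to a full scan when no index row exists is an assumption not stated in the proposal.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: str  # empty string for the root span
    start_time: int      # timestamps in arbitrary integer units (assumed)
    end_time: int


class TraceIDIndex:
    """In-memory stand-in for the proposed trace_id_idx helper stream."""

    def __init__(self):
        self._rows = {}  # trace_id -> (start_time, end_time)

    def on_ingest(self, span):
        # Only the root span writes a row, as described in the proposal.
        if span.parent_span_id == "":
            self._rows[span.trace_id] = (span.start_time, span.end_time)

    def time_range(self, trace_id):
        return self._rows.get(trace_id)


def query_by_trace_id(index, spans, trace_id):
    """Look up the time range first, then filter spans within it."""
    rng = index.time_range(trace_id)
    if rng is None:
        candidates = spans  # assumed fallback: no index row, scan everything
    else:
        start, end = rng
        candidates = [s for s in spans if start <= s.start_time <= end]
    return [s for s in candidates if s.trace_id == trace_id]


spans = [
    Span("t1", "a", "", 100, 200),    # root span of t1
    Span("t1", "b", "a", 120, 180),   # child span of t1
    Span("t2", "c", "", 5000, 5100),  # root span of t2
]
index = TraceIDIndex()
for s in spans:
    index.on_ingest(s)

result = query_by_trace_id(index, spans, "t1")
print([s.span_id for s in result])  # ['a', 'b']
```

One caveat of this sketch: a child span that starts outside the root span's time range would be missed by the narrowed filter, so a real implementation would probably need some padding around the range or a different fallback.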

Describe alternatives you've considered

No response

Additional information

No response

Metadata

Labels

enhancement: New feature or request
