Performance comparison between VictoriaTraces and VictoriaLogs #46

@bailegebai

Description

Is your feature request related to a problem? Please describe

Our application monitoring scenario uses VictoriaLogs (hereinafter referred to as vlog) to store trace data for the past two weeks. However, we found that querying by trace_id was slow. After learning that VictoriaTraces (hereinafter referred to as vtrace) had made improvements in this area (#594), we started using vtrace and compared the two.

  1. Data format (sensitive values such as IP addresses and app_name have been masked).
    The vlog record format is:
{
    "_time": "2025-08-25T02:58:19.042Z",
    "_stream_id": "0000000a00000000050b35a441085b72c690cf4305694343",
    "_stream": "{app_name=\"demo\",method_name=\"getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind)\",service_name=\"com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl\",tenant=\"jdjr\"}",
    "_msg": "v6;demo;com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl;getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind);127.0.0.1;consumer_gold_remind_partition_schedule_0_0_2_1756090670731;0;;;1756090699042;64;1857972;4490954132420954139;4490954132420954140;4490962172599732232;;;;;;injvm;gson:1:22;;1089;m6;;group=group-product-m6;jdjr;;;;",
    "app_name": "demo",
    "ip": "127.0.0.1",
    "line_tracing": "0",
    "method_name": "getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind)",
    "protocol": "injvm",
    "sampling": "0",
    "service_name": "com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl",
    "success": "1",
    "tenant": "jdjr",
    "testing": "0",
    "duration": "1",
    "trace_id": "4490954132420954139"
}

The data format of vtrace is:

{
    "_time": "2025-08-24T20:07:10.009444528Z",
    "_stream_id": "0000000a000000001c3a3604e527859d9dccb13f6e64d312",
    "_stream": "{name=\"-\",resource_attr:service.name=\"demo\",resource_attr:service.namespace=\"jdjr\",span_attr:method=\"config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)\",span_attr:service=\"com.demo.jr.sgm.server.controller.AgentController\"}",
    "_msg": "-",
    "dropped_attributes_count": "0",
    "dropped_events_count": "0",
    "dropped_links_count": "0",
    "flags": "0",
    "kind": "0",
    "name": "-",
    "resource_attr:service.name": "demo",
    "resource_attr:service.namespace": "jdjr",
    "scope_name": "sgm-trace",
    "scope_version": "1.0.0",
    "span_attr:line_tracing": "0",
    "span_attr:local_port": "0",
    "span_attr:method": "config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)",
    "span_attr:protocol": "injvm",
    "span_attr:remote_port": "0",
    "span_attr:service": "com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl",
    "span_attr:testing": "0",
    "span_attr:timeout": "0",
    "span_attr:zone": "hc",
    "status_code": "1",
    "duration": "7444528",
    "end_time_unix_nano": "1756066030009444528",
    "parent_span_id": "3b62436021ea1801",
    "span_attr:child": "hikari:1:2",
    "span_attr:group": "server-hc",
    "span_attr:ip": "127.0.0.1",
    "span_attr:process": "16707",
    "span_attr:sampling": "0",
    "span_attr:thread": "http-nio-8080-exec-160",
    "span_id": "3b62436021ea1802",
    "start_time_unix_nano": "1756066030002000000",
    "trace_id": "3b62436021ea1801"
}
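For reference, our trace_id lookups go through the LogsQL query endpoint on both systems. A minimal sketch of building such a request (the base URL and port are assumptions for illustration; the endpoint path and the `trace_id:=...` exact-match filter follow the VictoriaLogs querying documentation):

```python
from urllib.parse import urlencode

def trace_query_url(base_url: str, trace_id: str, limit: int = 1000) -> str:
    """Build a VictoriaLogs /select/logsql/query URL that filters by an
    exact trace_id match (`field:=value` is LogsQL's exact-match filter)."""
    params = {
        "query": f"trace_id:={trace_id}",
        "limit": str(limit),
    }
    return f"{base_url}/select/logsql/query?{urlencode(params)}"

# Assumed address of a vlselect node; adjust for your deployment.
url = trace_query_url("http://localhost:9471", "4490954132420954139")
print(url)
```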
  2. The write performance comparison between the two is as follows:

Specifications of each component node:
vtrace:
insert 4c8g * 24
storage 26c56g10TB * 9
select 8c16g * 3

vlog:
insert 4c8g * 6
storage 26c56g10TB * 9
select 8c16g * 3

Data sampled at 11:00 on a Monday, for two consecutive weeks:

| Metric | vtrace | vlog |
| --- | --- | --- |
| insert ingestion rate | 11.72 MiB/s × 24 = 281 MiB/s | 74.96 MiB/s × 6 = 450 MiB/s |
| insert CPU usage | 4c × 27.2% × 24 = 26c | 4c × 27.47% × 6 = 6.6c |
| insert memory usage | 8 GB × 49.91% × 24 = 95.8 GB | 8 GB × 5.41% × 6 = 2.6 GB |
| storage ingestion rate | 63.82 MiB/s × 9 = 574 MiB/s | 69.28 MiB/s × 9 = 624 MiB/s |
| storage CPU usage | 26c × 29.96% × 9 = 70c | 26c × 31.31% × 9 = 73c |
| storage memory usage | 56 GB × 6.77% × 9 = 34 GB | 56 GB × 4.36% × 9 = 22 GB |
| disk space used (12 hours) | (2.26 TB − 2.17 TB) × 9 = 829 GB | (1.05 TB − 985 GB) × 9 = 812 GB |

The comparison above shows that, taking vlog as the baseline, vtrace consumed roughly 4× the insert-side CPU (26c vs 6.6c, ≈394%), while disk usage over the same 12 hours was roughly comparable (829 GB vs 812 GB).
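The aggregate figures can be reproduced directly from the per-node numbers quoted above (a small sanity-check script; minor rounding differences against the rounded figures in the text are expected):

```python
# Per-node figures from the comparison above.
vtrace_insert_cpu = 4 * 0.272 * 24    # cores: 4c nodes at 27.2% CPU, 24 nodes
vlog_insert_cpu = 4 * 0.2747 * 6      # cores: 4c nodes at 27.47% CPU, 6 nodes

# Disk growth over 12 hours, per storage node, times 9 nodes (1 TB = 1024 GB).
vtrace_disk_gb = (2.26 - 2.17) * 1024 * 9
vlog_disk_gb = (1.05 * 1024 - 985) * 9

print(f"insert CPU: vtrace {vtrace_insert_cpu:.1f}c vs vlog {vlog_insert_cpu:.1f}c")
print(f"disk (12h): vtrace {vtrace_disk_gb:.0f} GB vs vlog {vlog_disk_gb:.0f} GB")
```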

  3. Query Performance Comparison
    To improve query performance for trace_id lookups, we deployed a second vlog cluster. Based on the existing vlog data, it stores an additional copy of each entry indexed by trace_id (trace_id modulo 10000). The format is as follows (screenshot omitted):

This solution stores an extra copy of trace_id index data, which significantly improves query performance but also significantly increases resource usage (by approximately 60%).
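The secondary-index scheme can be sketched as follows (the field names below, such as `trace_bucket`, are hypothetical; the real record layout was shown in a screenshot in the original issue):

```python
def index_record(entry: dict, partitions: int = 10000) -> dict:
    """Build the extra trace_id index entry: the numeric trace_id modulo
    `partitions` becomes a low-cardinality stream label, so a lookup first
    narrows to one of 10000 streams before scanning rows."""
    trace_id = entry["trace_id"]
    return {
        "_time": entry["_time"],
        "trace_bucket": str(int(trace_id) % partitions),  # stream label
        "trace_id": trace_id,
        "_msg": entry.get("_msg", "-"),
    }

rec = index_record({"_time": "2025-08-25T02:58:19.042Z",
                    "trace_id": "4490954132420954139",
                    "_msg": "v6;demo;..."})
print(rec["trace_bucket"])
```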

Furthermore, application monitoring also involves querying by fields other than trace_id, such as user ID. The same need exists in logging systems, for example fetching all related logs by trace_id or by user ID.

Describe the solution you'd like

Based on the comparison and analysis above, neither vlog nor vtrace fully covers our monitoring use cases on its own. Is there a solution that can support the following requirements?

vtrace:

  1. Improve vtrace's write performance, making it as close to vlog's performance as possible.
  2. Improve the efficiency of vtrace's query by trace_id, and support adding indexes for fields other than trace_id.

vlog:

  1. Support adding indexes for specific fields in vlog.

Some of my thoughts:
To improve write performance,

  1. Reduce the amount of data written: keep only the fields that need to be queried or filtered as labels, and pack the remaining data into _msg as a single row. This requires support for setting _msg explicitly. The general storage format would be:
{
    "_msg": "custom_value1;custom_value2;custom_value3;custom_value4;custom_value5;custom_value6;custom_value7;custom_value8;custom_value9;custom_value10;",
    "_time": "2025-08-24T20:07:10.009444528Z",
    "_stream_id": "0000000a000000001c3a3604e527859d9dccb13f6e64d312",
    "_stream": "{name=\"-\",resource_attr:service.name=\"demo\",resource_attr:service.namespace=\"jdjr\",span_attr:method=\"config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)\",span_attr:service=\"com.demo.jr.sgm.server.controller.AgentController\"}",
    "name": "-",
    "span_attr:custom_label1": "custom_value1",
    "span_attr:custom_label2": "custom_value2",
    "span_attr:custom_label3": "custom_value3"

    ...

}
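The packing scheme in item 1 is easy to sketch: queryable fields stay as labels, everything else is joined into _msg with a fixed delimiter. A minimal illustration (field values are hypothetical; positions in _msg are fixed by convention, as in our existing vlog format):

```python
DELIM = ";"

def pack_msg(values: list[str]) -> str:
    # Fields that never need to be filtered on are flattened into one row.
    # Note: values must not themselves contain the delimiter.
    return DELIM.join(values)

def unpack_msg(msg: str) -> list[str]:
    # Positions are fixed by convention, so no per-field keys are stored.
    return msg.split(DELIM)

packed = pack_msg(["custom_value1", "custom_value2", "custom_value3"])
print(packed)
```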
  2. We write data to vlog via the Loki API (/insert/loki/api/v1/push). Could vtrace also support this HTTP data ingestion API, to improve vtrace's write performance?
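For context, the bodies we send to the Loki push endpoint follow the standard Loki JSON push format: streams with a label set and a list of [timestamp_ns, line] pairs. A sketch with illustrative labels and values:

```python
import json

def loki_push_body(labels: dict, line: str, ts_ns: int) -> str:
    """Build a JSON body for POSTing to /insert/loki/api/v1/push."""
    body = {
        "streams": [{
            "stream": labels,                 # becomes the log stream labels
            "values": [[str(ts_ns), line]],   # [[unix_ns_string, log_line], ...]
        }]
    }
    return json.dumps(body)

body = loki_push_body({"app_name": "demo", "tenant": "jdjr"},
                      "v6;demo;...", 1756090699042000000)
print(body)
```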

Regarding query performance,

  1. Could vtrace improve query performance by increasing the PartitionCount for trace_id, or by allowing users to set the PartitionCount themselves?
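The PartitionCount idea amounts to sharding trace_id lookups across more buckets, so each lookup scans less data. A generic sketch of such partitioning (not vtrace's actual implementation; the hash choice is an assumption for illustration):

```python
import hashlib

def partition_for(trace_id: str, partition_count: int) -> int:
    """Map a trace_id to a stable partition. With a larger partition_count,
    each partition holds fewer trace_ids, narrowing the data scanned per
    lookup; ingestion and queries must agree on the same mapping."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

p = partition_for("4490954132420954139", 1024)
print(p)
```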

Describe alternatives you've considered

No response

Additional information

No response

Metadata

Labels: enhancement (New feature or request)