Description
Is your feature request related to a problem? Please describe
Our application monitoring setup uses VictoriaLogs (hereinafter vlog) to store trace data for the past two weeks. However, we found that querying by trace_id was slow. After learning that VictoriaTraces (hereinafter vtrace) had made improvements in this area (#594), we started using vtrace and compared the two.
- Data format (some sensitive data, such as IP address and app_name, has been masked):
The data format of vlog is:
{
"_time": "2025-08-25T02:58:19.042Z",
"_stream_id": "0000000a00000000050b35a441085b72c690cf4305694343",
"_stream": "{app_name=\"demo\",method_name=\"getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind)\",service_name=\"com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl\",tenant=\"jdjr\"}",
"_msg": "v6;demo;com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl;getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind);127.0.0.1;consumer_gold_remind_partition_schedule_0_0_2_1756090670731;0;;;1756090699042;64;1857972;4490954132420954139;4490954132420954140;4490962172599732232;;;;;;injvm;gson:1:22;;1089;m6;;group=group-product-m6;jdjr;;;;",
"app_name": "demo",
"ip": "127.0.0.1",
"line_tracing": "0",
"method_name": "getRemind(java.lang.String,com.demo.fin.std.gold.schedule.domain.po.UserPriceRemind)",
"protocol": "injvm",
"sampling": "0",
"service_name": "com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl",
"success": "1",
"tenant": "jdjr",
"testing": "0",
"duration": "1",
"trace_id": "4490954132420954139"
}
The data format of vtrace is:
{
"_time": "2025-08-24T20:07:10.009444528Z",
"_stream_id": "0000000a000000001c3a3604e527859d9dccb13f6e64d312",
"_stream": "{name=\"-\",resource_attr:service.name=\"demo\",resource_attr:service.namespace=\"jdjr\",span_attr:method=\"config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)\",span_attr:service=\"com.demo.jr.sgm.server.controller.AgentController\"}",
"_msg": "-",
"dropped_attributes_count": "0",
"dropped_events_count": "0",
"dropped_links_count": "0",
"flags": "0",
"kind": "0",
"name": "-",
"resource_attr:service.name": "demo",
"resource_attr:service.namespace": "jdjr",
"scope_name": "sgm-trace",
"scope_version": "1.0.0",
"span_attr:line_tracing": "0",
"span_attr:local_port": "0",
"span_attr:method": "config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)",
"span_attr:protocol": "injvm",
"span_attr:remote_port": "0",
"span_attr:service": "com.demo.fin.std.gold.schedule.service.impl.RemindServiceImpl",
"span_attr:testing": "0",
"span_attr:timeout": "0",
"span_attr:zone": "hc",
"status_code": "1",
"duration": "7444528",
"end_time_unix_nano": "1756066030009444528",
"parent_span_id": "3b62436021ea1801",
"span_attr:child": "hikari:1:2",
"span_attr:group": "server-hc",
"span_attr:ip": "127.0.0.1",
"span_attr:process": "16707",
"span_attr:sampling": "0",
"span_attr:thread": "http-nio-8080-exec-160",
"span_id": "3b62436021ea1802",
"start_time_unix_nano": "1756066030002000000",
"trace_id": "3b62436021ea1801"
}
- The write performance comparison between the two is as follows:
Specifications of each component node:
vtrace:
insert 4c8g * 6
storage 26c56g10TB * 9
select 8c16g * 3
vlog:
insert 4c8g * 24
storage 26c56g10TB * 9
select 8c16g * 3
Data from 11:00 on Monday for two consecutive weeks
insert Ingestion Rate
vtrace
11.72 MiB/s * 24 = 281 MiB/s
vlog
74.96 MiB/s * 6 = 450 MiB/s
insert CPU Usage
vtrace
4c * 27.2% * 24 = 26c
insert Memory Usage
vtrace
8GB * 49.91% * 24 = 95.8GB
storage Ingestion Rate
vtrace
63.82 MiB/s * 9 = 574 MiB/s
vlog
69.28 MiB/s * 9 = 624 MiB/s
storage CPU Usage
vtrace
26c * 29.96% * 9 = 70c
storage Memory Usage
vtrace
56GB * 6.77% * 9 = 34GB
Disk Space Usage (12 hours)
vtrace
(2.26TB - 2.17TB) * 9 = 829GB
vlog
(1.05TB - 985GB) * 9 = 812GB
Taking vlog as the baseline, the comparison above shows that vtrace uses less disk space (73%) and more CPU resources (394%).
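The cluster-wide totals above are per-node readings multiplied by the node count. A minimal Python sketch that reproduces the vtrace storage figures from the numbers quoted above; the helper name `cluster_total` is illustrative, and 1 TB = 1024 GB is an assumption that happens to match the 829GB result:

```python
def cluster_total(per_node: float, nodes: int, utilization: float = 1.0) -> float:
    """Scale a per-node reading (optionally times a utilization ratio) to the whole cluster."""
    return per_node * utilization * nodes

GB_PER_TB = 1024  # assumption: binary units

# vtrace storage nodes: 26c56g * 9, figures taken from the comparison above
storage_ingest = cluster_total(63.82, 9)                  # MiB/s
storage_cpu = cluster_total(26, 9, utilization=0.2996)    # cores
storage_mem = cluster_total(56, 9, utilization=0.0677)    # GB
disk_12h = cluster_total((2.26 - 2.17) * GB_PER_TB, 9)    # GB written in 12 hours

print(round(storage_ingest), round(storage_cpu), round(storage_mem), round(disk_12h))
```

Running the same helper over the vlog readings would give a like-for-like comparison, provided both sets of readings cover the same time window.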
- Query Performance Comparison
To speed up queries by trace_id, we deployed a second vlog cluster. On top of the existing vlog data, it stores an additional copy of each entry, indexed by trace_id (trace_id modulo 10000). The format is as follows:

This solution stores an extra copy of trace_id index data, which significantly improves query performance but also significantly increases resource usage (approximately 60%).
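The bucketing described above (trace_id modulo 10000) can be sketched roughly as follows. This is only an illustration: the field name `trace_bucket` is hypothetical, and the sketch assumes decimal trace IDs like the vlog sample, not the hexadecimal IDs in the vtrace sample.

```python
BUCKETS = 10000  # from the description above: trace_id modulo 10000

def trace_bucket(trace_id: str) -> str:
    """Map a decimal trace_id string to one of BUCKETS buckets."""
    return str(int(trace_id) % BUCKETS)

def index_copy(entry: dict) -> dict:
    """Return a duplicate of the entry tagged with its bucket.

    'trace_bucket' is a hypothetical field name used only for this sketch.
    """
    copy = dict(entry)
    copy["trace_bucket"] = trace_bucket(entry["trace_id"])
    return copy

entry = {"trace_id": "4490954132420954139", "app_name": "demo"}
indexed = index_copy(entry)
print(indexed["trace_bucket"])
```

With the bucket available as a low-cardinality field, a lookup by trace_id only needs to scan the entries of one bucket, at the cost of storing every entry twice.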
Furthermore, application monitoring also requires queries by fields other than trace_id, such as user ID. The same need exists in logging systems, for example fetching all related logs by trace_id or user ID.
Describe the solution you'd like
Based on the above comparison and analysis, neither vlog nor vtrace fully covers our monitoring use cases. Is there a solution that supports the following requirements?
vtrace:
- Improve vtrace's write performance, making it as close to vlog's performance as possible.
- Improve the efficiency of vtrace's query by trace_id, and support adding indexes for fields other than trace_id.
vlog:
- Support adding indexes for specific fields in vlog.
Some of my thoughts:
To improve write performance,
- Reduce the amount of data written. Put only the fields that need to be queried or filtered on into labels, and pack the rest of the data into _msg as a row. This requires support for setting _msg. The general storage format is:
{
"_msg": "custom_value1;custom_value2;custom_value3;custom_value4;custom_value5;custom_value6;custom_value7;custom_value8;custom_value9;custom_value10;",
"_time": "2025-08-24T20:07:10.009444528Z",
"_stream_id": "0000000a000000001c3a3604e527859d9dccb13f6e64d312",
"_stream": "{name=\"-\",resource_attr:service.name=\"demo\",resource_attr:service.namespace=\"jdjr\",span_attr:method=\"config(long,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,javax.servlet.http.HttpServletResponse)\",span_attr:service=\"com.demo.jr.sgm.server.controller.AgentController\"}",
"name": "-",
"span_attr:custom_label1": "custom_value1",
"span_attr:custom_label2": "custom_value2",
"span_attr:custom_label3": "custom_value3"
...
}
- We use the Loki API (/insert/loki/api/v1/push) to write data to vlog. Could vtrace support this data ingestion HTTP API as well, to improve its write performance?
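The two write-side ideas above can be sketched together: non-indexed values are packed into a single semicolon-terminated row, and the result is wrapped in the standard Loki push JSON body (`streams` / `stream` / `values`). The helper names `pack_msg` and `loki_push_body` and the chosen labels are illustrative assumptions; the sketch only builds the request body and does not send it.

```python
import json

def pack_msg(values):
    """Join non-indexed values into one semicolon-terminated row.

    Assumes the values themselves contain no ';' (no escaping is done).
    """
    return "".join(f"{v};" for v in values)

def loki_push_body(labels, ts_unix_nano, values):
    """Build a Loki push JSON body with one stream and one log line."""
    return json.dumps({
        "streams": [{
            "stream": labels,  # only the fields that must be filterable
            "values": [[str(ts_unix_nano), pack_msg(values)]],
        }]
    })

body = loki_push_body(
    {"app_name": "demo", "tenant": "jdjr"},
    1756066030002000000,
    ["custom_value1", "custom_value2", "custom_value3"],
)
print(body)
```

A body like this could be sent with an HTTP POST to /insert/loki/api/v1/push on vlog; whether vtrace could accept the same request is the open question raised above.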
Regarding query performance,
- Can vtrace improve query performance by increasing the PartitionCount of the trace_id, or can the user set the PartitionCount themselves?
Describe alternatives you've considered
No response
Additional information
No response