Description
In VictoriaTraces's cluster mode, the data distribution scheme is similar to VictoriaLogs: each write is stored on one randomly chosen vtstorage node, while each query must be sent to all vtstorage nodes. If one vtstorage node fails, vtselect cannot serve reads, because the results could be incomplete and therefore incorrect. vtinsert can still serve writes, because the surviving vtstorage nodes can accept them. In other words, VictoriaTraces's cluster mode lacks HA for reads. If you really need it, you should use vmagent for cross-cluster HA, see https://docs.victoriametrics.com/victoriatraces/cluster/#high-availability
However, VT and VL differ in their data models. Logs cannot be easily deduplicated (we cannot safely assume that two logs with the same content are duplicates), so replicating logs across vlstorage nodes for HA may be impractical. VT, on the other hand, can deduplicate data by SpanID and timestamp (or some other key), which means it can replicate data across multiple vtstorage nodes and deduplicate at query time, achieving cluster-level HA just like VictoriaMetrics does.
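To illustrate the query-time deduplication idea, here is a minimal sketch in Go. It is not VictoriaTraces code: the `Span` struct, the `spanKey` type, and the `DedupSpans` function are hypothetical names, and the choice of (TraceID, SpanID, start timestamp) as the dedup key is an assumption based on the suggestion above.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified span record used only for illustration.
type Span struct {
	TraceID     string
	SpanID      string
	StartUnixNs int64
	Name        string
}

// spanKey is the assumed dedup key: two spans with the same trace ID,
// span ID and start timestamp are treated as replicas of one span.
type spanKey struct {
	traceID     string
	spanID      string
	startUnixNs int64
}

// DedupSpans merges the results returned by several storage nodes,
// keeps the first copy of every span, and sorts the result by start time.
func DedupSpans(perNode ...[]Span) []Span {
	seen := make(map[spanKey]struct{})
	var out []Span
	for _, spans := range perNode {
		for _, s := range spans {
			k := spanKey{s.TraceID, s.SpanID, s.StartUnixNs}
			if _, ok := seen[k]; ok {
				continue // a replica of this span was already collected
			}
			seen[k] = struct{}{}
			out = append(out, s)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].StartUnixNs < out[j].StartUnixNs
	})
	return out
}

func main() {
	// Span s2 was replicated to both nodes and must appear only once.
	nodeA := []Span{{"t1", "s1", 100, "GET /"}, {"t1", "s2", 150, "db.query"}}
	nodeB := []Span{{"t1", "s2", 150, "db.query"}, {"t1", "s3", 120, "cache.get"}}
	for _, s := range DedupSpans(nodeA, nodeB) {
		fmt.Println(s.SpanID, s.StartUnixNs, s.Name)
	}
}
```

With a key like this, a query can tolerate the loss of a node as long as at least one replica of every span is still reachable.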
I am not sure whether the current cluster implementation is the final design. If it is not, I personally think it would be a good idea to implement HA in the VT cluster just like in VM.
The distribution optimization should include three tasks:
- Consistent-hash to distribute traces
- vtselect supports deduplication
- vtinsert replicates traces into N vtstorage nodes
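The first and third tasks could be sketched roughly as follows. This is only an illustration under stated assumptions, not the VictoriaMetrics or VictoriaTraces implementation: `Ring`, `NewRing`, `Replicas`, the virtual-node count, and the node names are all hypothetical, and a classic hash ring with FNV-1a is used as one possible consistent-hashing scheme, keyed by trace ID.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringEntry is one virtual node on the hash ring.
type ringEntry struct {
	hash uint64
	node string
}

// Ring is a hypothetical consistent-hash ring over vtstorage node names.
type Ring struct {
	entries []ringEntry
	nodes   int
}

func hashKey(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// NewRing places vnodes virtual nodes per storage node on the ring.
// Node names are assumed to be unique.
func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{nodes: len(nodes)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			r.entries = append(r.entries, ringEntry{hashKey(fmt.Sprintf("%s#%d", n, i)), n})
		}
	}
	sort.Slice(r.entries, func(i, j int) bool { return r.entries[i].hash < r.entries[j].hash })
	return r
}

// Replicas returns up to n distinct nodes for a trace ID by walking the
// ring clockwise from the position of the trace ID's hash.
func (r *Ring) Replicas(traceID string, n int) []string {
	if n > r.nodes {
		n = r.nodes
	}
	if n <= 0 || len(r.entries) == 0 {
		return nil
	}
	h := hashKey(traceID)
	start := sort.Search(len(r.entries), func(i int) bool { return r.entries[i].hash >= h })
	seen := make(map[string]bool, n)
	out := make([]string, 0, n)
	for i := 0; len(out) < n && i < len(r.entries); i++ {
		e := r.entries[(start+i)%len(r.entries)]
		if !seen[e.node] {
			seen[e.node] = true
			out = append(out, e.node)
		}
	}
	return out
}

func main() {
	ring := NewRing([]string{"vtstorage-1", "vtstorage-2", "vtstorage-3"}, 64)
	// With a replication factor of 2, every span of this trace would be
	// written to the two nodes printed here.
	fmt.Println(ring.Replicas("trace-abc", 2))
}
```

Hashing by trace ID keeps all spans of a trace on the same replica set, and removing a node that is not in a trace's replica set does not change where that trace is stored. Whether vtselect could then query only a subset of nodes is a separate design question that this sketch does not address.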