Introduce efficient codec for model events #4120

axw · 2020-08-31T08:36:12Z

Introduce an efficient binary codec for model events (at least transactions and spans), for persisting to local storage. This would be used in tail-based sampling, and perhaps in other processing such as #5936.

For the most efficient codec, we should use a statically defined encoding. Barring arguments made for alternatives, let's use protobuf.

In order to have a statically defined encoding we'll need our model to be stable. The events we produce are now substantially ECS-compliant (which is stable), with a few exceptions. For example, we emit transaction.duration.us or span.duration.us; ideally we would record only the ECS event.duration field (#5999).

I suggest we split this up into two phases:

Phase 1

Continue to maintain package model by hand, and continue to use this to emitting JSON documents
Introduce protobuf definitions, generate code in a separate package with protoc
Generate code to translate between handwritten model types and protoc-generated Go types

Phase 2

Use protobuf to generate JSON encoding code (Revise model transformation approach #3565)
Investigate using protoc-generated Go types everywhere, and removing handwritten model types

For phase 2 we'll need some more changes to remove any remaining logic in the model package, e.g. setting <event>.duration.us. This can be done in an ingest pipeline.

The text was updated successfully, but these errors were encountered:

axw · 2020-10-29T10:02:52Z

I spent a bit of time looking into this, with master...axw:model-protobuf being the fruits of my labour.

I wrote a generator which parses the model types and generates protobuf definitions, and then used plain old golang/protobuf to generate the code. I was going to use gogo/protobuf but had some probably-PEBKAC issues and ended up abandoned that for this short spike.

For generating protobuf correctly we would probably need to check them in and take the old definitions into account, so we don't accidentally remove fields and break compatibility with previous releases. Another option would be to have hand-written protobuf as the source of truth for our model types and generate the model types instead, using standard protoc tooling. That would be more typical and almost certainly less error prone. It might be painful to migrate though.

While testing I also realised that the model types we have do not have any encoding/json tags, and so we end up encoding all the empty fields. I added some json:",omitempty" tags to a bunch of fields to reduce this.

I compared this with the existing JSON codec, but bear in mind that the benchmarks are far from comprehensive. They currently only encode model.Transactions with the TraceID and ID fields set, and nothing else. With a more comprehensive set of fields, we might observe a different outcome. Nevertheless, here's what I found.

name                                      time/op
ReadEvents/json_codec/0_events-12            739ns ± 5%
ReadEvents/json_codec/1_events-12           7.59µs ± 7%
ReadEvents/json_codec/10_events-12          65.3µs ±21%
ReadEvents/json_codec/100_events-12          415µs ±24%
ReadEvents/json_codec/1000_events-12        7.20ms ±12%
ReadEvents/protobuf_codec/0_events-12       1.00µs ±24%
ReadEvents/protobuf_codec/1_events-12       5.15µs ± 3%
ReadEvents/protobuf_codec/10_events-12      35.0µs ±27%
ReadEvents/protobuf_codec/100_events-12      320µs ±20%
ReadEvents/protobuf_codec/1000_events-12    2.55ms ±32%
ReadEvents/nop_codec/0_events-12             833ns ±11%
ReadEvents/nop_codec/1_events-12            3.91µs ±12%
ReadEvents/nop_codec/10_events-12           24.8µs ±24%
ReadEvents/nop_codec/100_events-12           160µs ± 5%
ReadEvents/nop_codec/1000_events-12         1.52ms ±38%
WriteTransaction/json_codec-12              3.67µs ±10%
WriteTransaction/protobuf_codec-12           900ns ± 5%
WriteTransaction/nop_codec-12                377ns ± 7%

name                                      alloc/op
ReadEvents/json_codec/0_events-12             360B ± 0%
ReadEvents/json_codec/1_events-12           3.17kB ± 0%
ReadEvents/json_codec/10_events-12          24.2kB ± 0%
ReadEvents/json_codec/100_events-12          234kB ± 0%
ReadEvents/json_codec/1000_events-12        1.34MB ± 0%
ReadEvents/protobuf_codec/0_events-12         360B ± 0%
ReadEvents/protobuf_codec/1_events-12       2.88kB ± 0%
ReadEvents/protobuf_codec/10_events-12      21.2kB ± 0%
ReadEvents/protobuf_codec/100_events-12      205kB ± 0%
ReadEvents/protobuf_codec/1000_events-12    1.53MB ± 0%
ReadEvents/nop_codec/0_events-12              360B ± 0%
ReadEvents/nop_codec/1_events-12            2.30kB ± 0%
ReadEvents/nop_codec/10_events-12           15.5kB ± 0%
ReadEvents/nop_codec/100_events-12           147kB ± 0%
ReadEvents/nop_codec/1000_events-12         1.10MB ± 0%
WriteTransaction/json_codec-12                624B ± 0%
WriteTransaction/protobuf_codec-12            640B ± 0%
WriteTransaction/nop_codec-12                 256B ± 0%

name                                      allocs/op
ReadEvents/json_codec/0_events-12             8.00 ± 0%
ReadEvents/json_codec/1_events-12             27.0 ± 0%
ReadEvents/json_codec/10_events-12             136 ± 0%
ReadEvents/json_codec/100_events-12          1.22k ± 0%
ReadEvents/json_codec/1000_events-12         6.55k ± 0%
ReadEvents/protobuf_codec/0_events-12         8.00 ± 0%
ReadEvents/protobuf_codec/1_events-12         25.0 ± 0%
ReadEvents/protobuf_codec/10_events-12         116 ± 0%
ReadEvents/protobuf_codec/100_events-12      1.02k ± 0%
ReadEvents/protobuf_codec/1000_events-12     4.52k ± 0%
ReadEvents/nop_codec/0_events-12              8.00 ± 0%
ReadEvents/nop_codec/1_events-12              20.0 ± 0%
ReadEvents/nop_codec/10_events-12             66.0 ± 0%
ReadEvents/nop_codec/100_events-12             516 ± 0%
ReadEvents/nop_codec/1000_events-12          1.32k ± 0%
WriteTransaction/json_codec-12                6.00 ± 0%
WriteTransaction/protobuf_codec-12            6.00 ± 0%
WriteTransaction/nop_codec-12                 4.00 ± 0%

name                                      speed
ReadEvents/json_codec/0_events-12
ReadEvents/json_codec/1_events-12         44.2MB/s ± 8%
ReadEvents/json_codec/10_events-12        52.3MB/s ±24%
ReadEvents/json_codec/100_events-12       82.7MB/s ±28%
ReadEvents/json_codec/1000_events-12      46.9MB/s ±11%
ReadEvents/protobuf_codec/0_events-12
ReadEvents/protobuf_codec/1_events-12     14.8MB/s ± 3%
ReadEvents/protobuf_codec/10_events-12    22.3MB/s ±23%
ReadEvents/protobuf_codec/100_events-12   24.3MB/s ±22%
ReadEvents/protobuf_codec/1000_events-12  31.0MB/s ±30%
ReadEvents/nop_codec/0_events-12
ReadEvents/nop_codec/1_events-12
ReadEvents/nop_codec/10_events-12
ReadEvents/nop_codec/100_events-12
ReadEvents/nop_codec/1000_events-12
WriteTransaction/json_codec-12            84.9MB/s ±10%
WriteTransaction/protobuf_codec-12        57.8MB/s ± 5%
WriteTransaction/nop_codec-12

Overall the protobuf implementation is faster for both reading and writing, and encodes as fewer bytes (see the MB/s metrics). The protobuf implementation produces fewer allocations, but allocates more memory overall. I imagine this is due to the intermediate protobuf-generated types; we might be able to optimise here, I haven't looked into it.

I think this is worth pursuing, but we should probably do some real-world performance testing before embarking on an epic journey.

simitt · 2021-12-31T14:01:37Z

@axw can you update the issue with open pre-conditions?

axw · 2022-01-04T05:54:48Z

@simitt I've added some detail to the description. I think this is unblocked now.

axw · 2023-05-12T05:51:01Z

See also elastic/apm-data#36

kruskall · 2023-06-23T17:20:50Z

I think we covered everything on the list. Closing this.

axw added the enhancement label Aug 31, 2020

axw mentioned this issue Aug 31, 2020

sampling/eventstorage: introduce event storage #4108

Merged

2 tasks

axw added the [zube]: Backlog label Sep 2, 2020

axw mentioned this issue Sep 10, 2020

Feature: tail-based sampling #4185

Closed

8 tasks

axw mentioned this issue Nov 11, 2020

model: create a specification for valid event docs #4410

Open

axw added the v7.12.0 label Dec 24, 2020

axw added [zube]: Ready and removed [zube]: Backlog labels Feb 4, 2021

axw added v7.13.0 and removed v7.12.0 labels Feb 15, 2021

axw added v7.14.0 [zube]: Backlog and removed v7.13.0 [zube]: Ready labels Mar 16, 2021

axw added v7.15.0 and removed v7.14.0 labels Apr 23, 2021

axw mentioned this issue Jun 14, 2021

Support spooling to disk #5441

Closed

simitt mentioned this issue Jun 17, 2021

Tail based sampling (GA) #5486

Closed

zube bot added this to the 7.15 milestone Jul 1, 2021

zube bot added [zube]: Ready and removed [zube]: Backlog labels Jul 1, 2021

simitt changed the title ~~Introduce efficient codec for model events~~ [tail sampling] Introduce efficient codec for model events Jul 1, 2021

This was referenced Jul 6, 2021

model: introduce APMEvent; Batch is now []model.APMEvent #5613

Merged

Move source mapping to model processing #5631

Merged

simitt assigned axw Jul 7, 2021

zube bot unassigned axw Jul 7, 2021

zube bot added [zube]: In Progress and removed [zube]: Ready labels Jul 7, 2021

simitt assigned axw Jul 7, 2021

zube bot added [zube]: Blocked and removed [zube]: In Progress labels Oct 6, 2021

simitt added the blocked label Nov 10, 2021

axw mentioned this issue Nov 23, 2021

model: change Host.IP to an array #6694

Merged

2 tasks

simitt modified the milestones: 8.0, 8.1 Dec 1, 2021

axw removed the v8.0.0 label Dec 2, 2021

simitt mentioned this issue Dec 17, 2021

Tail Based Sampling GA #6894

Closed

21 tasks

simitt removed the [zube]: Blocked label Dec 17, 2021

simitt changed the title ~~[tail sampling] Introduce efficient codec for model events~~ Introduce efficient codec for model events Dec 17, 2021

axw removed the blocked label Jan 4, 2022

simitt added the 8.1-candidate label Jan 4, 2022

simitt removed this from the 8.1 milestone Jan 4, 2022

axw mentioned this issue Jan 12, 2022

The future of the go-structform library elastic/beats#29694

Closed

simitt removed the 8.1-candidate label Jan 25, 2022

simitt added the 8.3-candidate label Feb 8, 2022

axw mentioned this issue Feb 11, 2022

Set <event>.duration.us field in ingest pipeline #7261

Merged

2 tasks

stuartnelson3 mentioned this issue Mar 10, 2022

Revise model transformation approach #3565

Closed

simitt removed the 8.3-candidate label Jul 26, 2022

simitt added the 8.9-candidate label Apr 20, 2023

axw mentioned this issue May 12, 2023

Introduce protobuf definitions for model types elastic/apm-data#36

Closed

4 tasks

simitt added this to the 8.9 milestone May 15, 2023

simitt removed the 8.9-candidate label May 15, 2023

simitt unassigned axw May 15, 2023

simitt assigned kruskall Jun 8, 2023

kruskall closed this as completed Jun 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce efficient codec for model events #4120

Introduce efficient codec for model events #4120

axw commented Aug 31, 2020 •

edited

Loading

axw commented Oct 29, 2020

simitt commented Dec 31, 2021

axw commented Jan 4, 2022

axw commented May 12, 2023

kruskall commented Jun 23, 2023

Introduce efficient codec for model events #4120

Introduce efficient codec for model events #4120

Comments

axw commented Aug 31, 2020 • edited Loading

axw commented Oct 29, 2020

simitt commented Dec 31, 2021

axw commented Jan 4, 2022

axw commented May 12, 2023

kruskall commented Jun 23, 2023

axw commented Aug 31, 2020 •

edited

Loading