Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce efficient codec for model events #4120

Closed
axw opened this issue Aug 31, 2020 · 5 comments
Closed

Introduce efficient codec for model events #4120

axw opened this issue Aug 31, 2020 · 5 comments
Assignees
Milestone

Comments

@axw
Copy link
Member

axw commented Aug 31, 2020

Introduce an efficient binary codec for model events (at least transactions and spans), for persisting to local storage. This would be used in tail-based sampling, and perhaps in other processing such as #5936.

For the most efficient codec, we should use a statically defined encoding. Barring arguments made for alternatives, let's use protobuf.

In order to have a statically defined encoding we'll need our model to be stable. The events we produce are now substantially ECS-compliant (which is stable), with a few exceptions. For example, we emit transaction.duration.us or span.duration.us; ideally we would record only the ECS event.duration field (#5999).

I suggest we split this up into two phases:

Phase 1

  • Continue to maintain package model by hand, and continue to use this to emitting JSON documents
  • Introduce protobuf definitions, generate code in a separate package with protoc
  • Generate code to translate between handwritten model types and protoc-generated Go types

Phase 2

For phase 2 we'll need some more changes to remove any remaining logic in the model package, e.g. setting <event>.duration.us. This can be done in an ingest pipeline.

@axw
Copy link
Member Author

axw commented Oct 29, 2020

I spent a bit of time looking into this, with master...axw:model-protobuf being the fruits of my labour.

I wrote a generator which parses the model types and generates protobuf definitions, and then used plain old golang/protobuf to generate the code. I was going to use gogo/protobuf but had some probably-PEBKAC issues and ended up abandoned that for this short spike.

For generating protobuf correctly we would probably need to check them in and take the old definitions into account, so we don't accidentally remove fields and break compatibility with previous releases. Another option would be to have hand-written protobuf as the source of truth for our model types and generate the model types instead, using standard protoc tooling. That would be more typical and almost certainly less error prone. It might be painful to migrate though.

While testing I also realised that the model types we have do not have any encoding/json tags, and so we end up encoding all the empty fields. I added some json:",omitempty" tags to a bunch of fields to reduce this.

I compared this with the existing JSON codec, but bear in mind that the benchmarks are far from comprehensive. They currently only encode model.Transactions with the TraceID and ID fields set, and nothing else. With a more comprehensive set of fields, we might observe a different outcome. Nevertheless, here's what I found.

name                                      time/op
ReadEvents/json_codec/0_events-12            739ns ± 5%
ReadEvents/json_codec/1_events-12           7.59µs ± 7%
ReadEvents/json_codec/10_events-12          65.3µs ±21%
ReadEvents/json_codec/100_events-12          415µs ±24%
ReadEvents/json_codec/1000_events-12        7.20ms ±12%
ReadEvents/protobuf_codec/0_events-12       1.00µs ±24%
ReadEvents/protobuf_codec/1_events-12       5.15µs ± 3%
ReadEvents/protobuf_codec/10_events-12      35.0µs ±27%
ReadEvents/protobuf_codec/100_events-12      320µs ±20%
ReadEvents/protobuf_codec/1000_events-12    2.55ms ±32%
ReadEvents/nop_codec/0_events-12             833ns ±11%
ReadEvents/nop_codec/1_events-12            3.91µs ±12%
ReadEvents/nop_codec/10_events-12           24.8µs ±24%
ReadEvents/nop_codec/100_events-12           160µs ± 5%
ReadEvents/nop_codec/1000_events-12         1.52ms ±38%
WriteTransaction/json_codec-12              3.67µs ±10%
WriteTransaction/protobuf_codec-12           900ns ± 5%
WriteTransaction/nop_codec-12                377ns ± 7%

name                                      alloc/op
ReadEvents/json_codec/0_events-12             360B ± 0%
ReadEvents/json_codec/1_events-12           3.17kB ± 0%
ReadEvents/json_codec/10_events-12          24.2kB ± 0%
ReadEvents/json_codec/100_events-12          234kB ± 0%
ReadEvents/json_codec/1000_events-12        1.34MB ± 0%
ReadEvents/protobuf_codec/0_events-12         360B ± 0%
ReadEvents/protobuf_codec/1_events-12       2.88kB ± 0%
ReadEvents/protobuf_codec/10_events-12      21.2kB ± 0%
ReadEvents/protobuf_codec/100_events-12      205kB ± 0%
ReadEvents/protobuf_codec/1000_events-12    1.53MB ± 0%
ReadEvents/nop_codec/0_events-12              360B ± 0%
ReadEvents/nop_codec/1_events-12            2.30kB ± 0%
ReadEvents/nop_codec/10_events-12           15.5kB ± 0%
ReadEvents/nop_codec/100_events-12           147kB ± 0%
ReadEvents/nop_codec/1000_events-12         1.10MB ± 0%
WriteTransaction/json_codec-12                624B ± 0%
WriteTransaction/protobuf_codec-12            640B ± 0%
WriteTransaction/nop_codec-12                 256B ± 0%

name                                      allocs/op
ReadEvents/json_codec/0_events-12             8.00 ± 0%
ReadEvents/json_codec/1_events-12             27.0 ± 0%
ReadEvents/json_codec/10_events-12             136 ± 0%
ReadEvents/json_codec/100_events-12          1.22k ± 0%
ReadEvents/json_codec/1000_events-12         6.55k ± 0%
ReadEvents/protobuf_codec/0_events-12         8.00 ± 0%
ReadEvents/protobuf_codec/1_events-12         25.0 ± 0%
ReadEvents/protobuf_codec/10_events-12         116 ± 0%
ReadEvents/protobuf_codec/100_events-12      1.02k ± 0%
ReadEvents/protobuf_codec/1000_events-12     4.52k ± 0%
ReadEvents/nop_codec/0_events-12              8.00 ± 0%
ReadEvents/nop_codec/1_events-12              20.0 ± 0%
ReadEvents/nop_codec/10_events-12             66.0 ± 0%
ReadEvents/nop_codec/100_events-12             516 ± 0%
ReadEvents/nop_codec/1000_events-12          1.32k ± 0%
WriteTransaction/json_codec-12                6.00 ± 0%
WriteTransaction/protobuf_codec-12            6.00 ± 0%
WriteTransaction/nop_codec-12                 4.00 ± 0%

name                                      speed
ReadEvents/json_codec/0_events-12
ReadEvents/json_codec/1_events-12         44.2MB/s ± 8%
ReadEvents/json_codec/10_events-12        52.3MB/s ±24%
ReadEvents/json_codec/100_events-12       82.7MB/s ±28%
ReadEvents/json_codec/1000_events-12      46.9MB/s ±11%
ReadEvents/protobuf_codec/0_events-12
ReadEvents/protobuf_codec/1_events-12     14.8MB/s ± 3%
ReadEvents/protobuf_codec/10_events-12    22.3MB/s ±23%
ReadEvents/protobuf_codec/100_events-12   24.3MB/s ±22%
ReadEvents/protobuf_codec/1000_events-12  31.0MB/s ±30%
ReadEvents/nop_codec/0_events-12
ReadEvents/nop_codec/1_events-12
ReadEvents/nop_codec/10_events-12
ReadEvents/nop_codec/100_events-12
ReadEvents/nop_codec/1000_events-12
WriteTransaction/json_codec-12            84.9MB/s ±10%
WriteTransaction/protobuf_codec-12        57.8MB/s ± 5%
WriteTransaction/nop_codec-12

Overall the protobuf implementation is faster for both reading and writing, and encodes as fewer bytes (see the MB/s metrics). The protobuf implementation produces fewer allocations, but allocates more memory overall. I imagine this is due to the intermediate protobuf-generated types; we might be able to optimise here, I haven't looked into it.

I think this is worth pursuing, but we should probably do some real-world performance testing before embarking on an epic journey.

@axw axw added the v7.12.0 label Dec 24, 2020
@axw axw added v7.13.0 and removed v7.12.0 labels Feb 15, 2021
@axw axw added v7.15.0 and removed v7.14.0 labels Apr 23, 2021
@zube zube bot added this to the 7.15 milestone Jul 1, 2021
@simitt simitt changed the title Introduce efficient codec for model events [tail sampling] Introduce efficient codec for model events Jul 1, 2021
@zube zube bot unassigned axw Jul 7, 2021
@simitt simitt added the blocked label Nov 10, 2021
@simitt simitt modified the milestones: 8.0, 8.1 Dec 1, 2021
@axw axw removed the v8.0.0 label Dec 2, 2021
@simitt simitt mentioned this issue Dec 17, 2021
21 tasks
@simitt simitt changed the title [tail sampling] Introduce efficient codec for model events Introduce efficient codec for model events Dec 17, 2021
@simitt
Copy link
Contributor

simitt commented Dec 31, 2021

@axw can you update the issue with open pre-conditions?

@axw axw removed the blocked label Jan 4, 2022
@axw
Copy link
Member Author

axw commented Jan 4, 2022

@simitt I've added some detail to the description. I think this is unblocked now.

@axw
Copy link
Member Author

axw commented May 12, 2023

See also elastic/apm-data#36

@simitt simitt added this to the 8.9 milestone May 15, 2023
@simitt simitt unassigned axw May 15, 2023
@kruskall
Copy link
Member

I think we covered everything on the list. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants