Remove Protobuf Transitive Dependency #38

Draft
wants to merge 7 commits into main from remove-protobuf-3

Conversation

@sfc-gh-jopel (Collaborator) commented Sep 25, 2024

Splitting PRs

Since this PR is very large, it will be split into smaller PRs so that each component can be reviewed and merged individually:

  • Code generator: Add code generator for otel proto #39
  • TODO: Internalizing opentelemetry-exporter-otlp-proto-common code
  • TODO: Modifying opentelemetry-exporter-otlp-proto-common to use generated code instead of protobuf
  • TODO: Using the custom encoding logic instead of opentelemetry-exporter-otlp-proto-common (but with a try..except import to use old method if the dependency is already loaded in the runtime)

Changes:

  • Removes opentelemetry-exporter-otlp-proto-common as a runtime dependency (it pulls in protobuf) and internalizes its code for converting SDK objects to proto objects
  • Implements custom protobuf serialization in pure Python based on the opentelemetry-proto message definitions (see the sketch after this list)
  • Adds a protoc plugin to generate Python serialization objects from the opentelemetry-proto message definitions (adds dev dependencies on protobuf, Jinja2, grpcio-tools, black, isort)
  • Uses the fastest approach found in benchmarks: 1.2-1.3x faster serialization and over 4x lower peak memory usage. This serialization approach is subject to change in the future.
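To make the pure-Python serialization concrete, here is a minimal sketch, assuming nothing about the PR's internals beyond what is stated above: protobuf's wire format reduces to tag/length/payload byte strings that plain Python can emit without the protobuf runtime.

def encode_varint(value: int) -> bytes:
    # Encode a non-negative integer as a protobuf base-128 varint
    # (7 bits per byte, MSB set on every byte except the last).
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_len_field(field_number: int, payload: bytes) -> bytes:
    # Length-delimited field (wire type 2): tag, payload length, payload.
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

# field 1 = b"hi" must serialize to b'\x0a\x02hi', byte-identical to what
# protobuf's generated code produces.
assert encode_len_field(1, b"hi") == b"\x0a\x02hi"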

Overview of files:

  • src/snowflake/telemetry/_internal/opentelemetry/proto/ is code generated by scripts/proto_codegen.sh to serialize the messages in the opentelemetry-proto spec (a hypothetical sketch of the generated shape follows this list)
  • compile/ is the custom protoc compiler plugin that generates the files above, used by the proto_codegen.sh script
  • snowflake/telemetry/_internal/opentelemetry/exporter/otlp/proto/common/ is copied from opentelemetry-exporter-otlp-proto-common and slightly modified to use the newly defined message types
  • snowflake/telemetry/_internal/serialize/ is the custom serialization logic for protobuf primitive types
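For orientation, here is a hypothetical sketch of the shape such generated code might take, inferred only from the write_to snippet quoted in the review thread below; the real generated classes and the ProtoSerializer helper names may differ.

from dataclasses import dataclass

@dataclass
class ScopeLogs:
    # Illustrative message only; the real generated files cover the full
    # opentelemetry-proto spec.
    schema_url: str = ""

    def write_to(self, out: "ProtoSerializer") -> None:
        # Each field writes itself through the shared serializer with its
        # field number; serialize_string is an assumed helper name.
        out.serialize_string(3, self.schema_url)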

TODO

  • TODO: Add "proxy" try/except import between opentelemetry.exporter.otlp.proto.common and snowflake.telemetry._internal.exporter.otlp.proto.common depending on if opentelemetry-exporter-otlp-proto-common==1.26.0 is available in the runtime. Will be done in a separate PR since it makes changes / tests harder to review
  • TODO: Release new version of snowflake-telemetry-python v0.6.0, will be done in separate PR
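A hedged sketch of what that proxy import could look like (module paths inferred from the description above; the follow-up PR may structure this differently):

try:
    # Use the upstream encoder if opentelemetry-exporter-otlp-proto-common
    # is already loaded in the runtime.
    from opentelemetry.exporter.otlp.proto.common.trace_encoder import encode_spans
except ImportError:
    # Otherwise fall back to the internalized, protobuf-free copy
    # (assumed mirror layout under snowflake.telemetry._internal).
    from snowflake.telemetry._internal.exporter.otlp.proto.common.trace_encoder import (
        encode_spans,
    )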

Testing:

  • Added opentelemetry-exporter-otlp-proto-common as a test dependency to test serialization correctness
  • Ensures that the custom serialization follows the same semantics as protobuf serialization (see the sketch after this list)
  • TODO: Add unit tests for the proto plugin / compilation
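A minimal sketch of what such a correctness check could look like, treating the custom serializer's entry point as a hypothetical hook (the repo's actual tests may differ):

from opentelemetry.exporter.otlp.proto.common._log_encoder import encode_logs  # upstream, pb2-based

def assert_logs_match_pb2(sdk_logs, custom_serialize):
    # custom_serialize is the pure-Python serializer under test
    # (hypothetical hook; the real tests may call it differently).
    expected = encode_logs(sdk_logs).SerializeToString()
    assert custom_serialize(sdk_logs) == expected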

Benchmarking:

  • Added benchmarks comparing protobuf serialization with the custom serialization: ~1.2x faster (results below)
  • TODO: Add benchmarks for memory usage (~4x improvement)
  • TODO: Add benchmarks for telemetry package load time on program startup when imported
-----------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations
-----------------------------------------------------------------------------
test_bm_serialize_logs_data             78590 ns        78590 ns         8893
test_bm_pb2_serialize_logs_data         96043 ns        96043 ns         7277
test_bm_serialize_metrics_data         132482 ns       132482 ns         5285
test_bm_pb2_serialize_metrics_data     163929 ns       163931 ns         4270
test_bm_serialize_traces_data          103523 ns       103524 ns         6656
test_bm_pb2_serialize_traces_data      132048 ns       132048 ns         5294
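For reference, the ratios from this table work out to 96043 / 78590 ≈ 1.22x for logs, 163929 / 132482 ≈ 1.24x for metrics, and 132048 / 103523 ≈ 1.28x for traces, consistent with the 1.2-1.3x figure above.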

# We need to inspect the spans and group + structure them as:
#
# Resource
# Instrumentation Library
Review comment (Collaborator):
Instrumentation Library -> Scope

def write_to(self, out: "ProtoSerializer") -> None:
...

class ProtoSerializer:
Review comment (Collaborator):

One optimization that you can do is to pre-calculate the size of the binary array and pre-allocate the array.

Reply from the author (@sfc-gh-jopel):

I tried changing my serialize implementation to write to a right-sized buffer (using memoryview for zero-copy overwriting of slices). It was about 10-20% slower than my current implementation (you can see my changes here: https://github.com/sfc-gh-jopel/snowflake-telemetry-python/pull/1/files). This script illustrates it:

def test1():
    # Pre-allocated, right-sized buffer overwritten in place via memoryview.
    b = bytearray(len(b"hello world") * 10000)
    m = memoryview(b)
    i = 0
    l = len(b"hello world")
    for _ in range(10000):
        m[i:i+l] = b"hello world"
        i += l
    return b

def test2():
    # Growable buffer using bytearray.extend; chunks are written reversed
    # and the whole buffer reversed at the end, mirroring the current
    # implementation.
    b = bytearray()
    for _ in range(10000):
        b.extend(b"hello world"[::-1])
    return b[::-1]

def test3():
    # Same as test2, but using += (in-place concatenation) instead of extend.
    b = bytearray()
    for _ in range(10000):
        b += b"hello world"[::-1]
    return b[::-1]

if __name__ == "__main__":
    import timeit
    print(timeit.timeit("test1()", setup="from __main__ import test1", number=1000))
    print(timeit.timeit("test2()", setup="from __main__ import test2", number=1000))
    print(timeit.timeit("test3()", setup="from __main__ import test3", number=1000))

Output:

1.3387652660021558
1.4347571920079645
1.276998276996892

I had the extend approach reverse the bytes before writing, since that is what my current implementation does. The pre-allocated approach was only slightly faster than the extend approach, and slower than the += method (I should switch extend to += for a slight performance increase). Calculating the buffer size for our proto messages would be more complex than what is done in this script, and would likely make it the slowest of the three.

This result is puzzling to me because the memoryview approach should be more memory friendly. My guess is that the messages in my benchmark (and probably most telemetry data) fit in cache, so the resize operation happens infrequently and is fast enough that it still outperforms the bounds-checking overhead of the right-sized approach.

I am not sure what the best thing to do here is. I think memoryview will win for huge logs and traces, but I don't think that's the typical use case.
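For readers puzzled by the reversal: here is a minimal sketch of the back-to-front writing technique as I understand it from this thread (not the PR's actual ProtoSerializer; helper names are assumptions). Length-delimited protobuf fields need the payload size before the payload, so writing everything reversed lets each nested message be measured after its fields are already in the buffer, avoiding a size pre-pass.

def _varint(value: int) -> bytes:
    # Base-128 varint, as in protobuf.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

class ReversedWriter:
    def __init__(self) -> None:
        self._buf = bytearray()  # message accumulates in reverse

    def write_bytes(self, tag: bytes, data: bytes) -> None:
        # Final layout is tag | len | data, so append the components reversed.
        self._buf += data[::-1]
        self._buf += _varint(len(data))[::-1]
        self._buf += tag[::-1]

    def write_message(self, tag: bytes, write_fields) -> None:
        start = len(self._buf)
        write_fields(self)             # child fields land in the buffer first
        size = len(self._buf) - start  # length is known only after writing
        self._buf += _varint(size)[::-1]
        self._buf += tag[::-1]

    def finish(self) -> bytes:
        return bytes(self._buf[::-1])

# Example: a message whose field 1 is a nested message with bytes field 1.
w = ReversedWriter()
w.write_message(b"\x0a", lambda s: s.write_bytes(b"\x0a", b"hi"))
assert w.finish() == b"\x0a\x04\x0a\x02hi"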

sfc-gh-jopel force-pushed the remove-protobuf-3 branch 3 times, most recently from 077f82b to fa8d31e on October 18, 2024 23:19
sfc-gh-jopel marked this pull request as ready for review October 21, 2024 16:31
sfc-gh-jopel requested a review from a team as a code owner October 21, 2024 16:31
sfc-gh-jopel marked this pull request as draft October 22, 2024 20:55