Add code generator for otel proto #39

Merged (15 commits) on Oct 29, 2024

@sfc-gh-jopel sfc-gh-jopel commented Oct 22, 2024

This is the first of a few PRs to add custom marshaling logic in place of opentelemetry-exporter-otlp-proto-common. The objective is to remove the hard runtime dependency on protobuf from snowflake-telemetry-python. This first PR:

  • Adds a protoc plugin to generate marshaling code for custom proto message types
  • Adds a script to compile opentelemetry-proto spec
  • Adds handwritten serialization code for protobuf primitive types used in opentelemetry-proto

More detailed explanation:
src/snowflake/telemetry/_internal/opentelemetry/proto/* is the code generated by running ./scripts/proto_codegen.sh. scripts/plugin.py is a plugin for the protoc compiler. It is used with protoc to generate marshaling code for the custom message types defined in opentelemetry-proto. The generated code is responsible for producing the same serialized bytestrings as the pb2-generated structures would, given the same inputs. Additionally, handwritten utility code in src/snowflake/telemetry/_internal/serialize serializes primitive protobuf types such as varint, string, bytes, and ints.
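
To illustrate what serializing primitive protobuf types involves, here is a minimal sketch of varint and string field encoding following the public protobuf wire format. The function names are hypothetical and are not taken from the serialize module in this PR; the sketch only handles non-negative integers.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F           # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow: set the continuation bit
        else:
            out.append(low7)          # last byte: continuation bit stays clear
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (VARINT): tag followed by the varint-encoded value."""
    return encode_tag(field_number, 0) + encode_varint(value)


def encode_string_field(field_number: int, value: str) -> bytes:
    """Wire type 2 (LEN): tag, byte length as a varint, then the UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data
```

For example, encode_uint_field(1, 150) yields the bytes 08 96 01 and encode_string_field(2, "testing") yields 12 07 followed by the ASCII bytes of "testing", matching the examples in the protobuf encoding documentation.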

@sfc-gh-jopel sfc-gh-jopel marked this pull request as ready for review October 22, 2024 21:00
@sfc-gh-jopel sfc-gh-jopel requested a review from a team as a code owner October 22, 2024 21:00
@sfc-gh-sili sfc-gh-sili changed the title Add code generator Add code generator for otel proto Oct 23, 2024

sfc-gh-sili commented Oct 23, 2024

Thanks for the work!
Can you add a section in the README that explains how to use the proto generation script?

@@ -0,0 +1,237 @@
#!/usr/bin/env python3

This might be a bit hard to understand without context. Can you add some explanation of this plugin and the standard protoc generator to the PR description? Is it simply a slimmed-down version of protoc, or does it do anything special?

message AnyValue {
oneof value {
string string_value = 1;
bool bool_value = 2;

@sfc-gh-sili sfc-gh-sili Oct 23, 2024

Would it be easier if this were a separate file called, say, tests/data/simple.proto?

@@ -0,0 +1,237 @@
#!/usr/bin/env python3

import os

It might make more sense to have this file (and template) under scripts since it is only used when we generate from proto.
Unless that stops tests from reading this file.

@sfc-gh-sili

In general, I am convinced that this is a solid approach. We just want to make sure we have enough documentation and comments to understand this a year from now.

@sfc-gh-sili sfc-gh-sili left a comment

Looks good! Thanks

@sfc-gh-sili

Keeping a note here for future reference: before this gets into a new release, we need a test that validates this serialization against the official version. Pseudo-code:

ourPyObj = snowflake.telemetry.SomeTelemetryObject()
ourSerialized = ourPyObj.serializeToString()
deserializedPyObj = protobuf.fromString(ourSerialized)

officialPyObj = protobuf.SomeTelemetryObject()

assert officialPyObj.serializeToString(deterministic=True) == deserializedPyObj.serializeToString(deterministic=True)
# alternatively, assert officialPyObj == deserializedPyObj if a comparator is implemented
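
A minimal runnable sketch of this kind of check, assuming the protobuf package is available in the test environment. The generated snowflake.telemetry classes are not shown in this conversation, so the sketch uses google.protobuf.wrappers_pb2.StringValue as a stand-in for the official message type, and handwritten_string_value is a hypothetical helper standing in for the generated serializer.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def handwritten_string_value(value: str) -> bytes:
    """Hand-serialize a StringValue message: field 1, length-delimited."""
    data = value.encode("utf-8")
    if not data:
        return b""  # proto3 omits a field that holds its default value
    return b"\x0a" + encode_varint(len(data)) + data  # 0x0a == (1 << 3) | 2


def matches_official(value: str) -> bool:
    """Round-trip our bytes through the official parser and compare."""
    # Imported lazily so the serializer itself has no protobuf dependency.
    from google.protobuf import wrappers_pb2

    ours = handwritten_string_value(value)
    deserialized = wrappers_pb2.StringValue.FromString(ours)
    official = wrappers_pb2.StringValue(value=value)
    return (
        official.SerializeToString(deterministic=True)
        == deserialized.SerializeToString(deterministic=True)
    )
```

A real test would swap in the generated opentelemetry-proto message types and run over a range of inputs, including empty and multi-byte values.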

@sfc-gh-tmonk sfc-gh-tmonk left a comment

This looks good to me. I like the use of the custom plugin to generate the pure Python serialization code we need.

Could we create a Jenkins job that regularly tests the full build flow, now that it is getting more complex?

  • generate the code
  • build the package (.tar.bz2 and .conda)
  • run tests against the built packages, including, as Sindy mentioned, a comparison against serialized messages generated by the official Python protobuf packages

@sfc-gh-jopel sfc-gh-jopel merged commit 0fd842e into snowflakedb:main Oct 29, 2024
6 checks passed
sfc-gh-jopel added a commit that referenced this pull request Nov 5, 2024
sfc-gh-jopel added a commit to sfc-gh-jopel/snowflake-telemetry-python that referenced this pull request Nov 5, 2024