-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(satellite,electric): Implement distributed tracing using OpenTelemetry in the client #1371
base: main
Are you sure you want to change the base?
Conversation
6b8a321
to
6076eaa
Compare
c1ca87b
to
91ce330
Compare
8d9f744
to
ccddd36
Compare
91ce330
to
8a74517
Compare
990e86f
to
e2ecd84
Compare
These attributes are set on a "resource" which in our case is a single CI job within a workflow.
This allows us to filter traces by test. Unfortunately, LUX does not have a built-in way to get the current file name, so we have to do some manual work for this.
Thanks, Docker Compose! See docker/cli#4958 (comment)
014e5bd
to
5a05ee9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the additional complexity added by the telemetry calls.
For example, the shape sync
method which used to be 5 LoC is now >30 LoC.
Also, we know that we also want to add some improved error logging, which might add yet other code unrelated to the core logic.
Looks like the ideal use case for some aspected-oriented programming but haven't seen good AoP solutions for JS. Essentially, i want to move the telemetry logic outside of the core logic here. Perhaps the chain-of-responsibility pattern could solve this problem but we would have to see if and how that fits our implementation. If we're fine with only tracing public API methods, we can perhaps go for the chain-of-responsibility pattern.
@@ -88,6 +88,7 @@ jobs: | |||
ELECTRIC_IMAGE_NAME: electric-sql-ci/electric | |||
ELECTRIC_CLIENT_IMAGE_NAME: electric-sql-ci/electric-ws-client | |||
ELECTRIC_IMAGE_TAG: ${{ env.ELECTRIC_VERSION }} | |||
_OTEL_RESOURCE_ATTRIBUTES: deployment.environment=CI,ci.workflow=${{ github.workflow }},ci.job=${{ github.job }},ci.run=${{ github.run_id }},ci.branch=${{ github.head_ref }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this env var start with an underscore?
In the docs the env var does not have the underscore at the start.
@@ -55,14 +61,18 @@ export class RPC { | |||
public request( | |||
_service: string, | |||
method: string, | |||
message: Uint8Array | |||
message: Uint8Array, | |||
reqOptions?: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can type this more precisely using the generated SatRpcRequestOptions
type: Partial<SatRpcRequestOptions>
. If you want to make it 100% you'd have to use DeepPartial
as used by protobuf-ts but is not exposed, as a workaround we could use reqOptions?: Parameters<typeof SatRpcRequestOptions.create>[0]
(untested).
/** | ||
* Wrap an RPC instance to augment RPC requests with trace propagation. | ||
* Potenitally slight overkill, could directly add the propagation data | ||
* on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broken sentence? on what?
* @param rpc RPC instance to wrap | ||
* @returns A proxy around the RPC instance | ||
*/ | ||
export function withRpcRequestTracing(rpc: RPC): RPC { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the proxy is a bit complicated, i like how the tracing logic is separate from the core logic.
@@ -358,6 +368,7 @@ export class SatelliteProcess implements Satellite { | |||
} | |||
|
|||
private async _stop(shutdown?: boolean): Promise<void> { | |||
const span = startSpan('satellite.process.stop') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if _stop
is called twice and they run concurrently?
i.e. the 2nd invocation would call startSpan
again with the same label, what happens in that case?
Does it create a separate span that is somehow distinct from the first, or does it aggregate them, or does it ignore the 2nd call, or it throws an error, or something else? Tried looking it up but couldn't immediately find an answer in the docs.
@@ -178,6 +178,13 @@ | |||
}, | |||
"dependencies": { | |||
"@electric-sql/prisma-generator": "workspace:*", | |||
"@opentelemetry/api": "^1.9.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, found this in the docs about dependencies:
If you’re instrumenting a library, only install the OpenTelemetry API package for your language. Your library will not emit telemetry on its own. It will only emit telemetry when it is part of an app that uses the OpenTelemetry SDK. For more on instrumenting libraries, see Libraries.
Are we somehow different that we do need all the dependencies? Or could we move them into optional peer dependencies?
} | ||
|
||
/** | ||
* Sets the span status to `ERROR` and optinally records the provided error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in "optinally"
|
||
/** | ||
* Sets the span status to `ERROR` and optinally records the provided error | ||
* ass an exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type in "ass"
.then((_) => span.setStatus({ code: SpanStatusCode.OK })) | ||
.catch((err) => recordSpanError(span, err)) | ||
.finally(() => span.end()) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leaving a comment here to make sure that we are on purpose returning the original promise and not the chained promise.
@msfstef Can you help me out here with addressing Kevin's feedback? I see there are merge conflicts but haven't rebased so as not to mess up code comments from Kevin. |
This is a follow-up to #1370.
Besides adding OT to the client, here we also implement the passing of client's tracing context to the server in RPC calls across the network boundary. The GH Actions workflows that run E2E tests are updated to enable the export of client traces to Honeycomb.io where each test gets its own
ci.test_file
attribute and a bunch of other CI-related attributes.