feat(telemetry): add device ID logging COMPASS-8443 #2411
base: main
@@ -22,7 +22,8 @@
     "@mongosh/history": "2.4.6",
     "@mongosh/types": "3.6.0",
     "mongodb-log-writer": "^2.3.1",
-    "mongodb-redact": "^1.1.5"
+    "mongodb-redact": "^1.1.5",
+    "node-machine-id": "^1.1.12"
This is an unmaintained package which always runs multiple child processes to get its result. I do wonder if there are better alternatives out there, even if it's just hashing `os.hostname()` and maybe the MAC addresses reported in `os.networkInterfaces()`?
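For illustration, the hostname-plus-MAC idea could look roughly like the sketch below. `hashHostIdentity` is a hypothetical helper and not part of mongosh; the choice of SHA-256, the lowercasing, and the sorting are assumptions made to keep the result independent of interface enumeration order.

```typescript
import { createHash } from 'node:crypto';
import * as os from 'node:os';

// Hypothetical helper (not part of mongosh): derives an identifier from the
// hostname plus the MAC addresses of all non-internal network interfaces.
export function hashHostIdentity(
  hostname: string = os.hostname(),
  interfaces: ReturnType<typeof os.networkInterfaces> = os.networkInterfaces()
): string {
  const macs = new Set<string>();
  for (const addrs of Object.values(interfaces)) {
    for (const addr of addrs ?? []) {
      // Skip loopback/internal interfaces and the all-zero placeholder MAC.
      if (!addr.internal && addr.mac !== '00:00:00:00:00:00') {
        macs.add(addr.mac.toLowerCase());
      }
    }
  }
  // Sort so that the order in which interfaces are listed does not matter.
  const input = [hostname, ...Array.from(macs).sort()].join('\n');
  return createHash('sha256').update(input).digest('hex');
}
```

Note that the result changes whenever the hostname or the set of network adapters changes, which is the reliability concern raised in the replies.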
Yeah, I agree. I wasn't sure about the package, though it is used by some pretty big projects and there are no good alternatives. If we go in a different direction we'd likely want the Atlas CLI to adopt the same as well, but considering the library talks about the lack of reliability of MAC addresses etc., it seems like that won't be very reliable.
I'm wondering if we could instead do a shared mongodb devtools like `~/.mongodb-tools/config.yaml` directory or something where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already. Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.
haha just realized the "big projects which use this" includes https://www.npmjs.com/package/realm?activeTab=dependencies 😅
> I'm wondering if we could instead do a shared mongodb devtools like `~/.mongodb-tools/config.yaml` directory or something where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already.
If that is an option, we should definitely go with it. I had assumed we'd want a true "device" ID, not something that would be per-user, but in today's world the latter is probably at least as good as the former. (It also has the advantage of the user being able to control it, if necessary.)
> Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.
Agreed 👍
I'll ask the analytics team; it seems like a cleaner option. I think our motivation is to be able to associate the user between our various tools rather than actually identify a device. The only concern is maybe whether an application can always write/read that directory, wherever we decide to put that across OSes.
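A minimal sketch of the shared-file idea discussed in this thread, assuming a JSON file with a single `anonymousId` field. The function name `getOrCreateSharedId`, the default directory, and the file name are all hypothetical; the thread never settled on a location or format. The read/write concern is handled here by returning `undefined` when the directory cannot be used.

```typescript
import { randomUUID } from 'node:crypto';
import { promises as fs } from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical default location; not an agreed-upon path.
const DEFAULT_DIR = path.join(os.homedir(), '.mongodb-tools');

// Returns the stored ID, or undefined if the file is missing or malformed.
async function readId(file: string): Promise<string | undefined> {
  try {
    const parsed: unknown = JSON.parse(await fs.readFile(file, 'utf8'));
    const id = (parsed as { anonymousId?: unknown } | null)?.anonymousId;
    return typeof id === 'string' ? id : undefined;
  } catch {
    return undefined;
  }
}

export async function getOrCreateSharedId(
  dir: string = DEFAULT_DIR
): Promise<string | undefined> {
  const file = path.join(dir, 'config.json');
  const existing = await readId(file);
  if (existing) return existing;

  const anonymousId = randomUUID();
  try {
    await fs.mkdir(dir, { recursive: true });
    // 'wx' makes the write fail if another tool created the file first.
    await fs.writeFile(file, JSON.stringify({ anonymousId }), {
      flag: 'wx',
      mode: 0o600,
    });
    return anonymousId;
  } catch {
    // Either another tool won the race (use its value) or the directory is
    // not writable (readId returns undefined).
    return readId(file);
  }
}
```

Every tool calling this with the same directory would see the same ID after the first successful write, and the user can inspect or delete the file to control it.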
@addaleax I ended up sticking with device ID for the reasons I mentioned in Alternative Considerations in the PR description, mainly easier adaptability from existing Atlas CLI data. I also re-organized the logic so now we have 2 parallel buffers, one for bus events in general and one for telemetry events in particular, until the device ID is resolved. Let me know if you have any thoughts about either of those.
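As a rough illustration of buffering telemetry events until the device ID is resolved: the class and method names below are hypothetical and do not match the actual mongosh implementation, and the `'unknown'` fallback for a failed lookup is an assumption.

```typescript
type TelemetryEvent = { event: string; properties?: Record<string, unknown> };
type IdentifiedEvent = TelemetryEvent & { deviceId: string };

// Hypothetical sketch: holds events back until a device ID is available,
// then delivers them in their original order.
export class BufferedTelemetry {
  private pending: TelemetryEvent[] = [];
  private deviceId: string | undefined;
  private readonly send: (e: IdentifiedEvent) => void;

  constructor(send: (e: IdentifiedEvent) => void) {
    this.send = send;
  }

  track(e: TelemetryEvent): void {
    if (this.deviceId === undefined) {
      this.pending.push(e); // device ID not resolved yet: buffer
    } else {
      this.send({ ...e, deviceId: this.deviceId });
    }
  }

  // Called once the lookup settles; pass undefined if it failed.
  setDeviceId(id: string | undefined): void {
    const deviceId = id ?? 'unknown'; // assumed fallback value
    this.deviceId = deviceId;
    for (const e of this.pending.splice(0)) {
      this.send({ ...e, deviceId });
    }
  }
}
```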
try {
  this.deviceId ??= await Promise.race([
    getDeviceId(),
    new Promise<string>((resolve) => {
This may not be necessary; I am just not sure if there's an issue with leaving the `getDeviceId()` "running" while we're flushing the events. Does this get killed? My guess is that this doesn't matter, but I don't want to end up delaying the shell exit because of this.
packages/cli-repl/src/cli-repl.ts
Outdated
@@ -1206,11 +1207,14 @@ export class CliRepl implements MongoshIOProvider {
   * @param code The user-provided exit code, if any.
   */
  async exit(code?: number): Promise<never> {
    this.loggingAndTelemetry?.flush();
I think we should be doing this generally in any case, to make sure all telemetry (including the error afterwards?) gets reported, even before this change.
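The `Promise.race` excerpt and the flush-on-exit question can be sketched together as follows. This is not the PR's code: `resolveOrGiveUp` is a hypothetical helper, and using an `AbortSignal` to represent "flush was requested" is an assumption.

```typescript
// Hypothetical helper: waits for a lookup, but gives up and returns a
// fallback as soon as the signal is aborted (e.g. when flush() is called).
export function resolveOrGiveUp<T>(
  lookup: Promise<T>,
  signal: AbortSignal,
  fallback: T
): Promise<T> {
  const aborted = new Promise<T>((resolve) => {
    if (signal.aborted) {
      resolve(fallback);
    } else {
      signal.addEventListener('abort', () => resolve(fallback), { once: true });
    }
  });
  // Promise.race settles with whichever promise settles first. It does not
  // cancel the other one: the lookup keeps running in the background.
  return Promise.race([lookup, aborted]);
}
```

Because the losing promise is not cancelled, any child processes behind the lookup would still be alive after the race settles. An explicit `process.exit()` ends the process regardless of such pending work, so the remaining question is only whether exit waits on anything other than the flush.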
So ... I don't want to slow down work that I know we want to just get done¹, but I do still feel pretty uneasy about using node-machine-id. The fact that we're spawning multiple extra child processes on each startup of mongosh just feels fairly wrong, from a performance² and security perspective, even if we don't see an immediate critical issue with it.
Can we try to align even more closely with the Atlas CLI and use the same approach, which is to essentially perform the lookups in native/compiled code? I know that that comes with a bit of overhead, but it's far from infeasible (we've done that for other smaller things as well, like https://github.com/mongodb-js/glibc-version) and I'm happy to help with getting that off the ground.
¹ Telemetry is still turned off for mongosh, so it's probably not super time-sensitive?
² Startup performance has been a big pain point in the past for us and even though the perf tests in CI seem okay here, this change feels a bit like we'd be pushing it 😞
@addaleax Sounds good, I can look into ways we could put this into the larger plan and see what we could come up with. Sorry, it might have been meaningful to have more discussion about this earlier; I just figured after talking with the Atlas CLI folks that we'd likely end up in some form of machine ID-powered setup anyway, unless we'd like to push them to adopt the alternative instead. And the potential of having to deal with directory permissions or whatnot seemed like a good enough argument against the shared directory idea. But yeah about
Yeah, I'm also happy to support this in any way I can, overall I'd expect it to be somewhat straightforward to put together
To standardize identification across DevTools and Atlas CLI, this introduces a `node-machine-id` dependency which uses the same base system calls for determining the device ID as https://github.com/denisbrodbeck/machineid, which the Atlas CLI uses for its device ID. The only exception is the BSD fallback of using `/etc/hostid` and `smbios.system.uuid`, which the Go library has but the Node one does not; in that case our device ID will likely be undefined or different.

Both libraries use hashing to protect the machine-specific information.
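To make the hashing step concrete, here is a sketch of how a Node tool could derive a protected ID from a raw machine ID. It assumes, based on my reading of the Go library, that its protected ID is an HMAC-SHA256 keyed by the raw machine ID over an application ID and then hex-encoded; that should be verified against the library source. `protectedDeviceId` and the application ID value are hypothetical.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper. Assumption: mirrors the Go library's protected ID,
// i.e. HMAC-SHA256 with the raw machine ID as key and the app ID as message.
// The caller is responsible for passing the raw ID in exactly the form the
// other tool sees it (including letter case), otherwise the hashes differ.
export function protectedDeviceId(rawMachineId: string, appId: string): string {
  return createHmac('sha256', rawMachineId).update(appId).digest('hex');
}
```

Two tools only arrive at the same value if they agree on both inputs byte for byte, so any difference in how the raw ID is read or normalized per OS changes the result.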
We could use `anonymousId` as `deviceId`, as done in the Atlas CLI, but the concerns are: a) this transition may break many existing user associations we have and b) if for whatever reason the device ID cannot be determined on a given OS, we'd lose `anonymousId` altogether. Therefore this instead adds a new identity field for Segment.

Alternative Considerations
For the purpose of sharing information across tools, we had the following options:
1. `node-machine-id` library and then hash it in the same way as the `machineid` Go library. I did a quick POC of this and, at least on my Mac, with some capitalization modifications I was able to reproduce the same device ID hash in both Go and Node, so it seems functional. Concerns with this are: [...] `realm`, `nx`.
2. `~/mongodb-devtools/config.json` where we'd write a random UUID for the device ID and read it across applications whenever possible. It's less reliant on the OS, though it has the issue of an application needing access to read files from that directory (and easier access for the user to modify/mess with it; though they can do so with the machine ID as well if they wanted to).

Overall 1 seems best for now as
To Do: