feat(telemetry): add device ID logging COMPASS-8443 #2411

gagik · 2025-03-19T13:09:19Z

To standardize identification across DevTools and Atlas CLI, this introduces a node-machine-id dependency, which uses the same underlying system calls to determine the device ID as https://github.com/denisbrodbeck/machineid, the library the Atlas CLI uses for its device ID. The one exception is BSD: the Go library falls back to /etc/hostid and smbios.system.uuid there, but the Node one does not, so in that case our device ID will likely be undefined or different.

Both libraries use hashing to protect the machine-specific information.
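
A minimal sketch of what "hashing to protect the machine-specific information" could look like on the Node side. It assumes the Go library's protected ID is a hex-encoded HMAC-SHA256 keyed by the raw machine ID with an application-specific string as the message; that should be verified against the Go source before relying on it. The helper names, the app ID parameter, and the choice of lower-casing as the capitalization fix are all hypothetical and not taken from this PR.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: the PR only says "some capitalization modifications"
// were needed, not which direction. Lower-casing is an assumption.
export function normalizeMachineId(raw: string): string {
  return raw.trim().toLowerCase();
}

// Hypothetical helper: hex-encoded HMAC-SHA256 with the raw machine ID as the
// key and an application-specific string as the message (assumed scheme).
export function hashMachineId(rawMachineId: string, appId: string): string {
  return createHmac("sha256", rawMachineId).update(appId).digest("hex");
}
```

Two tools only produce the same value if they agree on the raw ID's normalization, the key/message order, and the app ID string, which is why the To Do item below about verifying against the Atlas device ID matters.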

We could use anonymousId as the deviceId, as is done in the Atlas CLI, but there are two concerns: a) this transition may break many of the existing user associations we have, and b) if for whatever reason the device ID cannot be determined on a given OS, we'd lose anonymousId altogether. Therefore this instead adds a new identity field for Segment.

Alternative Considerations

For the purpose of sharing information across tools, we had the following options:

1. Match 1:1 what the Atlas CLI does by using the node-machine-id library and then hashing the result the same way the machineid Go library does. I did a quick POC of this and, at least on my Mac, with some capitalization modifications I was able to reproduce the same device ID hash in both Go and Node, so it seems functional. Concerns with this are:

   - The Node library isn't actively maintained, though it is also rather simple and is used by big projects such as realm and nx.
   - It depends on spawning child processes that call OS-specific functions, which can be troublesome.
   - In more niche OS environments, which may not provide the system calls it relies on, its behavior may be unpredictable.

2. An alternative idea would have been to introduce a shared file like ~/mongodb-devtools/config.json, where we'd write a random UUID for the device ID and read it across applications whenever possible. It's less reliant on the OS, though it requires that each application can read files from that directory (and it gives the user easier access to modify or mess with it, though they could do the same with the machine ID if they wanted to).
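
For comparison, the shared-file alternative could be sketched roughly as follows. The function name, the `deviceId` key, and the file shape are hypothetical; the description only mentions writing a random UUID to a shared config file. The path is passed in by the caller so that nothing about its final location is assumed.

```typescript
import { randomUUID } from "node:crypto";
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical file shape: a single JSON object holding the shared ID.
interface SharedConfig {
  deviceId?: string;
}

export function readOrCreateSharedDeviceId(configPath: string): string | undefined {
  try {
    const parsed = JSON.parse(readFileSync(configPath, "utf8")) as SharedConfig;
    if (typeof parsed.deviceId === "string" && parsed.deviceId.length > 0) {
      return parsed.deviceId; // Another tool (or an earlier run) already wrote an ID.
    }
  } catch {
    // Missing, unreadable, or invalid file: fall through and try to create it.
  }

  const deviceId = randomUUID();
  try {
    mkdirSync(dirname(configPath), { recursive: true });
    // Note: this overwrites the file, so any other keys in it would be lost.
    writeFileSync(configPath, JSON.stringify({ deviceId }, null, 2), { mode: 0o600 });
    return deviceId;
  } catch {
    // The directory may not be writable; report "no ID" instead of throwing.
    return undefined;
  }
}
```

The sketch makes the file-access concern concrete: every failure path ends in `undefined`, and two tools starting at the same time could each write a different UUID unless some locking or atomic-rename step is added.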

Overall, option 1 seems best for now because:

Atlas CLI already uses machine IDs, so it'd be quicker and less work for all tools to align this way. When we spoke with them, they said they had also considered the latter but went with the machine ID because of the file access and modification concerns mentioned above.
Machine ID seems functional enough as it is, and in cases where it fails we'd simply lose the device ID field or have an invalid one, which doesn't seem like a big loss.

To Do:

Verify the hashed device ID is the same as the Atlas Device ID.

packages/logging/src/logging-and-telemetry.ts

addaleax · 2025-03-20T13:26:10Z

packages/logging/package.json

@@ -22,7 +22,8 @@
    "@mongosh/history": "2.4.6",
    "@mongosh/types": "3.6.0",
    "mongodb-log-writer": "^2.3.1",
-    "mongodb-redact": "^1.1.5"
+    "mongodb-redact": "^1.1.5",
+    "node-machine-id": "^1.1.12"


This is an unmaintained package which always runs multiple child processes to get its result. I do wonder if there are better alternatives out there, even if it's just hashing os.hostname() and maybe the MAC addresses reported in os.networkInterfaces()?
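
A rough sketch of that suggestion, using only Node built-ins. The function names are hypothetical, and the choice of SHA-256 over a newline-joined string is an assumption made for illustration; nothing like this is part of the PR.

```typescript
import { createHash } from "node:crypto";
import { networkInterfaces } from "node:os";

// Collect distinct, non-internal MAC addresses in a stable (sorted) order.
export function collectMacAddresses(
  interfaces: ReturnType<typeof networkInterfaces> = networkInterfaces()
): string[] {
  const macs = new Set<string>();
  for (const entries of Object.values(interfaces)) {
    for (const entry of entries ?? []) {
      // Skip loopback/internal interfaces and the all-zero placeholder MAC.
      if (!entry.internal && entry.mac !== "00:00:00:00:00:00") {
        macs.add(entry.mac.toLowerCase());
      }
    }
  }
  return Array.from(macs).sort();
}

// Hash the hostname together with the MAC list so neither is stored in clear text.
export function hashHostIdentity(host: string, macs: string[]): string {
  return createHash("sha256").update([host, ...macs].join("\n")).digest("hex");
}
```

Calling `hashHostIdentity(os.hostname(), collectMacAddresses())` needs no child processes, but the result changes whenever the hostname or the set of network interfaces changes (VPNs, docks, virtual adapters), so it is less stable than an OS-level machine ID.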

Yeah, I agree. I wasn't sure about the package, though it is used by some pretty big projects and there are no good alternatives. If we go in a different direction we'd likely want the Atlas CLI to adopt the same approach as well, but considering that the library talks about the lack of reliability of MAC addresses etc., it seems like that won't be very reliable.

I'm wondering if we could instead use a shared MongoDB DevTools directory, something like ~/.mongodb-tools/config.yaml, where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already. Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.

haha just realized the "big projects which use this" includes https://www.npmjs.com/package/realm?activeTab=dependencies 😅

I'm wondering if we could instead use a shared MongoDB DevTools directory, something like ~/.mongodb-tools/config.yaml, where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already.

If that is an option, we should definitely go with it. I had assumed we'd want a true "device" ID, not something that would be per-user, but in today's world the latter is probably at least as good as the former. (It also has the advantage of the user being able to control it, if necessary.)

Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.

Agreed 👍

I'll ask the analytics team; it seems like a cleaner option. I think our motivation is to be able to associate the user between our various tools rather than to actually identify a device. The only concern is whether an application can always read and write that directory, wherever we decide to put it across OSes.

.depalignrc.json

gagik · 2025-03-25T09:42:48Z

@addaleax I ended up sticking with device ID for reasons I mentioned in Alternative Considerations in the PR description, mainly easier adaptability from existing Atlas CLI data.

I also reorganized the logic so that we now have two parallel buffers, one for bus events in general and one for telemetry events in particular, until the device ID is resolved. Let me know if you have any thoughts about either of those.
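
The "hold telemetry until the device ID is resolved" idea can be sketched as a small gate. This is only an illustration of the buffering pattern, not the code in logging-and-telemetry.ts; the class, type, and method names are hypothetical.

```typescript
type TelemetryEvent = { event: string; properties?: Record<string, unknown> };
type Send = (event: TelemetryEvent & { deviceId: string | undefined }) => void;

// Hypothetical helper: queues events until a device ID (or `undefined` on
// failure) is known, then replays them in order and passes later ones through.
export class DeviceIdGate {
  private readonly send: Send;
  private pending: TelemetryEvent[] = [];
  private resolved = false;
  private deviceId: string | undefined;

  constructor(send: Send) {
    this.send = send;
  }

  track(event: TelemetryEvent): void {
    if (this.resolved) {
      this.send({ ...event, deviceId: this.deviceId });
    } else {
      this.pending.push(event);
    }
  }

  resolve(deviceId: string | undefined): void {
    if (this.resolved) return; // Only the first resolution counts.
    this.resolved = true;
    this.deviceId = deviceId;
    // splice(0) empties the queue and returns its contents in insertion order.
    for (const event of this.pending.splice(0)) {
      this.send({ ...event, deviceId });
    }
  }
}
```

Resolving with `undefined` when the lookup fails or times out still drains the queue, so a failed lookup only costs the device ID field rather than the buffered events.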

gagik · 2025-03-25T10:11:25Z

packages/logging/src/logging-and-telemetry.ts

+      try {
+        this.deviceId ??= await Promise.race([
+          getDeviceId(),
+          new Promise<string>((resolve) => {


This may not be necessary. I'm just not sure whether there's an issue with leaving getDeviceId() "running" while we're flushing the events; does it get killed? My guess is that this doesn't matter, but I don't want to end up delaying the shell exit because of it.
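
For reference, the general pattern of racing a lookup against a timer can be sketched like this. `withTimeout` is a hypothetical helper, not the PR's code, and the diff excerpt above is truncated, so how the real implementation resolves its second promise is not shown here.

```typescript
// Hypothetical helper: resolves with `fallback` if `task` has not settled
// within `timeoutMs` milliseconds.
export async function withTimeout<T>(
  task: Promise<T>,
  timeoutMs: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
    // In Node.js, unref() stops a pending timer from keeping the process alive.
    timer.unref();
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    // Clear the timer whether the task won, lost, or rejected.
    clearTimeout(timer);
  }
}
```

Promise.race() only decides which result the caller sees; it does not cancel the losing promise. Any work started by the lookup, such as child processes, keeps running until it finishes or the process exits, which is the question raised in the comment above.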

gagik · 2025-03-25T10:12:39Z

packages/cli-repl/src/cli-repl.ts

@@ -1206,11 +1207,14 @@ export class CliRepl implements MongoshIOProvider {
   * @param code The user-provided exit code, if any.
   */
  async exit(code?: number): Promise<never> {
+    this.loggingAndTelemetry?.flush();


I think we should be doing this in any case, independent of this change, to make sure all telemetry (including the error afterwards?) gets reported.

packages/logging/src/logging-and-telemetry.ts

addaleax

So ... I don't want to slow down work that I know we want to just get done¹, but I do still feel pretty uneasy about using node-machine-id. The fact that we're spawning multiple extra child processes on each startup of mongosh just feels fairly wrong, from a performance² and security perspective, even if we don't see an immediate critical issue with it.

Can we try to align even more closely with the Atlas CLI and use the same approach, which is to essentially perform the lookups in native/compiled code? I know that that comes with a bit of overhead, but it's far from infeasible (we've done that for other smaller things as well, like https://github.com/mongodb-js/glibc-version) and I'm happy to help with getting that off the ground.

¹ Telemetry is still turned off for mongosh, so it's probably not super time-sensitive?
² Startup performance has been a big pain point in the past for us and even though the perf tests in CI seem okay here, this change feels a bit like we'd be pushing it 😞

packages/logging/src/logging-and-telemetry.ts

gagik · 2025-03-26T18:59:36Z

@addaleax Sounds good, I can look into ways we could put this into the larger plan and see what we could come up with. Sorry, it might have been useful to have more discussion about this earlier. I just figured, after talking with the Atlas CLI folks, that we'd likely end up with some form of machine ID-powered setup anyway unless we wanted to push them to adopt the alternative instead. And the potential of having to deal with directory permissions or whatnot seemed like a good enough argument against the shared directory idea.

But yeah about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow-up about that and regarding expected timeline.

addaleax · 2025-03-27T15:02:57Z

But yeah about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow-up about that and regarding expected timeline.

Yeah, I'm also happy to support this in any way I can; overall I'd expect it to be somewhat straightforward to put together.

gagik requested a review from addaleax March 19, 2025 13:17

gagik added 3 commits March 24, 2025 12:35

feat(telemetry): add device ID logging

5df10b6

feat: align with Atlas CLI hashing and resolve device ID asynchronously

dcdec0b

fix: ignore node-fetch depalign

09eca29

gagik force-pushed the gagik/add-device-id branch from f83cffc to 09eca29 Compare March 24, 2025 11:41

fix: add sinon dependency

cf07e34

fix: allow multiple identify calls for device ID

e6fa06e

gagik force-pushed the gagik/add-device-id branch 3 times, most recently from 0413413 to 0a9e5a0 Compare March 25, 2025 09:37

refactor: delay telemetry until device ID is resolved

95d93a5

gagik force-pushed the gagik/add-device-id branch from 0a9e5a0 to 95d93a5 Compare March 25, 2025 09:41

gagik force-pushed the gagik/add-device-id branch 2 times, most recently from 26df86b to 3dda787 Compare March 25, 2025 12:44

gagik requested a review from addaleax March 26, 2025 09:39

fix: flush events when exiting cli-repl

7484681

gagik force-pushed the gagik/add-device-id branch from 3dda787 to 7484681 Compare March 26, 2025 10:04
