feat(telemetry): add device ID logging COMPASS-8443 #2411

Open
wants to merge 7 commits into
base: main
from

Conversation

gagik
Contributor

@gagik gagik commented Mar 19, 2025

To standardize identification across DevTools and the Atlas CLI, this introduces a node-machine-id dependency, which uses the same underlying system calls to determine the device ID as https://github.com/denisbrodbeck/machineid, the library the Atlas CLI uses for its device ID. The one exception is BSD: the Go library falls back to /etc/hostid and smbios.system.uuid, but the Node one does not, so in that case our device ID will likely be undefined or different.

Both libraries use hashing to protect the machine-specific information.
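As a rough illustration of what hashing the raw machine ID could look like on the Node side, here is a minimal sketch. It assumes an HMAC-SHA256 keyed on the raw machine ID with an application identifier as the message, which is my understanding of how the Go library derives its protected ID; that assumption should be verified against the Atlas CLI before relying on it. `hashMachineId` and its parameters are hypothetical names, and the sketch does not handle the capitalization differences between platforms.

```typescript
import { createHmac } from 'crypto';

// Hypothetical helper: derives a stable, non-reversible device ID from a raw
// machine ID. ASSUMPTION: HMAC-SHA256 keyed on the raw machine ID, with the
// application ID as the message, hex-encoded.
function hashMachineId(rawMachineId: string, appId: string): string {
  return createHmac('sha256', rawMachineId).update(appId).digest('hex');
}
```

Every tool would need to pass the same application ID and the same normalized raw ID to end up with the same hash.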

We could use anonymousId as the deviceId, as the Atlas CLI does, but there are two concerns: a) the transition may break many of the existing user associations we have, and b) if the device ID cannot be determined on a given OS for whatever reason, we'd lose anonymousId altogether. Therefore, this instead adds a new identity field for Segment.

Alternative Considerations

For the purpose of sharing information across tools, we had the following options:

  1. Match 1:1 what the Atlas CLI does by using the node-machine-id library and then hashing the result in the same way as the machineid Go library. I did a quick POC of this and, at least on my Mac, with some capitalization modifications I was able to reproduce the same device ID hash in both Go and Node, so it seems functional. Concerns with this are:
  • The Node library isn't actively maintained, though it is also rather simple and is used by big projects such as realm and nx.
  • It depends on spawning child processes that call OS-specific functions, which can be troublesome.
  • In more niche OS environments that may not provide the system calls it relies on, its behavior may be unpredictable.
  2. An alternative idea would have been to introduce a shared file like ~/mongodb-devtools/config.json where we'd write a random UUID for the device ID and read it across applications whenever possible. It's less reliant on the OS, though it requires every application to have access to read files from that directory (and gives the user easier access to modify/mess with it, though they could do so with the machine ID as well if they wanted to).
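For comparison, the shared-file alternative could be sketched roughly as follows. This is only an illustration under stated assumptions: the `deviceId` key, the `readOrCreateSharedDeviceId` name, and the file permissions are all hypothetical, and the sketch ignores concurrent writers and would overwrite any other keys in an existing file.

```typescript
import { randomUUID } from 'crypto';
import { promises as fs } from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical helper: returns the UUID stored in a shared config file,
// creating the file with a fresh UUID if it is missing or invalid.
// Returns undefined when the directory cannot be read or written.
async function readOrCreateSharedDeviceId(
  configPath: string = path.join(os.homedir(), 'mongodb-devtools', 'config.json')
): Promise<string | undefined> {
  try {
    const existing = JSON.parse(await fs.readFile(configPath, 'utf8'));
    if (existing && typeof existing.deviceId === 'string') {
      return existing.deviceId;
    }
  } catch {
    // Missing, unreadable, or malformed file: fall through and try to create it.
  }
  const deviceId = randomUUID();
  try {
    await fs.mkdir(path.dirname(configPath), { recursive: true });
    await fs.writeFile(configPath, JSON.stringify({ deviceId }), { mode: 0o600 });
    return deviceId;
  } catch {
    // ASSUMPTION: a failed write is treated as "no device ID available".
    return undefined;
  }
}
```

The `undefined` return path is where the directory access concern mentioned above would surface in practice.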

Overall, option 1 seems best for now because:

  1. Atlas CLI already uses machine IDs, so it'd be quicker and less work for all tools to align this way; having spoken with them, they also considered the latter but went with the machine ID because of the mentioned file access and modification concerns.
  2. Machine ID seems functional enough as it is, and in cases where it'd fail we'd simply lose or have an invalid device ID field, which doesn't seem like a big loss.

To Do:

  • Verify the hashed device ID is the same as the Atlas Device ID.

@gagik gagik requested a review from addaleax March 19, 2025 13:17
@@ -22,7 +22,8 @@
     "@mongosh/history": "2.4.6",
     "@mongosh/types": "3.6.0",
     "mongodb-log-writer": "^2.3.1",
-    "mongodb-redact": "^1.1.5"
+    "mongodb-redact": "^1.1.5",
+    "node-machine-id": "^1.1.12"
Collaborator

This is an unmaintained package that always runs multiple child processes to get its result. I do wonder if there are better alternatives out there, even if it's just hashing os.hostname() and maybe the MAC addresses reported in os.networkInterfaces()?
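A minimal sketch of that hostname-plus-MAC idea, using only Node built-ins and no child processes, might look like the following. The `hostFingerprint` name, the SHA-256 choice, and the way inputs are joined are assumptions made for illustration; the result would not match the machine ID that the Atlas CLI computes.

```typescript
import { createHash } from 'crypto';
import * as os from 'os';

// Minimal shape of the fields used from os.networkInterfaces() entries.
type InterfaceLike = { mac: string; internal: boolean };

// Hypothetical helper: hashes the hostname together with the sorted set of
// non-internal MAC addresses. Inputs are injectable to keep it testable.
function hostFingerprint(
  hostname: string = os.hostname(),
  interfaces: Record<string, InterfaceLike[] | undefined> = os.networkInterfaces()
): string {
  const macs = new Set<string>();
  for (const infos of Object.values(interfaces)) {
    for (const info of infos ?? []) {
      // Skip loopback-style interfaces and all-zero placeholder addresses.
      if (!info.internal && info.mac !== '00:00:00:00:00:00') {
        macs.add(info.mac.toLowerCase());
      }
    }
  }
  const parts = [hostname, ...Array.from(macs).sort()];
  return createHash('sha256').update(parts.join('\n')).digest('hex');
}
```

Sorting makes the result independent of interface enumeration order, but the value still changes whenever the hostname or the set of network adapters changes.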

Contributor Author

@gagik gagik Mar 20, 2025

Yeah, I agree. I wasn't sure about the package, though it is used by some pretty big projects and there are no good alternatives. If we go in a different direction we'd likely want the Atlas CLI to adopt the same approach as well, but considering the library talks about the lack of reliability of MAC addresses etc., it seems like that won't be very reliable.

I'm wondering if we could instead do a shared mongodb devtools like ~/.mongodb-tools/config.yaml directory or something where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already. Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.

Contributor Author

@gagik gagik Mar 20, 2025

haha just realized the "big projects which use this" includes https://www.npmjs.com/package/realm?activeTab=dependencies 😅

Collaborator

I'm wondering if we could instead do a shared mongodb devtools like ~/.mongodb-tools/config.yaml directory or something where we write a new UUID for anonymousId and let mongosh, Compass, and Atlas CLI read and use this if it exists already.

If that is an option, we should definitely go with it. I had assumed we'd want a true "device" ID, not something that would be per-user, but in today's world the latter is probably at least as good as the former. (It also has the advantage of the user being able to control it, if necessary.)

Seems like hoping all tools can reliably come up with the same device identifier is more unreliable anyhow.

Agreed 👍

Contributor Author

@gagik gagik Mar 20, 2025

I'll ask the analytics team; this seems like a cleaner option. I think our motivation is to be able to associate the user between our various tools rather than to actually identify a device. The only concern is whether an application can always read and write that directory, wherever we decide to put it across OSes.

@gagik gagik force-pushed the gagik/add-device-id branch from f83cffc to 09eca29 Compare March 24, 2025 11:41
@gagik gagik force-pushed the gagik/add-device-id branch 3 times, most recently from 0413413 to 0a9e5a0 Compare March 25, 2025 09:37
@gagik gagik force-pushed the gagik/add-device-id branch from 0a9e5a0 to 95d93a5 Compare March 25, 2025 09:41
@gagik
Contributor Author

gagik commented Mar 25, 2025

@addaleax I ended up sticking with device ID for reasons I mentioned in Alternative Considerations in the PR description, mainly easier adaptability from existing Atlas CLI data.

I also re-organized the logic so that we now have 2 parallel buffers, one for bus events in general and one for telemetry events in particular, which hold events until the device ID is resolved. Let me know if you have any thoughts about either of those.
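The buffering idea can be sketched generically as a queue that holds items until it is told the device ID is available. This is not the PR's actual implementation; `DeferredQueue` and its method names are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical sketch: items pushed before flush() are held in memory and
// delivered in order once flush() is called; later items pass straight through.
class DeferredQueue<T> {
  private pending: T[] = [];
  private ready = false;
  private readonly deliver: (item: T) => void;

  constructor(deliver: (item: T) => void) {
    this.deliver = deliver;
  }

  push(item: T): void {
    if (this.ready) {
      this.deliver(item);
    } else {
      this.pending.push(item);
    }
  }

  // Called once the device ID has been resolved (or has timed out).
  flush(): void {
    this.ready = true;
    const items = this.pending;
    this.pending = [];
    for (const item of items) {
      this.deliver(item);
    }
  }
}
```

Two such queues, one per event kind, would correspond to the two parallel buffers described above.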

try {
this.deviceId ??= await Promise.race([
getDeviceId(),
new Promise<string>((resolve) => {
Contributor Author

@gagik gagik Mar 25, 2025

This may not be necessary. I am just not sure if there's an issue with leaving the getDeviceId() "running" while we're flushing the events; does this get killed? My guess is that this doesn't matter, but I don't want to end up delaying the shell exit because of this.
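For reference, a generic version of the race-against-a-timeout pattern could look like the sketch below. `withTimeout` and the fallback value are hypothetical and not taken from the PR. Note that Promise.race does not cancel the losing promise, so any work started by the task keeps running in the background; the sketch only makes sure its own timer is cleared so that the timer does not hold the event loop open.

```typescript
// Hypothetical helper: resolves with the task's result, or with `fallback`
// if the task has not settled within `ms` milliseconds.
async function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    // Clear the timer so a fast task does not leave a pending timeout behind.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  }
}
```

If the shell ultimately exits through process.exit(), the process terminates regardless of outstanding asynchronous work, so a lingering lookup should not delay the exit in that case.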

@@ -1206,11 +1207,14 @@ export class CliRepl implements MongoshIOProvider {
* @param code The user-provided exit code, if any.
*/
async exit(code?: number): Promise<never> {
this.loggingAndTelemetry?.flush();
Contributor Author

I think we should be doing this in any case, even before this change, to make sure all telemetry (including the error afterwards?) gets reported.

@gagik gagik force-pushed the gagik/add-device-id branch 2 times, most recently from 26df86b to 3dda787 Compare March 25, 2025 12:44
@gagik gagik requested a review from addaleax March 26, 2025 09:39
@gagik gagik force-pushed the gagik/add-device-id branch from 3dda787 to 7484681 Compare March 26, 2025 10:04
Collaborator

@addaleax addaleax left a comment

So ... I don't want to slow down work that I know we want to just get done¹, but I do still feel pretty uneasy about using node-machine-id. The fact that we're spawning multiple extra child processes on each startup of mongosh just feels fairly wrong from a performance² and security perspective, even if we don't see an immediate critical issue with it.

Can we try to align even more closely with the Atlas CLI and use the same approach, which is to essentially perform the lookups in native/compiled code? I know that that comes with a bit of overhead, but it's far from infeasible (we've done that for other smaller things as well, like https://github.com/mongodb-js/glibc-version) and I'm happy to help with getting that off the ground.

¹ Telemetry is still turned off for mongosh, so it's probably not super time-sensitive?
² Startup performance has been a big pain point in the past for us and even though the perf tests in CI seem okay here, this change feels a bit like we'd be pushing it 😞

@gagik
Contributor Author

gagik commented Mar 26, 2025

@addaleax Sounds good, I can look into ways we could put this into the larger plan and see what we could come up with. Sorry, it might have been worthwhile to have more discussion about this earlier; I just figured after talking with the Atlas CLI folks that we'd likely end up with some form of machine ID-powered setup anyway, unless we'd like to push them to adopt the alternative instead. And the potential of having to deal with directory permissions or whatnot seemed like a good enough argument against the shared directory idea.

But yeah about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow-up about that and regarding expected timeline.

@addaleax
Collaborator

But yeah about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow-up about that and regarding expected timeline.

Yeah, I'm also happy to support this in any way I can, overall I'd expect it to be somewhat straightforward to put together

2 participants