
[BUG]: Datadog 5.41.0 causing high memory consumption #5394

Open
prateekbh opened this issue Mar 12, 2025 · 10 comments · May be fixed by #5476
Labels
bug Something isn't working

Comments

@prateekbh

prateekbh commented Mar 12, 2025

Tracer Version(s)

5.41.0

Node.js Version(s)

22 -> docker image sha256:9459e243f620fff19380c51497493e91db1454f5a30847efe5bc5e50920748d5

Bug Report

We've been observing high memory usage from Datadog since Saturday, Mar 8.
The stack runs on Next.js 14 and Node 22 from the Docker image mentioned above.

Please look at the following Datadog graphs from before and after March 8.

March 9
See the timing for "shimmer.js" and anonymous (L#57)

Image

After pinning the version to 5.39.0
See the absence of any significant block from datadog.js

Image

Reproduction Code

No response

Error Logs

No response

Tracer Config

No response

Operating System

No response

Bundling

Next.js

@prateekbh prateekbh added the bug Something isn't working label Mar 12, 2025
@prateekbh prateekbh changed the title [BUG]: Datadog causing high memory consumption [BUG]: Datadog 5.41.0 causing high memory consumption Mar 12, 2025
@MMShep97

Related discussion on the issue right before this one: #5389

@dkindt

dkindt commented Mar 18, 2025

We also experienced this. We ended up downgrading to version 5.39.0 last night.
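
For anyone else doing the same, a minimal sketch of pinning the tracer to an exact version in package.json (assuming npm; the key point is an exact version with no ^ range):

    {
      "dependencies": {
        "dd-trace": "5.39.0"
      }
    }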

@krydos

krydos commented Mar 19, 2025

We experienced the same on 5.42.0 (Node version 22.14.0). We just downgraded back to 5.37.0 (our previous version) and CPU/memory usage looks better (still monitoring it, though).

Image

The error our Datadog process was spamming was the same one mentioned by @Grmiade in #5389:

2025-03-11 13:49:51 UTC | CORE | ERROR | (comp/dogstatsd/server/server.go:623 in errLog) | Dogstatsd: error parsing metric message '"runtime.node.event_loop.delay.total:0[object Object]|c|#service:<service_name>,version:<version>"': could not parse dogstatsd metric values: strconv.ParseFloat: parsing "0[object Object]": invalid syntax

and it's gone after the downgrade.
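
As an aside for anyone digging into this: the 0[object Object] suffix in that metric is the classic result of string-concatenating a plain object onto a number in JavaScript, which suggests a runtime-metrics value that is an object where a number was expected. A minimal sketch of the failure class (illustration only, not actual dd-trace code):

    // Illustration only: how a dogstatsd line can end up with "0[object Object]".
    // If the metric value is accidentally an object instead of a number,
    // JavaScript stringifies it as "[object Object]" during concatenation.
    const total = 0
    const delay = {} // hypothetical: an object (e.g. a stats summary) instead of a number

    const line = `runtime.node.event_loop.delay.total:${total + delay}|c`
    console.log(line)
    // => runtime.node.event_loop.delay.total:0[object Object]|c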

We have another service running the same dd-trace 5.42.0 and it doesn't have this error, which makes it a bit strange, since dd-trace is configured in exactly the same way in both cases. The Node version in that service is 22.11.0, though.

Does anyone have suggestions on how I can debug this so we have a bit more information on how to solve it?

@MMShep97

MMShep97 commented Mar 19, 2025

@rochdev FYI, 5.43.0 has been in for a few hours now and doesn't seem to have fixed the memory issues. Maybe premature, but it definitely looks like the same linear increase.

Fix for the missing runtime metrics: #5414
I will ship a release as soon as it's merged, but in the meantime if anyone can try to see if it fixes the memory leak as well, that would be appreciated. Otherwise I will look into that after the release if it does not fix that second issue.

^ #5389 (comment)

Image (bumped to 5.43.0 around 9 am CST)

@hjhart

hjhart commented Mar 19, 2025

We have also experienced some erratic memory usage, but the CPU usage is a bit more perplexing.

We downgraded to our previous version, 5.42.0, and the problem went away.

We deployed 5.42.2 on the night of March 15th.

Image

@royscymulate

We are experiencing the same issue in our application. After upgrading to dd-trace-js version 5.41.0, we noticed a significant increase in memory consumption, which wasn't present in previous versions (such as 5.39.0). It appears to escalate over time, leading to potential performance degradation.

Would appreciate any insights from the Datadog team or others experiencing this issue. Has anyone found a workaround besides downgrading?

@mertonium

Hey there, just adding some more data to the pile (these graphs span the last 3 days). We're running Node 22.14.0, and these graphs are from our very low-traffic staging environment.
Image

@rochdev
Member

rochdev commented Mar 20, 2025

The only significant change that was released in 5.41.0 specifically was #5126. In order to validate that this is actually the source of the issue, we'd need one of two things:

  1. A reproduction snippet, which would definitely be ideal as we can then debug the issue offline.
  2. I have just created a branch that allows turning off the new feature with DD_TRACE_STABLE_CONFIG_ENABLED=false. The branch is stable-config-enabled-env-var. Since the native extension was also updated in the same version, I would also add DD_CRASHTRACKING_ENABLED=false so that the native code is not loaded at all and we can isolate it completely (a minimal setup sketch follows below).

If anyone can provide either of those, it would help us tremendously in figuring out the issue as soon as possible.
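
For anyone trying option 2, a minimal sketch of a tracer bootstrap file, assuming the branch is installed as a git dependency (e.g. DataDog/dd-trace-js#stable-config-enabled-env-var); the same two variables can of course be set in the container or task-definition environment instead:

    // tracer.js — hypothetical sketch; both flags must be set before dd-trace loads.
    process.env.DD_TRACE_STABLE_CONFIG_ENABLED = 'false' // turns off the new stable-config feature (branch only)
    process.env.DD_CRASHTRACKING_ENABLED = 'false'       // keeps the native crash-tracking extension from loading

    const tracer = require('dd-trace').init()

    module.exports = tracer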

@MMShep97

MMShep97 commented Mar 20, 2025

The following are our environment variables inside of our task definition (using AWS ECS Fargate on a Node server). The Datadog Agent runs as a sidecar and we're using a shared Terraform module to run it. LMK if there's something I can provide that would be helpful. Someone mentioned that setting DD_RUNTIME_METRICS_ENABLED to false removed the memory leak for them in a previous issue (a sketch of that change follows the config below).

Also, to clarify, this only started in 5.41.1 for us; 5.41.0 is fine.

    "environment": [
      {
        "name": "DD_ENV",
        "value": "${environment_name}"
      },
      {
        "name": "DD_SERVICE",
        "value": "${project_name}"
      },
      {
        "name": "DD_VERSION",
        "value": "git:${git_commit}"
      },
      {
        "name": "DD_RUNTIME_METRICS_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_IAST_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_APPSEC_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_DATA_STREAMS_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_APM_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_TRACE_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_RUNTIME_METRICS_ENABLED",
        "value": "true"
      },
      {
        "name": "DD_GIT_REPOSITORY_URL",
        "value": "${github_repo_url}"
      },
      {
        "name": "DD_PROFILING_ENABLED",
        "value": "true"
      },
      {
          "name": "DD_DBM_PROPAGATION_MODE",
          "value": "full"
      }
    ],
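
For reference, the DD_RUNTIME_METRICS_ENABLED workaround mentioned above would just mean flipping that entry in the same task definition, e.g.:

    {
      "name": "DD_RUNTIME_METRICS_ENABLED",
      "value": "false"
    }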

I am running with DD_CRASHTRACKING_ENABLED=false and 5.43.0 atm (not including the DD_TRACE_STABLE_CONFIG_ENABLED=false portion) and will monitor how the trend goes. Below is the past 30 minutes it's been in.

Image

Edit: nope, that by itself didn't work.

Image

@rochdev
Member

rochdev commented Mar 21, 2025

I am running with DD_CRASHTRACKING_ENABLED=false & 5.43.0 atm (not including the DD_TRACE_STABLE_CONFIG_ENABLED=false portion)

@MMShep97 This is only supported on the stable-config-enabled-env-var branch. Can you try from that branch with both environment variables set to false? That will let us see whether the new feature is the problem.
