Memory leak in PMX: the Meter class retains ExponentiallyMovingWeightedAverage objects and grows to substantial size (heapdump screenshots included) #23
Comments
Thanks for inspecting. https://github.com/keymetrics/pmx/blob/master/lib/utils/EWMA.js does not look like it retains any reference. How do you know that it comes from EWMA?
I can provide the heapdump snapshots to you, just PM me. It's a production box.
So, you aren't using any custom probes?
Yes, no custom probes. I do have one action to take a heapdump. The other odd thing is Keymetrics shows me more memory usage than the box has in total. See the screenshot showing 36GB. [Collaborator Edit] => removed screenshot link because it contains your keys
Thanks for the removal. I didn't know keys were in there. Let me know if I need to post again or if you were able to get it; I trashed mine, so I will probably have to recreate it. The gist was that Keymetrics added up to 36GB, but my box only has 30GB total. I know you guys are swamped. I am more than happy to help or provide more testing. I love PM2; we are beginning to rely on it heavily. Just point me in the right direction.
I did more testing on this using node-inspector. I can reproduce the leak each time. I was able to resolve the leak by adding pmx to my client library and setting the options below: require('pmx').init({ … }). I bet the leak is likely in http; however, I disabled all of them. I will update once I have more time to narrow it down.
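The options object above was truncated; a plausible reconstruction, assuming every monitoring flag discussed in this thread is simply turned off, would be:

```js
// Hypothetical reconstruction -- not the reporter's exact snippet.
// http, custom_probes and errors are true by default; network and ports
// are false by default (per the maintainer's comment below).
require('pmx').init({
  http: false,          // disable HTTP monitoring (route/latency overloading)
  custom_probes: false, // disable the default built-in metrics
  errors: false,        // disable error interception
  network: false,
  ports: false
});
```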
Hey, can you tell for sure if it's http or custom_probes or errors? (These are the only flags true by default; network and ports are false by default.) And can you try with [email protected]? I changed a bit the way http is overloaded, but I can't tell for sure if it fixes the memory leak. Thanks again for your contribution :)
It's for sure http. I have some more node-inspector dumps. I think it… I tried to see where to edit but didn't have enough time. New Relic has…
Will give it a shot later today with the new version.
I've been testing a bit with this sample code:

```js
var pmx = require('pmx').init({
  network: true,
  ports: true
});
var http = require('http');

setInterval(function keepAlive() {
  global.gc();
}, 500);

http.createServer(function(req, res) {
  res.end('hello world');
}).listen(8080);
```

(launch it with the `--expose-gc` flag so that `global.gc()` is available). I flooded the server with requests, but I wasn't able to reproduce the memory leak :/
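The comment doesn't say which tool was used to generate the load; as one illustrative option (an assumption on my part, not what the commenter actually ran), a small Node script can hammer the server in a loop:

```js
// Hypothetical load generator -- any HTTP benchmarking tool would do the
// same job. Fires a batch of requests against the test server every 100ms.
var http = require('http');

function fireBatch(size) {
  for (var i = 0; i < size; i++) {
    http.get('http://127.0.0.1:8080/', function (res) {
      res.resume(); // drain the response so the socket is released
    }).on('error', function () { /* ignore connection errors under load */ });
  }
}

setInterval(function () {
  fireBatch(200);
}, 100);
```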
Can you give me a sample code which triggers the leak, please?
Sure. It relies on some internal dependencies; give me some time to clean it up. Did you use node 0.12.x? Environmentally, the first screenshots (7 days ago) I gave you that started this ticket were in production. I was running pm2 cluster and no pmx inside. I was able to get rid of the memory increase by setting pmx.init({ http: false }) inside my production code. Can you explain a little bit more how pm2 and pmx interact? It looks to me from the code that pm2 has default options to communicate with Keymetrics if pmx is not defined, and then, if pmx is included, it sends an RPC command to pm2 to modify those default options. The second screenshots (2 days ago) I gave you were running through node-inspector, so I wasn't able to run through pm2; I explicitly set options via pmx to imitate what pm2 would do. I just want to make sure I am comparing apples to apples. Will code that is not using pm2 and just using pmx act the same as code that is being run by pm2 (after Keymetrics initialization)?
pmx is injected automatically when an app is launched in cluster mode with default options: https://github.com/keymetrics/pmx/blob/master/lib/index.js#L32-L42 Did you use a framework like express, koa, or sails?
I used [email protected] and [email protected]
I used this sample: https://github.com/jshkurti/benchmark
Can you give it a try with [email protected], please? :)
And/or give us a sample code which triggers the leak.
@askhogan :)
Sorry man. I got swamped over the last few days. I will try this, this week.
Thanks :D
Installed 0.3.16. New settings: require('pmx').init({ … }). You can look at my metrics in Keymetrics to see the difference. With all of it off, memory usage over the last few weeks has been stable at around 2.4GB.
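The settings object above was also truncated; given the next comment, where memory grows only after HTTP tracking is turned back on, a plausible reconstruction (my assumption, not the reporter's exact code) is:

```js
// Hypothetical reconstruction of the "new settings" -- the only change from
// the earlier leak-free configuration is that HTTP monitoring is back on.
require('pmx').init({
  http: true,           // re-enable HTTP monitoring
  custom_probes: false,
  errors: false,
  network: false,
  ports: false
});
```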
Memory grew from a consistent 2.4GB with pmx off to 7GB now (from under 200MB to around 500MB per worker) in a few hours with it on. The only code change was the additions I noted above. I have to shut this off for now and just use Keymetrics without the HTTP tracking turned on (still super awesome without this on, BTW!!). This server processes lots and lots of messages. When the heap gets big, the garbage collection starts eating a lot of CPU and I get weird latency on batches of messages. I know you are having trouble recreating it. Let me know how I can help.
Pause on my comments for a bit. I just realized the npm module didn't update. Really weird. Let me re-test. That was all done on 0.3.11.
I had a local version at 0.3.11 and a global version inside pm2 at 0.3.16. I updated the global pm2 version earlier; I have now updated the local version as well. Re-testing...
Thanks for reporting. We have some questions:
If you find the time, the best would be to have a small subset of your application so we can reproduce the leak ourselves. Thanks!
Hi @askhogan :)
No Sails. I did have express, which Sails is based on.
I am experiencing the same issue: using PM2 to run an express web server on Node.js 0.12.7, PM2 0.14.7, express 4.12.4. When running in fork mode, there is no memory leak; when running in cluster mode, this same memory leak appears. I don't use Keymetrics for monitoring, so we are now using the --no-pmx flag and that fixes the issue.
Same issue here. I'm running a queue worker using aws-sqs; the code runs fine without pmx and memory is stable. When pmx has:
memory keeps rising. Inbound connections seem to be less of a problem; outbound connections create more problems. Also, in my heapdumps I find a lot of (really lightweight) references to TLSWrap, TLSSocket, clearTimeout and other connection-related objects. It's like the http functionality is holding some data that stays hanging and doesn't get garbage collected. I've tested with a minimal setup that simply iterates over some empty queues, and memory keeps rising nonetheless. Tested on:
@andreafdaf Can you share with us your minimal setup that shows this leak?
@vmarchaud I can create a GitHub repo if you want, is that ok? I can work on it in the next couple of days; the files I'm using contain some private/corporate data, so they cannot be shared directly. With a minimal setup, memory grows veeery slowly, but it's noticeable after a few hours. An example of a minimal setup could be something like this:
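The example itself didn't survive in the thread; a minimal sketch of what such a worker might look like (an assumption on my part, using the aws-sdk SQS client and placeholder queue details rather than the commenter's actual code):

```js
// Hypothetical minimal setup: pmx with HTTP monitoring enabled plus a loop
// that polls mostly empty SQS queues. Queue URL and region are placeholders.
var pmx = require('pmx').init({ http: true });
var AWS = require('aws-sdk');

var sqs = new AWS.SQS({ region: 'us-east-1' });
var queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/example-queue';

setInterval(function pollQueue() {
  // Each poll is an outbound HTTPS request, which is what pmx's http
  // wrapping instruments -- matching the "outbound connections create
  // more problems" observation above.
  sqs.receiveMessage({ QueueUrl: queueUrl, MaxNumberOfMessages: 10 }, function (err, data) {
    if (err || !data.Messages) return; // queue is empty most of the time
    // message handling would go here
  });
}, 100);
```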
@andreafdaf I understood that you had a way to reproduce a fast memory leak, but if it's just as simple as your example, I will try to start an application and let it run over several days to look for the memory leak.
@vmarchaud if you want it to grow faster, just set the timeout to 100 and wait a bit; the behavior is noticeable even after 10 minutes. After a while, GC starts eating the CPU even with such a minimal setup.
and then it goes on and on and on (this is just a sample; GC recovers some memory every turn, but not enough, and overall it grows without bound)
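The trace sample referred to above didn't survive in the thread. For anyone trying to reproduce this, a simple way to watch the same growth pattern (my addition, not part of the original report) is to launch the worker with node's --trace-gc flag, or to log heap usage on an interval:

```js
// Logs heap usage every 10 seconds so the steady growth (and the point
// where GC starts thrashing) is visible without taking full heap snapshots.
setInterval(function logHeap() {
  var mem = process.memoryUsage();
  console.log('heapUsed: %d MB / heapTotal: %d MB',
    Math.round(mem.heapUsed / 1048576),
    Math.round(mem.heapTotal / 1048576));
}, 10000);
```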
Just wanted to give you guys a heads up: I'm also experiencing a memory leak on https://npms.io/ 's analyzer, which has a scoring process that runs continuously and does a lot of HTTP requests (to CouchDB and Elasticsearch). After just 10 minutes of running the analyzer's scoring, it ends up with ~2GB of RAM and eventually crashes with out-of-memory. Disabling pmx fixes the problem, but I really liked the pmx integration with Keymetrics. :/
@satazor Thanks for the report, could you come over to http://slack.pm2.io to discuss this?
An update regarding my previous comment: @vmarchaud and I went together on a small journey to understand the leak, and we came to the following conclusion: I had a subtle leak in my own code that was being "augmented" by pmx due to its error wrapping. I'm using bluebird with long stack traces enabled which, after looking at bluebird's code, creates a lot of Error objects to keep track of the promise stack (even if the promises resolve successfully). All those Error objects were being handled by pmx, which in turn used more memory. The leak was already there, but pmx made it more evident because of how it interoperates with bluebird. Thanks @vmarchaud, and I hope these findings help someone.
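For anyone hitting the same pattern, this is roughly the bluebird setting involved (a generic illustration, not the commenter's actual code):

```js
// With long stack traces enabled, bluebird captures stack information (via
// Error objects) at each promise creation, even for promises that resolve
// successfully. Any tooling that holds on to errors -- like pmx's error
// wrapping -- then keeps far more memory alive than it otherwise would.
var Promise = require('bluebird');
Promise.config({ longStackTraces: true });

// Every .then() in a long chain now carries captured stack information.
Promise.resolve(1)
  .then(function (n) { return n + 1; })
  .then(function (n) { console.log('done:', n); });
```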
I am not sure if this is related to the leak, but it is something in the http monitoring -
Did you even find anything for @andreafdaf's example? You said you would look into it, but the conversation fell flat right after that post...
No response? Are you implying I should remove pmx...?
@dreaganluna I'm in exactly the same situation and had to remove HTTP monitoring
I have the same issue. For me as well, the code runs fine without pmx and memory is stable, but with pmx with http: true it just keeps on increasing. pmx settings used:
Versions:
Any updates on this open issue would help. @Unitech @askhogan
We had to remove pmx because of memory leaks. With our server under load, it was retaining hundreds of thousands of stack traces (and maybe other strings?) in memory, each at 2k or so. Incidentally, their heap-snapshotting tool also doesn't work reliably from the UI and fails via the CLI on modern versions of Node. At this stage I can't be sure that their heap-snapshotting tool doesn't have a memory leak itself. A carnival funhouse! Support told me that they were aware of the problem and planned to fix it sometime in the uncertain future. Given their lack of response/fix here for a critical issue (and for lots of other issues we've raised over the last few years), well... abandon hope, all ye who enter here.
We encountered the same problem. Our APIs run on limited resources, and within a week of each restart our service starts becoming really slow, with a huge increase in CPU/memory usage. Heap snapshots while running locally (take a look at the #Deltas in the first picture): pmx v1.6.7, pmx config:
See screenshots. This is a continued and sharp rise in usage over an 8-hour period. The box has a consistent processing flow.
The leak appears to be from the global include and extension of `var Probe = require('./Probe.js');` (see the simplified sketch below for how this pattern can retain EWMA instances).
See heap snapshots. This is an 8-hour period of time.
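To make the title's claim concrete, here is a simplified sketch (not pmx's actual source) of how a Meter that holds an ExponentiallyMovingWeightedAverage through a timer callback can keep growing if instances are created repeatedly and never disposed:

```js
// Simplified illustration only -- pmx's real Probe/Meter/EWMA code differs.
function ExponentiallyMovingWeightedAverage() {
  this._count = 0;
  this._rate = 0;
}
ExponentiallyMovingWeightedAverage.prototype.update = function (n) {
  this._count += n;
};
ExponentiallyMovingWeightedAverage.prototype.tick = function () {
  this._rate = this._count; // decay math omitted for brevity
  this._count = 0;
};

function Meter() {
  var ewma = new ExponentiallyMovingWeightedAverage();
  this.mark = function (n) { ewma.update(n || 1); };
  // The interval callback closes over `ewma`, so as long as the interval is
  // never cleared, every Meter (and its EWMA) stays reachable. Creating a
  // Meter per request, for example, would accumulate them indefinitely.
  this._interval = setInterval(function () { ewma.tick(); }, 5000);
}

// Hypothetical misuse that would leak: one Meter per incoming request.
// require('http').createServer(function (req, res) {
//   new Meter().mark();
//   res.end('ok');
// }).listen(8080);
```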