emergency memory cleanup of harvester #974

Draft
wants to merge 1 commit into base: develop

Conversation

nr-swilloughby
Contributor

This is a feature addition I was holding back until I got confirmation that it was in fact the needed solution. I now think it is best to go ahead and add it as a default-off feature anyway: it gives us a way to mitigate a memory issue when one emerges with no other apparent cause or available solution, and it can serve as a quick stop-gap for the customer until a better solution is found.

This came out of the work on issue #906, a reported memory leak apparently caused by the agent holding onto log data longer than normal. So far it affects only one application, for one customer, under one set of circumstances (a Kubernetes environment where memory constraints are a real issue).

One possible reason log event data could be retained past harvest cycles is a problem delivering it to the New Relic back-end collector, since the agent waits for events to be delivered before dumping them. In that situation, a network issue or other external problem could indirectly cause the instrumented application's memory to grow too large to be viable.

Above all, we never want instrumentation to unduly affect the operation of the application itself. So if an application reaches the point where there seems to be no other alternative, it stands to reason that we should discard the accumulated event data in the harvester so the app can continue running.

This PR introduces an API call that lets the application set a maximum heap size. If the heap exceeds that value, all of the harvester's accumulated data is dropped and an emergency garbage collection and memory release are requested. See the documentation for the function in the PR's deltas for more details.
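
For illustration only, here is a minimal sketch of the kind of watchdog this describes. Every name in it (`harvester`, `discardAll`, `watchHeap`, the 512 MiB limit) is hypothetical, not the API this PR actually adds (see the PR's deltas for that). It only sketches one plausible mechanism: poll the heap with `runtime.ReadMemStats`, and when the limit is exceeded, drop the buffered harvest data and request a GC plus memory release via `debug.FreeOSMemory`.

```go
// Hypothetical sketch of an emergency heap watchdog; the real API and
// names are in this PR's deltas. Assumes a harvester type that can
// discard its buffered event data on demand.
package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

type harvester struct {
	// ... buffered log events, spans, metrics, etc.
}

// discardAll drops all accumulated harvest data (hypothetical helper).
func (h *harvester) discardAll() { /* clear internal buffers */ }

// watchHeap polls heap usage and, when it crosses limitBytes, drops the
// harvester's data and asks the runtime to GC and return freed memory
// to the operating system.
func watchHeap(h *harvester, limitBytes uint64, interval time.Duration) {
	var m runtime.MemStats
	for range time.Tick(interval) {
		runtime.ReadMemStats(&m)
		if m.HeapAlloc > limitBytes {
			h.discardAll()
			// Forces a GC and attempts to return as much memory
			// as possible to the OS.
			debug.FreeOSMemory()
		}
	}
}

func main() {
	h := &harvester{}
	go watchHeap(h, 512*1024*1024, 10*time.Second) // example: 512 MiB limit
	select {}                                      // stand-in for the application's real work
}
```

The point of the default-off design is that this loop costs nothing unless the application opts in by setting a limit.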

@nr-swilloughby
Contributor Author

I think we should look at whether we want to allow more control over what memory is released here. The only case we've found so far seems to be caused by memory issues outside the agent itself, and we're just providing a tool to help an application let go of resources to avoid a worse problem.
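
As a discussion aid only, one shape that finer-grained control could take is a bitmask selecting which categories of harvester data to release, so an application could shed, say, log events while keeping trace data. None of these names (`CleanupScope`, `discard`, the category constants) are part of this PR; this is purely a sketch of the design question.

```go
// Hypothetical discussion aid: selective release of harvester data via a
// bitmask of categories. Not part of this PR.
package main

type CleanupScope uint

const (
	CleanupLogs CleanupScope = 1 << iota
	CleanupSpans
	CleanupCustomEvents

	CleanupAll CleanupScope = CleanupLogs | CleanupSpans | CleanupCustomEvents
)

type harvester struct {
	logs, spans, customEvents []any // stand-ins for buffered data
}

// discard drops only the selected categories of buffered data.
func (h *harvester) discard(scope CleanupScope) {
	if scope&CleanupLogs != 0 {
		h.logs = nil
	}
	if scope&CleanupSpans != 0 {
		h.spans = nil
	}
	if scope&CleanupCustomEvents != 0 {
		h.customEvents = nil
	}
}

func main() {
	h := &harvester{}
	h.discard(CleanupLogs) // shed only log data in an emergency
	_ = h
}
```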
