Ensure that system metrics are container-aware #1246

Open · ivantopo opened this issue Jan 12, 2023 · 0 comments

ivantopo commented Jan 12, 2023

We started the system metrics collector back in the day, when applications ran on dedicated hosts! Now almost everything runs inside containers, and we should ensure that Kamon does its best in those environments. This means:

  • Fully implement process-specific metrics. Currently, we only report CPU usage, but it would be great to include memory and IO (network and disk) if possible.
  • Automatically disable host metrics when running inside a container. When a host is meant to run containers, host metrics could (and probably should) be collected by other means. We should have a setting to force-enable host metrics in case folks want them anyway.
  • Figure out whether there is a way to use OSHI in read-only containers. We currently unpack the native libraries at runtime and load them, which requires a writable file system. I'm sure there is a way around this; it just needs time to investigate and share the findings.
  • As much as possible, align the metric and tag names with the ones in the OpenTelemetry semantic conventions for system and process metrics.
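The first two items could be wired together roughly as follows. This is a minimal sketch, not Kamon's implementation. The class `SystemMetricsPolicy`, the `Collector` enum, the `forceHostMetrics` flag and the `runningInContainer` detector are all hypothetical names. The detector is deliberately left as an injected function, because the issue notes there is no known reliable way to detect containers yet.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

public final class SystemMetricsPolicy {

    // Hypothetical collector groups: process-level metrics vs. host-level metrics.
    public enum Collector { PROCESS, HOST }

    private SystemMetricsPolicy() {}

    /**
     * Returns the collectors to start. Process metrics are always enabled;
     * host metrics are enabled outside containers, or when the (hypothetical)
     * force setting is on.
     */
    public static Set<Collector> enabledCollectors(boolean forceHostMetrics,
                                                   BooleanSupplier runningInContainer) {
        Set<Collector> enabled = EnumSet.of(Collector.PROCESS);
        // Short-circuit: the container detector is only consulted when
        // host metrics are not forced on.
        if (forceHostMetrics || !runningInContainer.getAsBoolean()) {
            enabled.add(Collector.HOST);
        }
        return enabled;
    }
}
```

Keeping detection behind a `BooleanSupplier` means the policy can stay unchanged while the detection heuristic is still being researched.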

Notes from early investigation:

  • We could "hack" reading the cgroup memory limit by reading /sys/fs/cgroup/memory.max (see Feature Request: expose cgroup memory limits when running in Docker oshi/oshi#893 for more info). In my local tests, the file contains the proper memory limit when a container is started with a memory limit, but contains "max" when there is no limit. In that case we might fall back to the global memory limit. This was a local Docker container, though; I'll test further in EKS to see how it behaves there.
  • So far, I'm not aware of any reliable way to detect whether the application is running inside a container. This needs research.
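The memory.max fallback from the first note might look like the following minimal sketch. It assumes cgroup v2, where memory.max holds either a byte count or the literal "max". `CgroupMemoryLimit`, `parseMemoryMax` and `effectiveMemoryLimit` are hypothetical names. The host total is supplied by the caller. If the file is missing (for example, without cgroup v2) or cannot be parsed, the sketch falls back to that total. It has not been checked against EKS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

public final class CgroupMemoryLimit {

    // cgroup v2 memory limit file, as seen from inside the container.
    public static final Path MEMORY_MAX = Paths.get("/sys/fs/cgroup/memory.max");

    private CgroupMemoryLimit() {}

    /** Parses memory.max contents: a byte count, or "max" when no limit is set. */
    public static OptionalLong parseMemoryMax(String content) {
        String value = content.trim();
        if (value.isEmpty() || value.equals("max")) {
            return OptionalLong.empty(); // unlimited: caller should fall back
        }
        try {
            return OptionalLong.of(Long.parseLong(value));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // unexpected format: treat as unknown
        }
    }

    /** Returns the cgroup limit if one is set, otherwise hostTotalBytes. */
    public static long effectiveMemoryLimit(Path memoryMax, long hostTotalBytes) {
        try {
            String content = new String(Files.readAllBytes(memoryMax), StandardCharsets.UTF_8);
            return parseMemoryMax(content).orElse(hostTotalBytes);
        } catch (IOException e) {
            return hostTotalBytes; // file missing or unreadable
        }
    }
}
```

A caller using OSHI could pass `new SystemInfo().getHardware().getMemory().getTotal()` as `hostTotalBytes`, so the "no limit" case reports the global memory limit.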
ivantopo self-assigned this Jan 12, 2023