16417 FIX Add Configuration Option 'checkmkAgentTimeout'
CMK-16676

Closes: #26

Change-Id: I697df13efc1d6b5279396d626b0c335c51928892
SoloJacobs committed Apr 8, 2024
1 parent 07d9e0a commit 5dcabf7
Showing 4 changed files with 32 additions and 3 deletions.
24 changes: 24 additions & 0 deletions .werks/16417
@@ -0,0 +1,24 @@
Title: Add Configuration Option 'checkmkAgentTimeout'
Class: fix
Compatible: compat
Component: node-collector
Date: 1712152033
Edition: cre
Knowledge: doc
Level: 1
State: unknown
Version: 1.5.0

The machine-sections-collector executes a version of the 'check_mk_agent' to collect information
about the host. Sometimes this script takes more than five seconds, which causes the following
traceback.

C+:
File "/usr/local/lib/python3.10/subprocess.py", line 1935, in _wait
raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['/usr/local/bin/check_mk_agent']' timed out after 5 seconds
C-:

If you encounter this error, you can configure a longer timeout via the new option
'nodeCollector.machineSectionsCollector.checkmkAgentTimeout' in the 'values.yaml' configuration
file.
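
The failure mode described in this werk can be reproduced with a minimal sketch. It assumes the collector runs the agent through `subprocess` with a timeout, as the traceback suggests; `run_agent` and `SLOW_AGENT` are hypothetical names used only for illustration, and a sleeping Python process stands in for a slow 'check_mk_agent'.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_agent(command: Sequence[str], timeout: int) -> Optional[str]:
    """Run the agent command; return its stdout, or None if it timed out."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # Same exception type as in the traceback above.
        return None
    return completed.stdout


# Hypothetical stand-in for a slow agent: sleeps two seconds, then prints a header.
SLOW_AGENT = [
    sys.executable,
    "-c",
    "import time; time.sleep(2); print('<<<check_mk>>>')",
]
```

With a timeout shorter than the agent's runtime, `run_agent` returns `None`; with a longer timeout it returns the agent output, which is the effect a larger 'checkmkAgentTimeout' is meant to have.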
@@ -61,6 +61,9 @@ spec:
- "/usr/local/bin/checkmk-machine-sections-collector"
args:
- "--log-level={{ .Values.nodeCollector.logLevel }}"
{{- if .Values.nodeCollector.machineSectionsCollector.checkmkAgentTimeout }}
- "--checkmk-agent-timeout={{ .Values.nodeCollector.machineSectionsCollector.checkmkAgentTimeout }}"
{{- end }}
{{- if .Values.tlsCommunication.enabled }}
- "--secure-protocol"
{{- if .Values.tlsCommunication.verifySsl }}
2 changes: 2 additions & 0 deletions deploy/charts/checkmk/values.yaml
@@ -282,6 +282,8 @@ nodeCollector:
cpu: 150m
memory: 200Mi

checkmkAgentTimeout: 5

# the machine sections collector can collect monitoring information for network interfaces of the underlying node.
# this means that the '/sys' directory of the node will be mounted into the container.
# the pod security policy is adjusted accordingly.
6 changes: 3 additions & 3 deletions src/checkmk_kube_agent/send_metrics.py
@@ -301,7 +301,7 @@ def parse_arguments(argv: Sequence[str]) -> argparse.Namespace:
help="Collector log level.",
)
parser.add_argument(
- "--agent-timeout",
+ "--checkmk-agent-timeout",
type=int,
help="Checkmk Agent execution timeout in seconds",
)
@@ -311,7 +311,7 @@ def parse_arguments(argv: Sequence[str]) -> argparse.Namespace:
max_retries=10,
polling_interval=60,
ca_cert="/etc/ca-certificates/checkmk-ca-cert.pem",
- agent_timeout=5,
+ checkmk_agent_timeout=5,
)

return parser.parse_args(argv)
@@ -322,7 +322,7 @@ def container_metrics_worker(
cluster_collector_base_url: Url,
headers: RequestHeaders,
verify: SslVerify,
- args: argparse.Namespace, # pylint: disable=unused-argument
+ _args: argparse.Namespace,
) -> None: # pragma: no cover
"""
Query cadvisor api, send metrics to cluster collector
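
The rename in `parse_arguments` can be checked in isolation. The sketch below is a reduced version with only the timeout option, not the actual function from `send_metrics.py`; `parse_timeout_arguments` is a hypothetical name. It relies on the standard `argparse` behavior of deriving the attribute name from the long flag, which is why the `set_defaults` key has to be renamed together with the flag.

```python
import argparse
from typing import Sequence


def parse_timeout_arguments(argv: Sequence[str]) -> argparse.Namespace:
    """Parse only the timeout flag; reduced, illustrative parser."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--checkmk-agent-timeout",
        type=int,
        help="Checkmk Agent execution timeout in seconds",
    )
    # argparse stores the value under 'checkmk_agent_timeout', so the
    # default must use the same key to take effect.
    parser.set_defaults(checkmk_agent_timeout=5)
    return parser.parse_args(argv)
```

Called without arguments, the namespace carries the default of five seconds; when the Helm template passes `--checkmk-agent-timeout=30`, the parsed value is `30`.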