gearmand may crash on startup in Kubernetes if service name is gearmand due to conflict with GEARMAND_PORT environment variable #320

Open
forestuser opened this issue Sep 3, 2021 · 26 comments

Comments

@forestuser

Hello. When creating a pod + service, gearmand stops working.

LOG
INFO 2021-09-03 04:46:16.396362 [ main ] Initializing Gear on port tcp://10.101.82.102:4730 with SSL: false
INFO 2021-09-03 04:46:16.000000 [ main ] Starting up with pid 1, verbose is set to INFO
ERROR 2021-09-03 04:46:16.000000 [ main ] 0.0.0.0:tcp://10.101.82.102:4730 getaddrinfo(Unrecognized service) -> libgearman-server/gearmand.cc:626
INFO 2021-09-03 04:46:16.000000 [ main ] Shutdown complete

@esabol
Member

esabol commented Sep 3, 2021

Please provide more information, such as the gearmand version and the arguments used to start gearmand.

Could this Kubernetes issue be causing your problem?

kubernetes/kubernetes#98117 / kubernetes/kubernetes#98123

@forestuser
Author

forestuser commented Sep 3, 2021

  containers:
  - name: gearmand
    image: artefactual/gearmand:latest
    args:
    - --listen=0.0.0.0
    - --port=4730
    ports:
    - containerPort: 4730

@forestuser
Author

  1. If you deploy the pod and then deploy the service after gearmand has initialized, gearmand works.

  2. If you deploy pod and service together, an error occurs and gearmand goes into shutdown.
    INFO 2021-09-03 04:46:16.396362 [ main ] Initializing Gear on port tcp://10.101.82.102:4730 with SSL: false
    INFO 2021-09-03 04:46:16.000000 [ main ] Starting up with pid 1, verbose is set to INFO
    ERROR 2021-09-03 04:46:16.000000 [ main ] 0.0.0.0:tcp://10.101.82.102:4730 getaddrinfo(Unrecognized service) -> libgearman-server/gearmand.cc:626
    INFO 2021-09-03 04:46:16.000000 [ main ] Shutdown complete

@esabol
Member

esabol commented Sep 4, 2021

We don't maintain the artefactual/gearmand Docker image. I recommend that you take this up with whoever does. You still haven't specified the gearmand version number being used, and you didn't answer my question about the Kubernetes issues. All I can say is that gearmand version 1.1.19.1 works just fine in my Docker containers, and I don't use Kubernetes.

@esabol
Member

esabol commented Sep 4, 2021

See also these issues:

kubernetes/kubernetes#76790
alpinelinux/docker-alpine#149

It sounds like the problem is with Alpine and/or Kubernetes to me.

@forestuser
Author

forestuser commented Sep 6, 2021

Gearmand version 1.1.19.1. None of the kubernetes problems is suitable. I am contacting the maintainers of the artefactual/gearmand image and will report back on the result.

I also use gearmand in docker-compose and there are no problems.

@esabol
Member

esabol commented Sep 6, 2021

Gearmand version 1.1.19.1.

You're sure? Because one of your previous messages said you were using artefactual/gearmand:latest, and Docker Hub says that's 1.1.18. https://hub.docker.com/r/artefactual/gearmand/

None of the kubernetes problems is suitable.

Why do you say that? The first two give the exact same error message you are experiencing ("getaddrinfo(Unrecognized service)"), and the third describes the same problem you've described (DNS failure immediately after pod creation when using an Alpine-based Docker image).

@forestuser
Author

You're sure? Because one of your previous messages said you were using artefactual/gearmand:latest, and Docker Hub says that's 1.1.18. https://hub.docker.com/r/artefactual/gearmand/

docker-compose exec gearmand gearmand -V
gearmand 1.1.19.1 - https://github.com/gearman/gearmand/issues

cat docker-compose.yml | grep gearmand
gearmand:
image: artefactual/gearmand

@forestuser
Author

Why do you say that? The first two give the exact same error message you are experiencing ("getaddrinfo(Unrecognized service)"), and the third describes the same problem you've described (DNS failure immediately after pod creation when using an Alpine-based Docker image).

I'm not sure exactly, but I assume the problem is in the network interaction between Kubernetes and gearmand. Unfortunately, the examples above do not fit my case.

@forestuser
Author

I now use the following image in Kubernetes, and it works:

containers:
- name: gearmand
  image: clever/gearmand:dabf561

root@gearmand-57b766648d-l4rts:/# gearmand -V
gearmand 1.0.6 - https://bugs.launchpad.net/gearmand

Perhaps, as you wrote above, the problem is in the container. But at my current level of expertise I have not yet been able to find the fault.

@esabol
Member

esabol commented Sep 9, 2021

Why do you say that? The first two give the exact same error message you are experiencing ("getaddrinfo(Unrecognized service)"), and the third describes the same problem you've described (DNS failure immediately after pod creation when using an Alpine-based Docker image).

I'm not sure exactly, but I assume the problem is in the network interaction between Kubernetes and gearmand. Unfortunately, the examples above do not fit my case.

Well, they sure seem like they fit your case to me.... It seems like your problem is in the network interaction between Kubernetes and Alpine, not gearmand. The artefactual image just happens to be based on Alpine.

I don't recommend using clever/gearmand. It's very old.

Why don't you build your own Docker image? Maybe base it on Ubuntu instead of Alpine?

@forestuser
Author

I built two different images, based on Fedora and Ubuntu. Both work in Docker, but they fail with errors in Kubernetes.
What could I have missed?

FROM fedora:34
kubectl logs pod/gearmand-6d99465d86-8gxt2
INFO 2021-09-14 04:35:44.015296 [ main ] Initializing Gear on port tcp://10.109.183.242:4730 with SSL: false
INFO 2021-09-14 04:35:44.000000 [ main ] Starting up with pid 9, verbose is set to INFO
ERROR 2021-09-14 04:35:44.000000 [ main ] 0.0.0.0:tcp://10.109.183.242:4730 getaddrinfo(Servname not supported for ai_socktype) -> libgearman-server/gearmand.cc:626
INFO 2021-09-14 04:35:44.000000 [ main ] Shutdown complete

FROM ubuntu:20.04
kubectl logs pod/gearmand-6d99465d86-nsgh8
INFO 2021-09-14 06:27:18.520829 [ main ] Initializing Gear on port tcp://10.109.183.242:4730 with SSL: false
INFO 2021-09-14 06:27:18.000000 [ main ] Starting up with pid 7, verbose is set to INFO
ERROR 2021-09-14 06:27:18.000000 [ main ] 0.0.0.0:tcp://10.109.183.242:4730 getaddrinfo(Servname not supported for ai_socktype) -> libgearman-server/gearmand.cc:626
INFO 2021-09-14 06:27:18.000000 [ main ] Shutdown complete

@esabol
Member

esabol commented Sep 14, 2021

I built two different images, based on Fedora and Ubuntu. Both work in Docker, but they fail with errors in Kubernetes.
What could I have missed?

I don't know. Please post your Ubuntu Dockerfile in its entirety here.

@esabol
Member

esabol commented Sep 14, 2021

Also, if the Docker image works in Docker but not Kubernetes, have you considered that the problem might be with Kubernetes? That would seem logical to me.

@esabol
Member

esabol commented Sep 14, 2021

Also, check out this article:

https://tech.findmypast.com/k8s-dns-lookup/

@forestuser
Author

forestuser commented Sep 15, 2021

ARG USER_ID=101
ARG GROUP_ID=101

FROM ubuntu:20.04

RUN apt-get update -y

ARG USER_ID
ARG GROUP_ID
RUN echo "user: ${USER_ID}, grp: ${GROUP_ID}" && groupadd -g ${GROUP_ID} gearman \
    && useradd -l -u ${USER_ID} -g gearman gearman \
    && echo "gearman:gearman" | chpasswd \
    && echo '%gearman ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

RUN apt-get install -y gearman-job-server

COPY --chown=gearman:gearman .kubernetes/.docker/images/gearman/gearmand.conf /etc/gearmand.conf
COPY --chown=gearman:gearman .kubernetes/.docker/images/gearman/entrypoint.sh /etc/entrypoint.sh

USER gearman
ENTRYPOINT ["/etc/entrypoint.sh"]


ARG USER_ID=101
ARG GROUP_ID=101

FROM fedora:34

RUN dnf -y update && dnf clean all

ARG USER_ID
ARG GROUP_ID
RUN echo "user: ${USER_ID}, grp: ${GROUP_ID}" && groupadd -g ${GROUP_ID} gearman \
    && useradd -l -u ${USER_ID} -g gearman gearman \
    && echo "gearman:gearman" | chpasswd \
    && echo '%gearman ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

RUN yum install -y gearmand

COPY --chown=gearman:gearman .kubernetes/.docker/images/gearman/gearmand.conf /etc/gearmand.conf
COPY --chown=gearman:gearman .kubernetes/.docker/images/gearman/entrypoint.sh /etc/entrypoint.sh

USER gearman
ENTRYPOINT ["/etc/entrypoint.sh"]

@forestuser
Author

Entrypoint
#!/bin/bash
gearmand

gearmand.conf
--listen=0.0.0.0
--port=4730
--log-file=stderr
--verbose=INFO
--queue-type=builtin
--threads=4
--backlog=32
--job-retries=0
--worker-wakeup=0

@forestuser
Author

Also, check out this article:

https://tech.findmypast.com/k8s-dns-lookup/

My DNS works well, there are no errors or warnings in the logs.

@esabol
Member

esabol commented Sep 15, 2021

My DNS works well, there are no errors or warnings in the logs.

Huh? Every single log file you've posted has indicated a DNS error. That's what those getaddrinfo errors are.

Try getting rid of the --listen=0.0.0.0 line in your gearmand.conf file. You shouldn't need that. Actually, I'd get rid of everything except --port, --verbose, and --log-file. Trust gearmand's defaults for the rest.

@Firehed

Firehed commented Sep 29, 2021

I've experienced this issue before; the root cause is that K8S auto-injects a {SERVICENAME}_PORT={proto}://{addr}:{port} environment variable. If you've created a service named gearmand in the same namespace (likely), then the pods have something like GEARMAND_PORT=tcp://10.0.0.1:4730 set automatically as an environment variable. The gearmand application reads this, doesn't get an integer in the range of valid ports, and crashes.
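
To make the mechanism concrete, here is a minimal Python sketch (not gearmand's code) of how Kubernetes derives these variables and why the resulting value cannot serve as a port. The helper names `service_link_env` and `try_resolve` are hypothetical, the list of injected variables is deliberately incomplete, and the assumption that gearmand passes the raw value to getaddrinfo as the service argument is inferred from the log lines earlier in this thread.

```python
import socket


def service_link_env(service_name, cluster_ip, port, proto="tcp"):
    """Approximate a few of the variables Kubernetes injects for a Service.

    The service name is upper-cased and dashes become underscores.
    """
    prefix = service_name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        f"{prefix}_PORT": f"{proto}://{cluster_ip}:{port}",
    }


def try_resolve(bind_host, service):
    """Return the getaddrinfo error for (bind_host, service), or None on success."""
    try:
        socket.getaddrinfo(bind_host, service, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return exc
    return None


# A Service named "gearmand" collides with gearmand's own GEARMAND_PORT variable.
env = service_link_env("gearmand", "10.101.82.102", 4730)
print(env["GEARMAND_PORT"])

# The URL-like value is rejected as a service name; a plain number is accepted.
print(try_resolve("0.0.0.0", env["GEARMAND_PORT"]))
print(try_resolve("0.0.0.0", "4730"))
```

The exact error text depends on the C library, which would be consistent with the Alpine image reporting "Unrecognized service" while the Ubuntu and Fedora images report "Servname not supported for ai_socktype".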

The simplest workaround is to explicitly set GEARMAND_PORT=4730 (or whatever port you desire) in the pod's spec.containers.env (or, more likely, that of the parent StatefulSet), which will take precedence. Something like this:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gearmand
spec:
  selector:
    matchLabels:
      app: gearmand
  serviceName: gearmand
  template:
    metadata:
      labels:
        app: gearmand
    spec:
      containers:
        - image: artefactual/gearmand:1.1.18-alpine
          name: gearmand
          command: ['gearmand', '--verbose=WARNING']
          env:
            - name: GEARMAND_PORT
              value: '4730'
          ports:
            - containerPort: 4730
              name: gearmand

@SpamapS
Member

SpamapS commented Sep 29, 2021 via email

@esabol
Member

esabol commented Oct 21, 2021

Another solution was posted over in the artefactual-labs/docker-gearmand issue by @blafasel42:

just rename your service! I called mine "gearman" instead of "gearmand". This makes k8s name the env var for the service port differently and everything works

@SpamapS: Should we consider modifying gearmand to accommodate Kubernetes by parsing the port number (and bind address??) from $GEARMAND_PORT values that look like "tcp://10.43.78.46:4730"?
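
As a rough illustration of what such parsing could look like, here is a minimal Python sketch. It is not gearmand code; `parse_service_url` is a hypothetical helper, and it assumes the value always has the `{proto}://{addr}:{port}` shape described earlier in the thread. Note that the port recovered this way is the Service's port, which is only useful to gearmand if it matches the port the container should listen on.

```python
from urllib.parse import urlsplit


def parse_service_url(value):
    """Split a value like 'tcp://10.43.78.46:4730' into (address, port).

    Raises ValueError for anything that is not a tcp:// or udp:// URL
    carrying both an address and a port.
    """
    parts = urlsplit(value)
    if parts.scheme not in ("tcp", "udp"):
        raise ValueError(f"unexpected scheme in {value!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"missing address or port in {value!r}")
    return parts.hostname, parts.port


print(parse_service_url("tcp://10.43.78.46:4730"))  # ('10.43.78.46', 4730)
```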

@forestuser
Author

Thank you very much everyone for the clarification.

@blafasel42

blafasel42 commented Oct 22, 2021

@esabol I don't think it is safe to grep the port from the service's env var. Services have two ports: one they listen on (which is the one stated in the env var under discussion here) and one they forward requests to (which is the one gearmand needs to listen on). They can be different, and then this would make the service dysfunctional. I believe it would be better to ignore the GEARMAND_PORT var if the value is not numeric.
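
The "ignore it if it is not numeric" rule could be sketched as follows. This is Python for illustration only, not gearmand's implementation; `port_from_env` is a hypothetical helper, and falling back to 4730 assumes that no port was given by any other means.

```python
DEFAULT_PORT = 4730  # the port used throughout this thread


def port_from_env(environ, default=DEFAULT_PORT):
    """Use GEARMAND_PORT only when it is a plain port number.

    Anything else, such as the tcp://addr:port value that Kubernetes
    injects for a Service named "gearmand", is silently ignored.
    """
    raw = environ.get("GEARMAND_PORT", "")
    if raw.isascii() and raw.isdigit() and 1 <= int(raw) <= 65535:
        return int(raw)
    return default


# The Kubernetes-injected value falls back to the default.
print(port_from_env({"GEARMAND_PORT": "tcp://10.43.78.46:4730"}))
# An explicit numeric value is honoured.
print(port_from_env({"GEARMAND_PORT": "4731"}))
```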

@SpamapS
Member

SpamapS commented Oct 28, 2021

A flag that changes the name of the environment variable would be my preference as a fix, and perhaps also printing out a hint like "GEARMAND_PORT must be numeric" and a cleaner exit value.
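
A minimal Python sketch of both ideas together, for illustration only: the `--port-env` flag and the `resolve_port` helper are hypothetical and do not exist in gearmand, and the exit status of 1 is simply what Python's `SystemExit` produces when given a message.

```python
import argparse


def resolve_port(argv, environ, default=4730):
    """Read the port from a configurable environment variable.

    Exits with a one-line hint instead of a getaddrinfo error when the
    value is not a valid port number.
    """
    parser = argparse.ArgumentParser(prog="gearmand-sketch")
    # Hypothetical flag: lets the operator sidestep the Kubernetes name clash.
    parser.add_argument("--port-env", default="GEARMAND_PORT")
    args = parser.parse_args(argv)

    raw = environ.get(args.port_env)
    if raw is None:
        return default
    if not (raw.isascii() and raw.isdigit() and 1 <= int(raw) <= 65535):
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(f"{args.port_env} must be numeric (1-65535), got {raw!r}")
    return int(raw)


print(resolve_port([], {"GEARMAND_PORT": "4730"}))
```

With the Kubernetes-injected value, `resolve_port([], {"GEARMAND_PORT": "tcp://10.43.78.46:4730"})` would stop with the hint rather than return a port.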

SpamapS changed the title from "Kubernetes" to "gearmand may crash on startup in Kubernetes if service name is gearmand due to conflict with GEARMAND_PORT environment variable" on Oct 28, 2021

@esabol
Member

esabol commented Oct 28, 2021

[...] printing out a hint like "GEARMAND_PORT must be numeric" and a cleaner exit value.

That makes sense to me. I'm not a fan of your other idea though.
