nslookup fails in Alpine 3.11.3 #539
This is most likely related to the … Does it work if you use a trailing dot? It seems that nslookup will append the search domain if there are no dots in the hostname. |
Alas, adding a dot doesn't help (the following is running on Kubernetes, not plain Docker, so the DNS IP is different, but the result is the same).
|
I am wondering if this is now an acknowledged problem that will eventually be fixed or not.
on Kubernetes it doesn't:
|
I am interested in fixing this, or at least reporting it upstream to the busybox bug tracker, but I am not sure what the expected response is. Apparently the Kubernetes DNS server gives a different response? Is it the same DNS server? Are there any other configs in /etc/resolv.conf? It would be nice if we had a simple way to reproduce it using publicly available internet servers. What I know for sure is that "zookeeper" is not a valid hostname on the internet. Nor is it a top-level domain so |
It would also be helpful if you could report it upstream to https://bugs.busybox.net/ |
a tcpdump of the network activity would also be helpful. |
Hi This is the content of resolv.conf in a Kubernetes environment:
Compare this to a plain Docker environment (boot2docker/Docker Toolbox)
On Alpine 3.11.2, when running nslookup without dot, it works (in particular, exit code is 0):
Compared to Alpine 3.11.3:
Observe that while in the Alpine 3.11.3 case the command apparently finds the proper IP address at some point and writes it into its output, its exit code is 1 instead of 0, and that breaks our start script.
Now with an attached dot, it fails in both 3.11.2 and 3.11.3, with slightly different output:
Alpine 3.11.2
Alpine 3.11.3
However, this is to be expected, as (AFAIK) adding a dot makes this a fully qualified name, so no lookup relative to the local search domains is performed, and so it has to fail.
This is the result of tcpdump when running nslookup on 3.11.3 (without trailing dot)
And this is the result for 3.11.2 (again, without the trailing dot):
|
The problem seems to be the additional search domains - if one of them fails, the command is considered failed. If I remove the extra domains from resolv.conf and only leave
it works:
|
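To make the role of the search list concrete, here is a minimal sketch that prints the names a resolver would try for a short hostname, given a resolv.conf. The function names `search_domains` and `candidates` are hypothetical helpers, not part of busybox or Alpine, and the sketch assumes the usual resolv.conf convention that the last `search` line wins.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# search_domains [FILE]: print one search domain per line.
# If several "search" lines exist, only the last one is used.
search_domains() {
    awk '
        $1 == "search" { n = 0; for (i = 2; i <= NF; i++) d[++n] = $i }
        END { for (i = 1; i <= n; i++) print d[i] }
    ' "${1:-/etc/resolv.conf}"
}

# candidates NAME [FILE]: print NAME qualified with each search domain,
# written with a trailing dot so each candidate is fully qualified.
candidates() {
    name=$1
    file=${2:-/etc/resolv.conf}
    search_domains "$file" | while read -r domain; do
        printf '%s.%s.\n' "$name" "$domain"
    done
}
```

Running `candidates zookeeper` inside the container lists every name that gets queried. With the behaviour described above, a single failing candidate is enough to make the whole nslookup call fail.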
That is what I suspected. Thank you for confirming that. We should have enough info to be able to fix this thing. Next step will be to report it to busybox developers. https://bugs.busybox.net/ I am sorry that I have not had time to prioritize this, but I believe we will be able to have a fix for this for 3.11.4. Thanks! |
Just to clarify - should we report this to busybox devs (mainly by pointing them to this issue), or will you? |
I was hoping you could help me with that, while I work on a fix ;) Thanks! |
I have reported it upstream: https://bugs.busybox.net/show_bug.cgi?id=12541 |
upstream report: https://bugs.busybox.net/show_bug.cgi?id=12541 downstream report: gliderlabs/docker-alpine#539
I have pushed a fix to alpine edge. Can you please test if it solves your issue? Use |
I tested it, on Kubernetes it works as expected:
The IP gets resolved against one of the domains found in /etc/resolv.conf and the exit code is 0. Alas, it still doesn't work in plain Docker environments as it did before, unless I append a dot:
As you can see, ping can resolve the name just fine, while nslookup without an appended dot fails. Also see the content of resolv.conf |
I can confirm it still doesn't work in
How I reproduced:
Works in
|
Works with latest alpine:edge for me:
|
@jgoeres can you please test with latest |
@ncopa I tested with
|
This is because in newer versions, nslookup has a different behaviour: if the DNS lookup uses the "search" suffixes in /etc/resolv.conf, and if any of them does not succeed, the command as a whole returns a non-zero exit code. The exit code is only 0 if all the queries (for all the "search" suffixes) succeed, which is usually not the case. nslookup in Alpine 3.11.2 returns 0 if any of the queries succeeds, that is, if the name can actually be resolved, and a non-zero exit code only if no query at all succeeds. This is the desired behaviour. See gliderlabs/docker-alpine#539
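One way to get the "succeed if any query succeeds" behaviour back in a script is to ignore the exit code and inspect the output instead. The sketch below assumes the output format quoted elsewhere in this thread (a `Name:` line followed by an `Address:` line); `has_answer` is a hypothetical helper name, and other nslookup builds may print answers differently.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.

# has_answer: read nslookup output on stdin and succeed if at least one
# "Name:" line is followed by an "Address" line. The "Address" line that
# belongs to the "Server:" header comes before any "Name:" line, so it is
# not counted as an answer.
has_answer() {
    awk '
        /^Name:/ { seen_name = 1; next }
        seen_name && /^Address/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

A start script could then call `nslookup "$host" 2>/dev/null | has_answer` and rely on the pipeline's status, which is the status of `has_answer` rather than of nslookup.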
We're now on Alpine 3.12.0 if you grab alpine:latest. It looks like nslookup is working fine. So this "ticket" should be closed. |
This is on |
Working for me totally fine with
$ docker run --rm --name alpine -it --network net alpine nslookup web
Server: 127.0.0.11
Address: 127.0.0.11:53
Non-authoritative answer:
Non-authoritative answer:
Name: web
Address: 172.20.0.2
|
Fails for me. Interesting that alpine 3.11.2 and 3.11.3 both say busybox is the same version (1.31.1). And, they are the same file size. However, the 3.11.2 one has a date of Dec 18, 2019, while the 3.11.3 one has a date of Jan 15, 2020. And, the sha256 hash is different. The only dynamic library is libc.musl-x86_64.so.1, and they have the same date and hash. Copying the busybox from 3.11.2 to 3.11.3 makes it work. The APK for busybox is 1.31.1-r8 on 3.11.2 and 1.31.1-r9 on 3.11.3. What's the diff between r8 and r9?:
I'm not an expert on busybox, but it looks like these are compile-time options, so we can't "fix" this by a configuration change. The best option may be for the maintainer of the Alpine Linux package to revert this change. Basically what the change does is turn on the internal busybox resolver, rather than using the standard library. If there is an issue with the busybox resolver code, then of course that should be fixed. However, it was a change in the Alpine Linux package that turned this feature on and "broke" it. Can we get it turned back off? An ltrace of the r8 and r9 versions clearly shows the r8 calling the standard library resolver, where r9 does not. It also shows the r9 version (with the internal busybox resolver) string comparing for domain, search, and nameserver keywords in the resolv.conf, but not options. Look at the busybox source file for nslookup.c. It has no ability to parse options, and hence ndots. Please revert this. Oh, and it is also broken in alpine:latest, which uses busybox-1.31.1-r19, ltrace shows the same behavior. |
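For readers unfamiliar with `ndots`, here is a minimal sketch of the rule that a resolver honouring the option applies, as documented in resolv.conf(5): a name with fewer dots than `ndots` (default 1) is tried against the search list first. The helper names are hypothetical, and the sketch only illustrates the rule; it does not describe what the busybox code does.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# ndots_value [FILE]: print the ndots option from FILE, or the default of 1.
ndots_value() {
    awk '
        $1 == "options" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ndots:[0-9]+$/) { split($i, kv, ":"); n = kv[2] }
        }
        END { print (n == "" ? 1 : n) }
    ' "${1:-/etc/resolv.conf}"
}

# count_dots NAME: print the number of dots in NAME.
count_dots() {
    printf '%s' "$1" | tr -cd '.' | wc -c | tr -d ' '
}

# search_first NAME NDOTS: succeed if the search list would be tried
# before the name as written.
search_first() {
    case $1 in
        *.) return 1 ;;   # trailing dot: the name is already fully qualified
    esac
    [ "$(count_dots "$1")" -lt "$2" ]
}
```

For example, `search_first www.google.com "$(ndots_value)"` succeeds when `ndots` is 5 and fails when it is left at the default of 1.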
I am experiencing this issue since alpine:3.13 as well, albeit under different circumstances than those mentioned here, but likely related nonetheless. I'm using Docker on my development machine through Dinghy. It seems the resolving mechanism doesn't play nice with its DNS server (which is used to resolve *.docker addresses to its own IP address and forward all other queries to the host's resolver). Running an
The biggest issue is that commands such as
docker run --rm -ti alpine:3.11 sh -c 'apk --no-cache add curl && curl -I https://www.google.com/'
Up to 3.12, the output is as follows:
The 3.13 image, however, produces the following output:
The errors are very specific to my installation, I've tried these commands on an Ubuntu based Docker installation without any issues. Perhaps the cause of the issues @gaby is having is related. Could it be the resolving mechanism has issues with certain resolvers? Could it be an IPv6 issue? There's an issue over on the Dinghy repository that might be related. |
Also, removing (or setting the value to "1") of |
I also ran into this issue on Alpine 3.14.2 - |
Seeing this issue on 3.16.0 while on a VPN. resolv.conf looks like this
When using nslookup everything works fine. However when curling an internal company domain I get one successful call, then the rest are failures. Unless I wait a minute or so and try again. It's very strange. |
Any update? Is this problem so difficult to solve? Alpine's image is small and light, our team likes it, but this problem confuses us. |
I've hit a similar issue.
$ cat /etc/resolv.conf
# DNS requests are forwarded to the host. DHCP DNS options are ignored.
nameserver 192.168.65.7
alpine:3.13.0 and later do not work for me.
$ ping -c 4 $SSH_HOST
ping: bad address '<hidden>.local'
$ nslookup $SSH_HOST
Server: 192.168.65.7
Address: 192.168.65.7:53
Non-authoritative answer:
Name: <hidden>.local
Address: 172.16.1.1[39](https://<masked>/-/jobs/178#L39)
*** Can't find <hidden>: No answer
$ ssh -v $SSH_USER@$SSH_HOST "echo '!!!done!!!'"
OpenSSH_9.1p1, OpenSSL 3.0.8 7 Feb 2023
debug1: Reading configuration data /etc/ssh/ssh_config
ssh: Could not resolve hostname <hidden>.local: Try again |
When I add bind-tools, everything works.
nslookup works fine. Is this a workaround? Is it only a busybox bug?
@TheDevilDan This was fixed in recent alpine releases. Forgot which version |
I use 8.1-fpm-alpine, the latest, and the domains are not complete when I make a request: exit 1
|
That bug was fixed in 3.18, and your image is based on that according to Docker Hub |
Very strange. I have the problem in all Alpine pods with busybox inside. I tested with Traefik V3.0 RC5 and saw the same problem; I have to install bind-tools, and it works perfectly after that
|
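When comparing results across images, it can help to confirm which nslookup is actually being run. The sketch below assumes that busybox applets are installed as symlinks to a binary named `busybox`, and that installing bind-tools puts a standalone nslookup binary on the PATH in its place; `is_busybox_applet` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.

# is_busybox_applet CMD: succeed if CMD resolves, through any symlinks,
# to a file named "busybox".
is_busybox_applet() {
    path=$(command -v "$1") || return 1
    target=$(readlink -f "$path") || return 1
    [ "$(basename "$target")" = "busybox" ]
}
```

Calling `is_busybox_applet nslookup` before and after `apk add bind-tools` should show whether the applet has been replaced.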
We just switched to Alpine 3.11.3 and now nslookup is failing for us unless we explicitly specify the DNS server IP (which is of course not an option), e.g.
versus
Ping etc. work flawlessly.
Alas, we are using nslookup in some of our start scripts to defer starting the actual application inside the container until another container shows up in DNS (because the 3rd-party tool we are using considers a failed DNS lookup a non-recoverable error...).
Could this be related to enabling the nslookup feature "FEATURE_NSLOOKUP_BIG" as mentioned here:
#476
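A start script of the kind described above could be sketched as follows. This is not the script used in this report. It assumes `getent hosts` is available in the image and uses it as the default check, since it goes through the libc resolver rather than the nslookup applet. `wait_for_host` and the `RESOLVE_CMD` override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.

# wait_for_host NAME [TRIES] [DELAY]: poll until NAME resolves.
# RESOLVE_CMD can override the lookup command (default: "getent hosts").
# Returns 0 as soon as a lookup succeeds, 1 after TRIES failed attempts.
wait_for_host() {
    name=$1
    tries=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Intentionally unquoted so "getent hosts" splits into command + argument.
        if ${RESOLVE_CMD:-getent hosts} "$name" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Usage would look like `wait_for_host zookeeper 60 2 && exec "$@"`, which defers the application start until the name resolves or the attempts run out.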