@tomassrnka tomassrnka commented Nov 2, 2025

  • Add dpkg-divert to permanently block gce-resolved.conf in Packer image
  • Configure Consul as recursive DNS with GCE forwarder in startup script
  • Remove routing domains approach (Domains=~consul) which doesn't work reliably
  • Restart systemd-resolved after Consul starts to prevent marking DNS as unreachable
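A minimal sketch of what the dpkg-divert step in the first bullet might look like inside a Packer shell provisioner. The function name, the `.diverted` suffix, and the `DPKG_DIVERT` override are illustrative assumptions, not taken from the actual change; only the target path comes from this PR.

```shell
#!/usr/bin/env bash
# Sketch only: divert the GCE drop-in so later package installs or upgrades
# cannot place it back under /etc/systemd/resolved.conf.d/.
# DPKG_DIVERT is a hypothetical override for dry runs; it defaults to the real tool.
block_gce_resolved_conf() {
  local target="/etc/systemd/resolved.conf.d/gce-resolved.conf"
  # --local:  divert every package's copy of the file
  # --rename: move an already-installed copy aside
  # --divert: the path dpkg should use instead of the target
  "${DPKG_DIVERT:-dpkg-divert}" --local --rename \
    --divert "${target}.diverted" --add "${target}"
}
```

Calling `block_gce_resolved_conf` as root during the image build would register the diversion. The diverted copy no longer ends in `.conf`, so systemd-resolved should ignore it as a drop-in.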

Note: This works only for GCE deployments and does not account for AWS / BYOC. The IP address is the same in AWS, so it should work there in practice, but the change is not generic enough yet.


Note

On GCE, block gce-resolved.conf at image build and start Consul with a dynamically fetched GCE DNS recursor, reconfiguring systemd-resolved after Consul is up so Consul handles all DNS.

  • GCP Image (Packer):
    • Add dpkg-divert to block gce-resolved.conf (/etc/systemd/resolved.conf.d/gce-resolved.conf) to avoid DNS conflicts with Consul.
  • Startup Script (start-client.sh):
    • Configure systemd-resolved to use only 127.0.0.1:8600; remove routing domains and disable GCE's gce-resolved.conf.
    • Dynamically fetch GCE DNS from metadata and pass it to Consul via --recursor.
    • Start Consul first, wait for DNS port readiness, then restart systemd-resolved; add DNS readiness checks and cache flush.
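The startup-script changes summarized above could be sketched roughly as follows. This is an illustration under assumptions rather than the code from `start-client.sh`: the function names and the `consul.conf` file name are hypothetical, and the way the GCE DNS address is read from metadata is not shown in this conversation, so the recursor is left as a plain argument.

```shell
#!/usr/bin/env bash
# Sketch only: write a systemd-resolved drop-in that sends lookups to the
# local Consul DNS port. Directory argument and file name are illustrative.
write_consul_resolved_dropin() {
  local dir="${1:-/etc/systemd/resolved.conf.d}"
  mkdir -p "${dir}"
  # No Domains=~consul routing domain here; Consul is the only DNS server.
  cat > "${dir}/consul.conf" <<'EOF'
[Resolve]
DNS=127.0.0.1:8600
EOF
}

# Build the recursor flag for `consul agent` from an upstream DNS address.
# How that address is obtained (for example from GCE metadata) is up to the caller.
consul_recursor_flag() {
  printf '%s\n' "-recursor=$1"
}
```

A caller would then start Consul with something like `consul agent ... "$(consul_recursor_flag "$GCE_DNS")"`, where `GCE_DNS` is a placeholder for the dynamically fetched address, and restart systemd-resolved only once Consul answers on port 8600.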

Written by Cursor Bugbot for commit 9fbd28e. This will update automatically on new commits.

sitole commented Nov 3, 2025

Note: This works only for GCE deployments and does not account for AWS / BYOC. The IP address is the same in AWS, so it should work there in practice, but the change is not generic enough yet.

The provision script here is already GCP-specific, so we can merge it without IFs for different clouds. We can resolve that later, when merging the configuration related to BYOC and multi-cloud support.

@tomassrnka tomassrnka marked this pull request as ready for review November 3, 2025 12:32
Replace sleep 3 with 10x 1s wait for consul to start
@jakubno jakubno self-assigned this Nov 3, 2025
@jakubno jakubno added the improvement Improvement for current functionality label Nov 3, 2025
# Give Consul a moment to start its DNS server on port 8600
echo "- Waiting for Consul DNS to start on port 8600..."
for i in {1..10}; do
  if nc -z 127.0.0.1 8600 2>/dev/null; then

nslookup returns non-zero if the lookup fails, could use that for a more meaningful check

Suggested change
- if nc -z 127.0.0.1 8600 2>/dev/null; then
+ if ! nslookup google.com; then
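For illustration, the bounded wait discussed in this review thread could be factored into a small retry helper that accepts any probe command, so the port check and a real lookup are interchangeable. The helper name `wait_for` and its argument order are assumptions made for this sketch, not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for ATTEMPTS DELAY CMD...
# Runs CMD until it exits 0 or ATTEMPTS tries have been used.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Probe output is discarded; only the exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

With this helper, `wait_for 10 1 nc -z 127.0.0.1 8600` reproduces the 10 x 1s port check, while `wait_for 10 1 nslookup -port=8600 consul.service.consul 127.0.0.1` would instead wait for Consul to answer an actual query, which is closer to what the reviewer suggests.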

echo "- Restarting systemd-resolved to apply Consul DNS config"
systemctl restart systemd-resolved
echo "- Waiting for systemd-resolved to settle"
sleep 5

Could use nslookup here as well to wait for it to successfully resolve names

Replaced sleep with actual check on systemd-resolved restart
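One way such a check could replace the fixed `sleep 5` is a deadline-based loop that keeps resolving a name through the system resolver. This is a sketch under assumptions: the function name, the default `getent hosts` probe, and the timeout handling are illustrative and may differ from what was merged.

```shell
#!/usr/bin/env bash
# Sketch only: wait_until_resolves TIMEOUT_SECONDS NAME [RESOLVER_CMD...]
# Retries the lookup once per second until it succeeds or the deadline passes.
wait_until_resolves() {
  local timeout="$1" name="$2"
  shift 2
  # Default probe goes through NSS, so on a typical Ubuntu setup it exercises
  # systemd-resolved rather than querying Consul directly.
  [ "$#" -gt 0 ] || set -- getent hosts
  local deadline=$(( $(date +%s) + timeout ))
  until "$@" "$name" >/dev/null 2>&1; do
    [ "$(date +%s)" -lt "$deadline" ] || return 1
    sleep 1
  done
}

# Possible use after the restart (not executed here):
#   systemctl restart systemd-resolved
#   wait_until_resolves 30 google.com || echo "- DNS still not resolving" >&2
```

Passing a different resolver command, for example `wait_until_resolves 30 google.com nslookup`, would match the tool named in the review comment.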
@tomassrnka tomassrnka merged commit 173622f into main Nov 5, 2025
27 checks passed
@tomassrnka tomassrnka deleted the 2025-11-03-gcp-consul-dns branch November 5, 2025 09:00
tomassrnka added a commit that referenced this pull request Nov 5, 2025
Changes the pinned Ubuntu 22.04 image to the latest Ubuntu 22.04 LTS image family; this is a follow-up to the #1430 DNS fix.
5 participants