Apply host DNS settings on peer state change #2291

hurricanehrndz · 2024-07-19T14:52:37Z

Describe your changes

This is a PoC and a potential fix for #2002. This patch is not fully fleshed out and there are some unintended side effects, but I wanted to get your thoughts and opinions, @mlsmaycon, @pascal-fischer, and @lixmal.

More Info:

This essentially tries to ensure that host DNS changes aren't applied until at least one routing peer is connected and the DNS servers are reachable. It does so by tying deactivation and reactivation of upstream servers to peer list state changes in the status recorder.
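
A minimal sketch of the gating idea described above, not the code in this patch. The `hostDNSGate` type, its fields, and `onPeerStateChange` are hypothetical names used only for illustration; the real change hooks into the status recorder and the upstream handlers.

```go
package main

import "fmt"

// hostDNSGate is a hypothetical helper, not a NetBird type. It decides
// whether the upstream DNS servers should be active on the host.
type hostDNSGate struct {
	active bool
	probe  func() bool       // reports whether the DNS servers answered
	apply  func(active bool) // writes or removes the host DNS config
}

// onPeerStateChange is meant to be called whenever the peer list changes.
// The host config is only touched when the desired state actually flips.
func (g *hostDNSGate) onPeerStateChange(connectedPeers int) {
	want := connectedPeers > 0 && g.probe()
	if want == g.active {
		return // nothing changed, leave the host config untouched
	}
	g.active = want
	g.apply(want)
}

func main() {
	reachable := false
	g := &hostDNSGate{
		probe: func() bool { return reachable },
		apply: func(active bool) { fmt.Println("host DNS active:", active) },
	}
	g.onPeerStateChange(0) // no peers yet: nothing is applied
	reachable = true
	g.onPeerStateChange(1) // peer connected and servers reachable: apply
	g.onPeerStateChange(0) // last peer gone: remove again
}
```

The point of comparing `want` against the current state is that repeated peer events do not rewrite the host DNS configuration when nothing has changed.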

As it stands now, `status -d` doesn't accurately reflect the status on first connection. That is because I haven't figured out an effective method to deactivate the servers without more invasive changes, primarily because of how the deactivate and reactivate callbacks are structured and their dependence on removeIndex. One potential workaround is to modify deactivate to skip applying the host config, so that we can fix this side effect.

Thoughts and discussion please

Issue ticket number and link

Checklist

Is it a bug fix
Is a typo/documentation fix
Is a feature enhancement
It is a refactor
Created tests that fail without the change (if possible)
Extended the README / documentation, if necessary

lixmal · 2024-08-02T08:50:09Z

> This essentially tries to ensure that host DNS changes aren't applied until at least one routing peer is connected and the DNS servers are reachable.

This approach would break non-routed dns servers (e.g. 8.8.8.8 without exit node), correct?

hurricanehrndz · 2024-08-07T14:02:49Z

> This essentially tries to ensure that host DNS changes aren't applied until at least one routing peer is connected and the DNS servers are reachable.
>
> This approach would break non-routed dns servers (e.g. 8.8.8.8 without exit node), correct?

Not at all, it would just not apply settings until the client is connected to one peer. I am going to try to turn this into an actual PR.

I may update the activate and deactivate upstream handlers to take a flag to skip applying system settings, so as to fix the DNS updates from management.
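
A rough sketch of what such a flag could look like, assuming a simplified handler. `upstreamHandler`, `applyHostConfig`, and the `skipApply` parameter are hypothetical and do not mirror the real callback signatures.

```go
package main

import "fmt"

// upstreamHandler is a hypothetical stand-in for an upstream resolver handler.
type upstreamHandler struct {
	disabled   bool
	applyCount int // number of times the host DNS config was rewritten
}

// applyHostConfig stands in for writing the system DNS settings.
func (u *upstreamHandler) applyHostConfig() { u.applyCount++ }

// deactivate marks the upstream as disabled. With skipApply set, the caller
// takes responsibility for applying the host config later, for example once
// after a whole management update has been processed.
func (u *upstreamHandler) deactivate(skipApply bool) {
	if u.disabled {
		return
	}
	u.disabled = true
	if !skipApply {
		u.applyHostConfig()
	}
}

// reactivate is the mirror image of deactivate.
func (u *upstreamHandler) reactivate(skipApply bool) {
	if !u.disabled {
		return
	}
	u.disabled = false
	if !skipApply {
		u.applyHostConfig()
	}
}

func main() {
	u := &upstreamHandler{}
	u.deactivate(true)  // state changes, host config untouched
	u.reactivate(false) // state changes and host config is rewritten
	fmt.Println("disabled:", u.disabled, "host config writes:", u.applyCount)
}
```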

hurricanehrndz · 2024-08-15T22:28:36Z

@lixmal this is ready. I'm not sure whether this should be gated behind a config option, with the default being to maintain the current behavior. Test it and let me know what you think.

hurricanehrndz · 2024-08-16T00:14:52Z

Actually, I got an idea for a hybrid model; let me get that going instead.

LeszekBlazewski · 2024-08-16T09:06:45Z

Looking forward to this fix.

If I could help in any way to test this (for example on a setup I described in #2002 (comment)) let me know.

hurricanehrndz · 2024-08-16T16:11:39Z

Okay, so this will wait for a response and trigger probes when a peer's connection status changes.

hurricanehrndz · 2024-08-16T16:12:49Z

@lixmal and @mlsmaycon, can we get a build so @LeszekBlazewski can provide some feedback?

lixmal · 2024-08-16T16:20:27Z

You can find the binaries in the artifacts in this PR: https://github.com/netbirdio/netbird/actions/runs/10422794281

LeszekBlazewski · 2024-08-19T21:58:11Z

Hey guys,

I have run some tests with the latest binaries built in this PR.

My setup:

1. private R53 hosted zone with a configured inbound resolver (3 ips in range `10.3.X.X`, only configured to match specific domains). Those 3 IP addresses are only accessible within VPC which runs a netbird peer. For the whole 10.3.0.0/16 range I have a network route configured which points all that traffic to the peer which runs inside that VPC and therefore is allowed to use the resolver.
2. Imaginary DNS nameserver (ip 1.2.3.4, match all domains) that won't ever resolve
3. Just for testing sake, the cloudflare and google DNS resolver (match all domains)
4. Exit node setup

After comparing the current behaviour (netbird 0.28.7) with the built binaries, I found that the introduced changes do indeed postpone DNS server evaluation until a routing peer is connected, which potentially addresses the issue mentioned in #2002 ... unless the nameserver is not supposed to be handled by that connected peer. Based on the logs, I observed that nameserver probing is triggered regardless of which peer just connected (I guess it's assumed that all configured netbird DNS servers can be handled by all connected peers, or there are other technical limitations which prevent figuring out such information, as described below).

For example, in my setup it would make sense to probe the nameservers from point 1 only if the routing peer which handles 10.3.0.0/16 is connected. I am aware that such a condition might be challenging to meet, since there is no peer assignment for DNS resolution handling (only for distribution). Also, the public DNS resolvers (Google and Cloudflare) make the decision whether or not to probe even harder, since no one configures network routes for those DNS servers (there is a reason why they are public).
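
To make the condition concrete, here is a small sketch of a route-aware probe decision. It is only an illustration under assumptions: `shouldProbe` is a hypothetical helper, the resolver address `10.3.0.10` is made up, and "public" is approximated with `netip.Addr.IsPrivate`, which may not match how the client classifies nameservers.

```go
package main

import (
	"fmt"
	"net/netip"
)

// shouldProbe reports whether a nameserver is worth probing right now.
// Public addresses are always probed; private ones only when one of the
// currently active routes covers them.
func shouldProbe(ns string, activeRoutes []string) (bool, error) {
	addr, err := netip.ParseAddr(ns)
	if err != nil {
		return false, err
	}
	if !addr.IsPrivate() {
		return true, nil
	}
	for _, r := range activeRoutes {
		prefix, err := netip.ParsePrefix(r)
		if err != nil {
			return false, err
		}
		if prefix.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// 10.3.0.10 is a made-up resolver address inside the routed range.
	for _, routes := range [][]string{nil, {"10.3.0.0/16"}} {
		ok, err := shouldProbe("10.3.0.10", routes)
		fmt.Println("routes:", routes, "probe:", ok, "err:", err)
	}
}
```

With this shape, the private resolver is skipped until the peer that serves 10.3.0.0/16 brings its route up, while 8.8.8.8 or 1.1.1.1 would be probed immediately.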

But still, I think that not applying the config when the DNS server is not reachable is key here, and that seems to work. Moreover, the backoff over time will fix the above-mentioned issue, and all the DNS servers that are reachable will be connected.

hurricanehrndz · 2024-08-19T22:06:31Z

I made this patch to show one potential fix. Probing based on routes when a peer connects will get more complicated, but it is possible.

This patch still includes the old logic as a fallback, so if public DNS servers are configured they should get applied very early on.

hurricanehrndz · 2024-08-20T00:40:09Z

I will try to add the ability to trigger on a peer with a route. What I gathered, though, is that the app performed and behaved better when applying DNS changes based on when peers connect, is that right?

LeszekBlazewski · 2024-08-20T05:02:29Z

Yeah, for sure. With the patch from this PR the amount of timed out calls to private NS is minimised.

One thing I wanted to figure out, and I am wondering if this could be related to my setup somehow: based on the logs, even after the peer which handles 10.3.X.X traffic connects (the logs indicate that), the first few calls to probe the private NS still fail, and only after a few seconds do they come back online. I will try to investigate this, since I am confident that those NS are available at all times, and it could be related to the exit node or the routing peer connecting via relay.

hurricanehrndz · 2024-08-20T13:15:34Z

Do you have any system extensions that filter DNS requests, such as Cisco Umbrella? Assuming this is a macOS device

LeszekBlazewski · 2024-08-21T06:35:50Z

The only thing running in the background is eset with the Network Access Protection enabled which might do some DNS filtering (I am not sure since I don't have access to the management of this antivirus but I was told it does not). I will make sure to disable it when testing.

What is interesting is that the NS requests fail only initially and then just work without me doing any changes.

hurricanehrndz · 2024-08-22T18:02:02Z

This patch now supports enabling publicly accessible servers when the DNS update is processed and/or at start.

hurricanehrndz · 2024-08-22T18:20:55Z

@LeszekBlazewski

Minor updates to behavior in order to ensure publicly accessible servers work from the start without a peer. Unfortunately, I didn't filter probing based on network, so test this one out. I will see what I can do on that front, but this would need a heavier refactor, since I would need to start another network monitor.

LeszekBlazewski · 2024-08-23T05:29:51Z

Indeed, I did a quick test and the public nameservers were enabled almost instantly, whereas the rest got processed later on. I am still checking why the first few nameserver requests after connecting to the peer that is supposed to handle that traffic still fail.

This patch adds an additional DNS probe triggered by ConnStatus changes to any peer. This is an attempt to limit the number of DNS changes applied to Darwin, Windows and BSD hosts. There should be no change whatsoever in how iOS and Android operate. How publicly accessible DNS servers get applied to the host should also remain unchanged.

Version 29+ introduces new functions for the built-in relay that notify when peer state has changed. Similarly, for this feature to work, there are handlers that need to close the aPeerConnStatusChanged channel to signal a change.
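
For readers unfamiliar with the "close a channel to signal a change" idiom, here is a generic Go sketch. The `notifier` type and its methods are hypothetical and only illustrate the pattern; they are not the status recorder's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical illustration of the close-to-broadcast pattern.
type notifier struct {
	mu      sync.Mutex
	changed chan struct{}
}

func newNotifier() *notifier {
	return &notifier{changed: make(chan struct{})}
}

// Changed returns a channel that is closed on the next state change.
func (n *notifier) Changed() <-chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.changed
}

// Notify wakes every current waiter by closing the channel, then installs a
// fresh channel for the next round of waiters.
func (n *notifier) Notify() {
	n.mu.Lock()
	defer n.mu.Unlock()
	close(n.changed)
	n.changed = make(chan struct{})
}

func main() {
	n := newNotifier()
	ch := n.Changed()
	done := make(chan struct{})
	go func() {
		<-ch // unblocks once Notify closes the channel
		fmt.Println("peer state changed, trigger DNS probe")
		close(done)
	}()
	n.Notify()
	<-done
}
```

Closing a channel wakes all receivers at once, which is why a handler that forgets to close it would leave the DNS probe waiting.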

hurricanehrndz · 2024-09-23T15:15:47Z

@LeszekBlazewski can you test the latest build? The patch has been updated to work with the latest 29.x releases. I believe @mlsmaycon will be reviewing this soon.

https://github.com/netbirdio/netbird/actions/runs/10997072510?pr=2291

LeszekBlazewski · 2024-09-23T18:35:09Z

Hi, sorry but I won't be able to test this until the 7th of October.

sonarqubecloud · 2024-10-01T16:19:56Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
4.2% Duplication on New Code

See analysis details on SonarCloud

…te_change * upstream/main: (81 commits) Fix cached device flow oauth (netbirdio#2833) Avoid failing all other matrix tests if one fails (netbirdio#2839) add all group to add peer affected peers network map check (netbirdio#2830) [client] Log windows panics (netbirdio#2829) Fix unused servers cleanup (netbirdio#2826) [management] Add DB access duration to logs for context cancel (netbirdio#2781) Allocate new buffer for every package (netbirdio#2823) [client] Nil check on ICE remote conn (netbirdio#2806) [management] remove network map diff calculations (netbirdio#2820) Create FUNDING.yml (netbirdio#2814) Create funding.json (netbirdio#2813) [management] add metrics to network map diff (netbirdio#2811) [client] Fix the broken dependency gvisor.dev/gvisor (netbirdio#2789) fix meta is equal slices (netbirdio#2807) [client] Fix multiple peer name filtering in netbird status command (netbirdio#2798) [management] Setup key improvements (netbirdio#2775) [client] allow relay leader on iOS (netbirdio#2795) [client] Remove legacy forwarding rules in userspace mode (netbirdio#2782) [client] Ignore route rules with no sources instead of erroring out (netbirdio#2786) [misc] Update Zitadel from v2.54.10 to v2.64.1 ...

hurricanehrndz · 2024-11-15T16:08:05Z

@LeszekBlazewski this now applies settings based on the routing table.

Eu-Arthur · 2024-11-28T15:02:03Z

I have the same problem as #2002, and this PR fixes it.

Would it be possible to rebase it onto the latest version?

hurricanehrndz · 2024-11-29T13:42:11Z

Yeah, I will rebase it and see if I can fix some tests.

…te_change * upstream/main: (55 commits) [client] Account different policiy rules for routes firewall rules (netbirdio#2939) Add guide when signing key is not found (netbirdio#2942) [tests] Enable benchmark tests on github actions (netbirdio#2961) [management] Add performance test for login and sync calls (netbirdio#2960) [management] refactor to use account object instead of separate db calls for peer update (netbirdio#2957) [client] Code cleaning in net pkg and fix exit node feature on Android(netbirdio#2932) [management] Refactor nameserver groups to use store methods (netbirdio#2888) [management] Refactor DNS settings to use store methods (netbirdio#2883) [management] Refactor policy to use store methods (netbirdio#2878) [management] Refactor posture check to use store methods (netbirdio#2874) [client] Allow routing to fallback to exclusion routes if rules are not supported (netbirdio#2909) [client] Set up sysctl and routing table name only if routing rules are available (netbirdio#2933) [client] Test nftables for incompatible iptables rules (netbirdio#2948) [client] Don't return error in userspace mode without firewall (netbirdio#2924) Import time package (netbirdio#2940) [misc] Renew slack link (netbirdio#2938) [relay] Refactor initial Relay connection (netbirdio#2800) [management] Fix getSetupKey call (netbirdio#2927) [client] Fix allow netbird rule verdict (netbirdio#2925) [management] Add activity events to group propagation flow (netbirdio#2916) ...

sonarqubecloud · 2024-11-29T21:51:18Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

hurricanehrndz · 2024-11-29T21:52:18Z

I have updated this PR, but it is a WIP, so please test. I still need to update this PR to support other platforms.

Eu-Arthur · 2024-11-30T10:20:54Z

Yes, I don't see any problems; it works for me on the latest version.

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch 2 times, most recently from ec6228a to 21f265a Compare July 19, 2024 14:58

mlsmaycon self-assigned this Aug 1, 2024

mlsmaycon requested a review from lixmal August 1, 2024 15:11

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 21f265a to 09778a6 Compare August 15, 2024 21:38

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 09778a6 to 7b40359 Compare August 16, 2024 16:03

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 508030c to 365f6b7 Compare August 19, 2024 14:11

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 365f6b7 to 2196243 Compare August 21, 2024 19:48

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from aa5c94f to f7c8dc2 Compare August 22, 2024 18:10

hurricanehrndz changed the title ~~Poc dns on peer state change~~ Apply host DNS settings on peer state change Aug 22, 2024

hurricanehrndz closed this Aug 22, 2024

hurricanehrndz reopened this Aug 22, 2024

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from f7c8dc2 to 443ce47 Compare August 29, 2024 13:46

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 583f0c0 to 30e9cb7 Compare September 3, 2024 14:34

hurricanehrndz added 7 commits September 23, 2024 08:46

Fix lint issues

65fbdb2

Disable upstream if ns private and 0 peers

a632acc

Better comment for trigger DNS probe on connstatus

1157e13

Set handler to disable when nsgroup disabled

81aed5d

Drop atomic bool when probing on peer change

25e47ff

Update when aPeerConnStatusChanged is closed

814369b

Version 29+ introduces new functions for the built-in relay that notify when peer state has changed. Similarly, for this feature to work, there are handlers that need to close the aPeerConnStatusChanged channel to signal a change.

hurricanehrndz force-pushed the poc_DNS_on_peer_state_change branch from 30e9cb7 to 814369b Compare September 23, 2024 14:58

Reduce unnecessary complexity

8db48a1

hurricanehrndz added 4 commits October 1, 2024 09:05

Refactor make more readable and thread safe

571885c

Log if nsgroup is enabled

5dc65bc

Fix upstream DNS tests

dc0cfc4

Fix spelling mistake

f92341a

hurricanehrndz added 4 commits November 5, 2024 08:30

Unix: dns probe via route check alone

ad3e96c

Probe via route check on darwin only

dab2ff6

fix: linting issue

614c542

hurricanehrndz added 3 commits November 29, 2024 14:17

Cleanup from merge conflicts

637ebfb

VPNs usually do not react to upstream failures

5e129d2
