New Relay public thread - Q&A and Issues discussions #2566

Open
mlsmaycon opened this issue Sep 9, 2024 · 66 comments
Comments

@mlsmaycon
Collaborator

Hello folks, this issue is open to any questions or problems regarding the new relay implementation.

@mlsmaycon
Collaborator Author

Status information to confirm relay usage:

Peers detail:
 relay-test-ip-172-20-1-178-rly.netbird.selfhosted:
  NetBird IP: 100.89.101.6
  Public key: CdRpcUnzq2LM9v97VnU7JiiqE0Y4wXp379mXju0efjk=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://relay-eu1.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 relay-test-ip-172-20-14-148.netbird.selfhosted:
  NetBird IP: 100.89.212.227
  Public key: bhSrOMLvN+5cMnjWyL4gB+o9En2a1AvAGWNB5N+gEGw=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 192.168.178.38:51820/1.2.3.4:51820
  Relay server address: rels://relay-eu2.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer (known bug: this field should be cleared once the connection is P2P)
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: 34.160.111.145/32
  Latency: 28.5755ms

OS: darwin/arm64
Daemon version: 0.29.0
CLI version: 0.29.0
Management: Connected to https://test.stage.netbird.io:443
Signal: Connected to https://signal.stage.netbird.io:443
Relays:
  [stun:test.stage.netbird.io:3478] is Available
  [turn:test.stage.netbird.io:3478?transport=udp] is Available
  [rels://relay-eu1.stage.netbird.io] is Available    <--------------- indicates the relay used by your local client (the home relay)
Nameservers:
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
FQDN: maycons-macbook-pro-2-1.netbird.selfhosted
NetBird IP: 100.89.107.107/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 2/2 Connected
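
To check the connection type for many peers at once, the detailed status text can be parsed with a short script. This is only a minimal sketch in Python, assuming the output keeps the layout pasted above (peer names indented by one space, fields by two spaces). `peer_connections`, `relayed_peers` and `SAMPLE` are illustrative names, not part of NetBird.

```python
import re

# Shortened copy of the status output above, used as sample input.
SAMPLE = """\
Peers detail:
 relay-test-ip-172-20-1-178-rly.netbird.selfhosted:
  NetBird IP: 100.89.101.6
  Status: Connected
  Connection type: Relayed
  Relay server address: rels://relay-eu1.stage.netbird.io
  Latency: 0s

 relay-test-ip-172-20-14-148.netbird.selfhosted:
  NetBird IP: 100.89.212.227
  Status: Connected
  Connection type: P2P
  Relay server address: rels://relay-eu2.stage.netbird.io
  Latency: 28.5755ms
"""


def peer_connections(status_text):
    """Return {peer_name: (connection_type, relay_address)}."""
    peers = {}
    current = None
    for line in status_text.splitlines():
        # Assumed layout: a peer header is " <name>:" with exactly one leading space.
        header = re.match(r"^ (\S+):$", line)
        if header:
            current = header.group(1)
            peers[current] = {}
            continue
        # Assumed layout: fields are "  <key>: <value>" with two or more leading spaces.
        field = re.match(r"^\s{2,}([^:]+):\s*(.*)$", line)
        if current and field:
            # Keep only the first token, which drops trailing annotations.
            tokens = field.group(2).split()
            peers[current][field.group(1)] = tokens[0] if tokens else ""
    return {
        name: (fields.get("Connection type"), fields.get("Relay server address"))
        for name, fields in peers.items()
    }


def relayed_peers(status_text):
    """Return the names of peers whose connection type is Relayed."""
    return [
        name
        for name, (conn_type, _) in peer_connections(status_text).items()
        if conn_type == "Relayed"
    ]


for name, (conn_type, relay) in peer_connections(SAMPLE).items():
    print(f"{name}: {conn_type} (relay address: {relay})")
```

Because of the bug mentioned above, a P2P peer may still show a relay address, so the connection type is the field to rely on.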

@allroundtechie

allroundtechie commented Sep 9, 2024

Hi,

I have some questions about the new relay which are not clear to me.

  1. In the release notes you wrote "We are moving away from the TURN relay (coturn) to our own relay implementation based on WebSocket".
    Taken literally, this means that only the TURN part of coturn gets replaced, not the STUN part. Is that correct, with this release being only the first step toward replacing coturn completely, or is the STUN part already replaced by the new relay as well?
  2. The example above, which shows the relay in use, appears to use a secured connection, but the release notes only mention this part: "Addresses": ["rel://:"]
    Can TLS be enabled in the new relay, and if so, how? Or is this something for a future release?
  3. I am using Traefik as a reverse proxy and have set up NetBird as described in the documentation, which works well. I am missing documentation on running the new relay behind a reverse proxy.

Thanks in advance and also many thanks for your awesome work in building this great software stack!

@mlsmaycon
Collaborator Author

@landmass-deftly-reptile-budget:

  1. STUN is still going to be required for the P2P discovery. Also, for retro-compatibility, TURN is still required.
  2. The supported URLs are rel:// and rels://, where rels is used for TLS connections. Like signal and management, the relay has Let's Encrypt support, and you can use the environment variables below to enable it:
NB_EXPOSED_ADDRESS=rels://relay.example.com:443  # update the port configuration to match it
NB_LETSENCRYPT_DOMAINS=relay.example.com # should match the exposed address
NB_LETSENCRYPT_DATA_DIR=/etc/letsencrypt # mount this directory for persistency
[email protected]
#NB_LETSENCRYPT_AWS_ROUTE53=true # in case you want to use route 53 for issuing the certificate

It also supports certificate files with:

NB_TLS_CERT_FILE=/etc/certificates/cert.crt
NB_TLS_KEY_FILE=/etc/certificates/cert.key

Once this is done, add the exposed address to the management.json file and restart the management service.

  3. Relay should work fine behind Traefik. We are missing the configuration, but the traffic to the service can be routed with either a domain or with the /relay path prefix.

@ismail0234

Hello, I have 2 questions. I am undecided whether to upgrade or not.

  1. I don't fully understand the new Relay Feature. How will it benefit us?
  2. What was the reason to switch to our own relay application? Was there something that the existing system did not meet?

@bryanjuho

Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

@rudradevpal

For new relay to work is there any new openwrt package released?

@Marcus1Pierce

Is it okay to use same domain for management, signal, coturn and relay?

Example: If i use domain netbird.domain.com and i want to use this domain for all services but with different port is that okay?

@Zaunei

Zaunei commented Sep 10, 2024

> Also, for retro-compatibility, TURN is still required.

If I don't care about old clients, I can ignore TURN completely, right?

Otherwise, this sounds very promising, especially with Kubernetes, the port ranges of TURN have always made the setup a bit more complex. I will definitely give it a try and report back.

STUN will continue to be used in the future?

@WolfgangDpunkt

> Replace PORT and DOMAIN according to your deployment.

I have used the automatic setup script, so I am probably using the default values for ports, so what do I need to specify here for PORT in the compose file?

@ndziuba

ndziuba commented Sep 10, 2024

> Replace PORT and DOMAIN according to your deployment.
>
> I have used the automatic setup script, so I am probably using the default values for ports, so what do I need to specify here for PORT in the compose file?

It can be found in the setup.env file.
The default port is 33080

@MDMeridio001

Hello,

Is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

@mvivaldi

Try adding these:

      proxy_set_header Host            $http_host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_cache_bypass $http_upgrade;

and delete the directive:

proxy_set_header Host $host;

@mlsmaycon
Collaborator Author

mlsmaycon commented Sep 10, 2024

> Hello, I have 2 questions. I am undecided whether to upgrade or not.
>
>   1. I don't fully understand the new Relay Feature. How will it benefit us?
>   2. What was the reason to switch to our own relay application? Was there something that the existing system did not meet?

@ismail0234 some of the benefits of the new relay over Coturn:

  • More efficient relay connection for multiple peers: The ICE mode with TURN opens a connection with the TURN server for each peer connection. That consumes more resources on the client and on the coturn server.
  • The NetBird connection with the new relay is up to 15% faster than coturn.
  • The service is easier to run on self-hosted environments, since you only need to configure a single port.
  • Built-in TLS/SSL support

The main idea is to have a more efficient relay system for NetBird. TURN/Coturn is a really good system for short-term connections. As a connection via VPN usually lasts many hours or days, we need a more efficient system that can easily be scaled.

@mlsmaycon
Collaborator Author

> Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

Yes, it is. You don't need to update or configure anything if you don't want to. It should be fully compatible with older versions of the management.json file.

@mlsmaycon
Collaborator Author

> For new relay to work is there any new openwrt package released?

We will look into updating the OpenWrt version.

@mlsmaycon
Collaborator Author

> Is it okay to use same domain for management, signal, coturn and relay?
>
> Example: If i use domain netbird.domain.com and i want to use this domain for all services but with different port is that okay?

Yes, it is possible.

@MDMeridio001

I added them but I am still getting the same error. I don't know if it is of any help but this is what I added to the docker-compose.yml file:

# Relay
  relay:
    image: netbirdio/relay:latest
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=netbird.mydomain.com:443
    - NB_AUTH_SECRET=<MYSECRET>
    ports:
      - 127.0.0.1:33080:33080
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

And this is what I added to management.json:

"Relay": {
        "Addresses": ["rel://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},

@mlsmaycon
Collaborator Author

@MDMeridio001 it seems like you are using nginx for SSL termination too, in that case, try this:

    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443

and

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},

@MDMeridio001

I completely forgot I needed to add "rels://", thank you so much, it's working fine now.
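
For anyone hitting the same problem, a quick sanity check of the two addresses may save some time. The sketch below is in Python and only encodes what was said in this thread: relay addresses use rel:// or rels://, and the fix here was to use rels:// in both NB_EXPOSED_ADDRESS and management.json. The function names are hypothetical, and treating a scheme mismatch as a problem is an assumption, not documented NetBird behaviour.

```python
def check_relay_address(address):
    """Return a problem description for one address, or None if it looks fine."""
    scheme, sep, rest = address.partition("://")
    if not sep:
        return f"{address!r} has no scheme; expected rel:// or rels://"
    if scheme not in ("rel", "rels"):
        return f"{address!r} uses {scheme}://; expected rel:// or rels://"
    if not rest:
        return f"{address!r} has no host"
    return None


def check_relay_config(exposed_address, management_addresses):
    """Check NB_EXPOSED_ADDRESS against the "Addresses" list from management.json."""
    problems = []
    for address in [exposed_address, *management_addresses]:
        problem = check_relay_address(address)
        if problem:
            problems.append(problem)
    # Heuristic (assumption): both sides should agree on rel:// vs rels://.
    exposed_scheme, sep, _ = exposed_address.partition("://")
    if sep:
        for address in management_addresses:
            scheme, has_scheme, _ = address.partition("://")
            if has_scheme and scheme != exposed_scheme:
                problems.append(
                    f"scheme mismatch: {exposed_address!r} vs {address!r}"
                )
    return problems


# The configuration that produced the 400 error above:
print(check_relay_config(
    "netbird.mydomain.com:443",
    ["rel://netbird.mydomain.com:443/relay"],
))
# The configuration that worked:
print(check_relay_config(
    "rels://netbird.mydomain.com:443",
    ["rels://netbird.mydomain.com:443"],
))
```

The check is deliberately loose about the /relay path suffix, since working setups in this thread appear both with and without it.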

@rgdev

rgdev commented Sep 10, 2024

Assuming a brand new deployment with all clients running 0.29+, where does coturn fit into the picture? Can we just run coturn with --stun-only if backward compatibility is no concern?

@mlsmaycon
Collaborator Author

@rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

@Roeda

Roeda commented Sep 10, 2024

> @rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

Excuse my confusion, but you say that STUN is still used for peer discovery, and at the same time that Coturn won't be used once the mobile apps are updated. Does that mean the STUN service is now baked into the new relay (or the management service)? Would we ultimately be able to remove Coturn from docker compose and management.json?
Thank you very much for this new implementation; it sounds cool and production friendly.

@wehagy

wehagy commented Sep 10, 2024

> For new relay to work is there any new openwrt package released?
>
> We will look into updating the openwrt version.

I have been updating the netbird package against the OpenWrt snapshot for months without problems. I have built the new version 0.29.0, it is working fine, and I opened a PR, openwrt/packages#24950. For now I only see this message:

2024-09-10T15:25:09-03:00 INFO [peer: [ REDACTED ]=] client/internal/peer/worker_relay.go:59: Relay is not supported by remote peer

That is probably because I'm not self-hosting, and from the release notes:

  • Cloud support for the new relay feature is coming soon.

But I'm not backporting to OpenWrt 23.05; one of my targets is supported only on the OpenWrt snapshot.

To be honest, someone opened an issue, openwrt/packages#24569 (comment), on the OpenWrt repo asking to backport a new version. I offered to help if they could test it, but I got no response.

@ismail0234

ismail0234 commented Sep 10, 2024

@mlsmaycon Thanks for the explanation. Are you considering optimizations on the API side? The API slows down once more than 200 peers are connected to the system. After 500 peers, it slows down a lot: each request takes more than 1-2 seconds.

In my test measurements, these are the API response times by number of peers connected to the system:

20 Peers: 200-300 ms
100 Peers: 300-600 ms
200 Peers: 500-1000 ms
500 Peers: 1500-3000 ms

@mlsmaycon
Collaborator Author

Hey folks, we have a new release, 0.29.1. This release improves the relay with better authentication messages. To ensure your system is working properly, you should upgrade your relay and management servers before upgrading your clients.

@allroundtechie

Works like a charm, thanks!

@marcportabellaclotet-mt

Thanks for improving the relay functionality.
I can't find the relay repo in netbirdio github. Will it be private or closed source?

@allroundtechie

@marcportabellaclotet-mt

https://github.com/netbirdio/netbird/tree/main/relay

@ptpu

ptpu commented Sep 11, 2024

A short example for traefik which is working fine for me:

docker-compose.yml

relay:
    image: "netbirdio/relay:latest"
    container_name: netbird-relay
    restart: unless-stopped
    env_file:
      - relay.env
      - common.env
    labels:
      traefik.enable: 'true'
      traefik.http.routers.netbird-relay.rule: 'Host("netbird.mydomain.com") && PathPrefix("/relay")'
      traefik.http.routers.netbird-relay.entrypoints: websecure
      traefik.http.routers.netbird-relay.service: netbird-relay-service
      traefik.http.services.netbird-relay-service.loadbalancer.server.port: 33080

relay.env

NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:33080
NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443/relay
NB_AUTH_SECRET=secret

management.json

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "secret"
    },

@pugnobellum

pugnobellum commented Sep 11, 2024

Relay compose file

  relay:
    image: netbirdio/relay:latest
    container_name: netbird_relay
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443
    - NB_AUTH_SECRET=secret
    ports:
      - 33080:33080
    networks:
      - proxynet
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

management.json

   "Relay": {
    "Addresses": ["rels://netbird.mydomain.com:443"],
    "CredentialsTTL": "24h",
    "Secret": "secret"
    },

netbird.subdomain.conf

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name netbird.mydomain.com;

    include /config/nginx/ssl3.conf;

    client_max_body_size 128M;
    client_header_timeout 1d;
    client_body_timeout 1d;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_dashboard;
        set $upstream_port 80;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /api {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /signalexchange.SignalExchange/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_signal;
        set $upstream_port 80;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /management.ManagementService/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /relay/ {
        proxy_pass http://netbird_relay:33080/relay;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        
        # Forward headers
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeout settings
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
        proxy_connect_timeout 60s;

        # Handle upstream errors
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
    }


}


I use the SWAG reverse proxy, which just bundles nginx and Let's Encrypt; my config files are above. I'm trying to add the new relay service. When I fire up my docker client/agent I get this error in the logs for it:

UPDATE: the current relay location I have now works.

@mlsmaycon
Collaborator Author

@ismail0234 GitHub, but feel free to reach out on Slack for a faster iteration.

@ndziuba

ndziuba commented Sep 12, 2024

For people using Caddy (based on the Zitadel starter script):

Caddyfile:

  :80, netbird.example.com:443 {
          import security_headers
          reverse_proxy /relay* relay:80
          reverse_proxy /signalexchange.SignalExchange/* h2c://signal:10000
          reverse_proxy /api/* management:80
          reverse_proxy /management.ManagementService/* h2c://management:80
          reverse_proxy /* dashboard:80
  }

relay.env

  NB_LOG_LEVEL=info
  NB_LISTEN_ADDRESS=:80
  NB_EXPOSED_ADDRESS=rels://netbird.example.com:443
  NB_AUTH_SECRET="secret"

management.json

  "Relay": {
          "Addresses": ["rels://netbird.example.com:443/relay"],
          "CredentialsTTL": "24h",
          "Secret": "secret"
  },

docker-compose.yml

  #Relay
  relay:
    image: netbirdio/relay:latest
    container_name: relay
    restart: unless-stopped
    env_file:
      - ./relay.env
    networks:
      - netbird #If you use a network
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

@pellz0r

pellz0r commented Sep 12, 2024

I followed the above settings for Caddy (as I've used the Zitadel starter script once upon a time), but when a node with the latest client tries to connect I get the following:

2024-09-12T19:35:10Z DEBG client/internal/connect.go:176: connecting to the Management service netbird.mydomain.se:443
2024-09-12T19:35:10Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:10Z DEBG client/internal/connect.go:184: connected to the Management service netbird.mydomain.se:443
2024-09-12T19:35:11Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:11Z DEBG signal/client/grpc.go:81: connected to Signal Service: netbird.mydomain.se:443
2024-09-12T19:35:11Z INFO client/internal/connect.go:251: connecting to the Relay service(s): rels://netbird.mydomain.se:443/relay
2024-09-12T19:35:11Z DEBG relay/client/manager.go:93: starting relay client manager with [rels://netbird.mydomain.se:443/relay] relay servers
2024-09-12T19:35:11Z INFO [client_id: sha-WnVmoeH7RuspQpTopd/8RnZhD9vXJTf3J9VTglSeyGk=] relay/client/client.go:141: connecting to relay server: rels://netbird.mydomain.se:443/relay
2024-09-12T19:35:11Z DEBG util/net/dialer_nonios.go:52: Dialing tcp netbird.mydomain.se:443
2024-09-12T19:35:11Z ERRO relay/client/dialer/ws/ws.go:36: failed to dial to Relay server 'wss://netbird.mydomain.se:443/relay': failed to WebSocket dial: expected handshake response status code 101 but got 404
2024-09-12T19:35:11Z WARN relay/client/manager.go:130: Connection attempt failed: failed to connect to rels://netbird.mydomain.se:443/relay: failed to WebSocket dial: expected handshake response status code 101 but got 404
2024-09-12T19:35:11Z ERRO client/internal/connect.go:253: failed to connect to any relay server: all attempts failed

Not really sure what I'm missing

EDIT: Ah, I messed up and didn't pull / restart all the containers. :)
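
When debugging errors like the one above, it can help to test the proxy route without a NetBird client. The Python sketch below sends a plain WebSocket upgrade request and reports the HTTP status. It is a rough sketch under assumptions: the rels:// to wss:// mapping is taken from the log line above, the rel:// to ws:// mapping is assumed by analogy, and the relay may expect more than this bare handshake, so only 101 versus 404 should be read with any confidence. All function names are hypothetical.

```python
import base64
import http.client
import os
from urllib.parse import urlsplit


def relay_to_websocket_url(address):
    """Map rels:// to wss:// (seen in the log) and rel:// to ws:// (assumed)."""
    if address.startswith("rels://"):
        return "wss://" + address[len("rels://"):]
    if address.startswith("rel://"):
        return "ws://" + address[len("rel://"):]
    raise ValueError(f"not a relay address: {address!r}")


def handshake_status(ws_url, timeout=5):
    """Send a WebSocket upgrade request and return the HTTP status code."""
    parts = urlsplit(ws_url)
    if parts.scheme == "wss":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/", headers={
            "Upgrade": "websocket",
            "Connection": "Upgrade",
            "Sec-WebSocket-Version": "13",
            # Random 16-byte nonce, base64-encoded, as the handshake requires.
            "Sec-WebSocket-Key": base64.b64encode(os.urandom(16)).decode(),
        })
        return conn.getresponse().status
    finally:
        conn.close()


def explain(status):
    """Interpret a status code using the cases reported in this thread."""
    if status == 101:
        return "upgrade accepted: the route reaches a WebSocket endpoint"
    if status == 404:
        return ("no route matched: check the /relay path rule and that the "
                "relay container is running and up to date")
    return (f"unexpected status {status}: check the proxy's Upgrade/Connection "
            "headers and where TLS is terminated")


if __name__ == "__main__":
    url = relay_to_websocket_url("rels://netbird.mydomain.se:443/relay")
    print(url)
    # Uncomment to probe a real deployment (makes a network request):
    # print(explain(handshake_status(url)))
```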

@1nerdyguy

To confirm:

With the new relay, I still need to have the Coturn instance for STUN at this time?

But I can deploy out multiple relay instances, update the config file accordingly, and it will use those?

@tienlq2011

Could there be a guide to deploying them on Kubernetes? Thank you!

@1nerdyguy

> Could there be a guide to deploying them on Kubernetes? Thank you!

To my understanding, it's just its own container. So you'd spin them up, map the port, then update the management.json file and rebuild the containers.

@rgdev

rgdev commented Sep 16, 2024

Relay on k8s (behind an ingress-nginx reverse proxy, since it uses WebSockets):

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  name: netbird-relay-ingress
  namespace: netbird
spec:
  ingressClassName: nginx
  rules:
  - host: netbird.company.com
    http:
      paths:
      - backend:
          service:
            name: netbird-relay
            port:
              number: 80
        path: /relay
        pathType: Prefix
  tls:
  - hosts:
    - netbird.company.com
    secretName: netbird-tls

Service

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: netbird-relay
  name: netbird-relay
  namespace: netbird
spec:
  ports:
  - name: relay
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app.kubernetes.io/name: netbird-relay

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: netbird-relay
  name: netbird-relay
  namespace: netbird
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: netbird-relay
  template:
    metadata:
      labels:
        app.kubernetes.io/component: relay
        app.kubernetes.io/instance: netbird-relay
        app.kubernetes.io/name: netbird-relay
        app.kubernetes.io/part-of: netbird
    spec:
      containers:
      - env:
        - name: NB_LOG_LEVEL
          value: info
        - name: NB_LISTEN_ADDRESS
          value: :80
        - name: NB_AUTH_SECRET
          valueFrom:
            secretKeyRef:
              key: auth_secret
              name: netbird-relay-authkey
        - name: NB_EXPOSED_ADDRESS
          value: rels://netbird.company.com:443
        image: netbirdio/relay:0.29.2
        imagePullPolicy: IfNotPresent
        name: netbird-relay
        ports:
        - containerPort: 80
          name: relay
          protocol: TCP

The Deployment references a netbird-relay-authkey Secret; you need to provide it, with a key of your choice.
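
One possible way to produce that Secret is to generate a random value and print a manifest for it. This is a minimal sketch in Python, not an official procedure: the Secret name, namespace and `auth_secret` key are taken from the Deployment above, while `relay_secret_manifest` is a hypothetical helper. Judging by the other examples in this thread, the same value presumably also goes into the "Secret" field of management.json.

```python
import base64
import secrets


def relay_secret_manifest(auth_secret, name="netbird-relay-authkey", namespace="netbird"):
    """Render a Kubernetes Secret manifest holding the relay auth secret."""
    # Values under "data" in a Secret must be base64-encoded.
    encoded = base64.b64encode(auth_secret.encode()).decode()
    return (
        "apiVersion: v1\n"
        "kind: Secret\n"
        "metadata:\n"
        f"  name: {name}\n"
        f"  namespace: {namespace}\n"
        "type: Opaque\n"
        "data:\n"
        f"  auth_secret: {encoded}\n"
    )


if __name__ == "__main__":
    # Generate a random secret; keep a copy of the plain value for management.json.
    value = secrets.token_urlsafe(32)
    print(relay_secret_manifest(value))
```

The printed manifest can be saved to a file and applied like any other manifest.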

@marcportabellaclotet-mt

Relay performance question...
I was testing NetBird speed over a direct connection (with the WireGuard ports opened) and over the relay, and there seems to be a big performance penalty. Does anyone have similar results?
Direct connection: 150 Mbit speed
Using relay: 30 Mbit speed
I don't have a TURN setup, so I cannot compare.

@mlsmaycon
Collaborator Author

Hey @marcportabellaclotet-mt can you check with different MTU configurations for the NetBird interface on both ends of the connection?

Also, can you share which tool you used for the test?

@marcportabellaclotet-mt

I am using iperf and speedtest.
MTU is 1500 on both sides.
The relay app is deployed as an LXC container.

@1nerdyguy

@marcportabellaclotet-mt
Does the relay have adequate upload/download bandwidth? Since all traffic on a relayed connection flows 'through' it, you're limited by the download/upload of the relay. It may also impact the latency between clients, depending how far the relay is from their point of presence.

@marcportabellaclotet-mt

I am testing the relay service in the same network where the NetBird client is hosted, so there is no bandwidth restriction.
My test setup is:

  • netbird client deployed at main office network.
  • relay app deployed at main office network
  • netbird client installed on my laptop at home.

@mlsmaycon
Collaborator Author

@marcportabellaclotet-mt, the WireGuard interface created by NetBird has an MTU of 1280, which can influence the performance and concurrent transfer of other peers' connections in the relay.

A good starting test could be to update your peer's MTU size to 1420 for the wt0 or utun100(macOS) interfaces and test again.
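
As background on those two numbers, here is a small arithmetic sketch in Python. It relies on the usual WireGuard per-packet overhead for direct UDP transport (60 bytes over IPv4, 80 bytes over IPv6) and ignores padding. A relayed connection wraps packets differently, so this only illustrates why 1420 is a common value and how many more packets an MTU of 1280 needs for the same transfer. The function names are illustrative.

```python
import math

# WireGuard overhead per packet on direct UDP transport:
# outer IP header + 8 bytes UDP + 32 bytes WireGuard framing.
WG_OVERHEAD_IPV4 = 20 + 8 + 32  # 60 bytes
WG_OVERHEAD_IPV6 = 40 + 8 + 32  # 80 bytes


def outer_packet_size(tunnel_mtu, overhead=WG_OVERHEAD_IPV6):
    """Size of the encrypted packet on the wire for a full-size tunnel packet."""
    return tunnel_mtu + overhead


def packets_needed(total_bytes, tunnel_mtu):
    """Number of full-size tunnel packets needed to carry total_bytes."""
    return math.ceil(total_bytes / tunnel_mtu)


transfer = 100 * 1024 * 1024  # 100 MiB of tunnel traffic
for mtu in (1280, 1420):
    print(
        f"MTU {mtu}: outer size {outer_packet_size(mtu)} bytes over IPv6, "
        f"{packets_needed(transfer, mtu)} packets for 100 MiB"
    )
```

With an MTU of 1420 the outer packet is exactly 1500 bytes over IPv6, which fits a standard Ethernet link without fragmentation.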

@marcportabellaclotet-mt

Thanks for answering, @mlsmaycon. I will do some debugging over the weekend.

@SamB-GB

SamB-GB commented Sep 17, 2024

Thanks @pugnobellum adding the /relay onto the proxy pass location fixed my issue.

@vampywiz17

@mlsmaycon

Is it possible to use cloud-hosted NetBird (free tier) with a self-hosted relay?

@drnkknt

drnkknt commented Sep 20, 2024

@mlsmaycon

Hi Maycon, I want to confirm whether running the relay service alongside Coturn will cause connection issues on user clients or between services. Currently, many of my users are still on versions below 0.29.

@thorstenkramm

In case someone uses Caddy as a reverse proxy, here is my config, which works fine with the relay container.

# Reverse proxy for Netbird UI, Relay, Signal and API
netbird.example.com: {
    # World accessible GRPC endpoint for all access nodes and clients
    handle /management.ManagementService/* {
        reverse_proxy h2c://127.0.0.1:33073
    }
    # World accessible relay endpoint
    handle /relay* {
        reverse_proxy http://127.0.0.1:33080 # Relay
    }

    # Protected management routes
    handle {
        @denied not client_ip <YOUR_ALLOWED_IP_FOR_MGMT>
        abort @denied

        reverse_proxy /api/* http://127.0.0.1:33073 # Management
        reverse_proxy /* http://127.0.0.1:8080 # Dashboard single-page app
        header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        header Referrer-Policy no-referrer
        header Permissions-Policy "geolocation=(), microphone=()"
    }
    log {
	    output file /var/log/caddy/door1.az.dimedis.net.log
    }
}

In management.json I appended:

"Relay": {
        "Addresses": ["rels://netbird.example.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "xyz"
    }

The part

@denied not client_ip <YOUR_ALLOWED_IP_FOR_MGMT>
abort @denied

is, of course, optional. I wanted to make the Management API and UI accessible for a known IP address only.

@drnkknt

drnkknt commented Sep 25, 2024

Thank you, @thorstenkramm.

In case someone uses Caddy as a reverse proxy, here is my config, which works fine with the relay container.

# Reverse proxy for Netbird UI, Relay, Signal and API

I can confirm this config works if you are using Caddy. I prefer using the service name relay instead of http://127.0.0.1, and I changed rel:// to rels:// in management.json.

Caddyfile

:80, netbird.domain.com:443 {
    import security_headers
    # Relay
    handle /relay* {
        reverse_proxy relay:33080 # Relay
    }

    # Signal
    reverse_proxy /signalexchange.SignalExchange/* h2c://signal:10000
    # Management
    reverse_proxy /api/* management:80
    reverse_proxy /management.ManagementService/* h2c://management:80
    # Zitadel
    reverse_proxy /zitadel.admin.v1.AdminService/* h2c://zitadel:8080
    reverse_proxy /admin/v1/* h2c://zitadel:8080

docker-compose.yaml

 # Relay
  relay:
    image: netbirdio/relay:latest
    restart: unless-stopped
    networks: [netbird]
    env_file:
      - ./relay.env
    ports:
      - 33080:33080
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

management.json

 "Relay": {
        "Addresses": ["rels://netbird.domain.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "secret"
    }

relay.env

NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:33080
NB_EXPOSED_ADDRESS=rels://netbird.domain.com:443
NB_AUTH_SECRET=secret
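
In the files above, the Secret in management.json and NB_AUTH_SECRET in relay.env carry the same value, and NB_EXPOSED_ADDRESS appears in Relay.Addresses. A small Python sketch can check that the two files stay in sync. It assumes these values are meant to match, as they do in this post; the helper names parse_env and check_relay_config are hypothetical and not part of NetBird.

```python
import json


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def check_relay_config(management: dict, env: dict) -> list:
    """Return a list of mismatches; an empty list means the files agree."""
    relay = management.get("Relay", {})
    problems = []
    # Assumption: the shared secret must be identical on both sides.
    if relay.get("Secret") != env.get("NB_AUTH_SECRET"):
        problems.append("Relay.Secret differs from NB_AUTH_SECRET")
    # Assumption: the address the relay exposes is one that management hands out.
    if env.get("NB_EXPOSED_ADDRESS") not in relay.get("Addresses", []):
        problems.append("NB_EXPOSED_ADDRESS is not listed in Relay.Addresses")
    return problems


MANAGEMENT_JSON = """
{
  "Relay": {
    "Addresses": ["rels://netbird.domain.com:443"],
    "CredentialsTTL": "24h",
    "Secret": "secret"
  }
}
"""

RELAY_ENV = """
NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:33080
NB_EXPOSED_ADDRESS=rels://netbird.domain.com:443
NB_AUTH_SECRET=secret
"""

print(check_relay_config(json.loads(MANAGEMENT_JSON), parse_env(RELAY_ENV)))  # []
```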

@daifeilail

I recommend continuing to use COTURN as the relay. COTURN can be further developed as needed to meet NETBIRD's requirements.

The reasons are as follows:

QoS Control Issues:
Merging all requests into a single TCP connection can make Quality of Service (QoS) difficult to manage effectively.

Complexity of Relay Networks:
The complexity of relay networks is influenced by factors such as firewalls and QoS. Proprietary protocols may encounter unusual issues, such as being blocked, in complex network environments.

Mature Telecom-Grade Solutions:
While many relay solutions exist, currently only TURN can be effectively deployed in telecom-grade solutions. Similar to VXLAN and EVPN, which require collaboration among various vendors to implement based on a standard, TURN stands out as the viable option for reliable relay networks.

Development of Low-Level Communication Protocols:
Developing low-level communication protocols requires many years of accumulation. I believe that adding features at the application layer is much more cost-effective than investing in technologies that may fail during the R&D phase.

@Spiritreader

Spiritreader commented Sep 29, 2024

What is the benefit of enabling TLS compared to leaving it off?
If I understand this correctly the purpose of the relay is to relay wireguard traffic, which is encrypted already and the relay server endpoints are statically configured via management.json

So other than somebody hijacking your DNS to point to a malicious relay while at the same time having stolen the relay secret, why should I enable TLS?

@PeterWang-dev

PeterWang-dev commented Oct 2, 2024

@mlsmaycon

Is it possible to use cloud-hosted NetBird (free tier) with a self-hosted relay?

Same question here. A self-hostable, DERP-like relay server is critical for low-latency access.

@Roeda

Roeda commented Oct 6, 2024

What is the benefit of enabling TLS compared to leaving it off? If I understand this correctly the purpose of the relay is to relay wireguard traffic, which is encrypted already and the relay server endpoints are statically configured via management.json

So other than somebody hijacking your DNS to point to a malicious relay while at the same time having stolen the relay secret, why should I enable TLS?

I wonder essentially the same: what is the recommendation to best secure the relay service? Normally we put the management cluster behind a reverse proxy/API gateway plus WAF and API protection (with support for gRPC), but this configuration will increase latency and create problems for relay traffic.

So what needs to be protected in the relay service, and what is the official recommendation for security layers to add in production?
@mlsmaycon
Any input is welcome. Thank you very much in advance.

@thorstenkramm

Problem: Relayed connections not working

Problem

I have two peers, both on Debian 12 Linux with NetBird version 0.30.1.
The problem started to appear with version 0.29, but I wouldn't say it started directly after the update to 0.30.

Both peers show

odroid.example.vpn:
  NetBird IP: 100.125.107.139
  Public key: xxx
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://door1.az.example.net:443
  Last connection update: 7 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 296 B/332 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

But there is no working connection between the peers. No ping. No nothing.
The connection worked flawlessly for weeks with connection type P2P. Suddenly, the P2P stopped functioning.
I'm not aware of changes to the firewall or the routing.
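
The symptom described above (connection type Relayed, but no WireGuard handshake) can be spotted automatically across many peers. The following Python sketch parses peer detail text in the layout shown in this post; the layout is taken from the output above and may differ between NetBird versions, and the function names are made up for illustration.

```python
STATUS = """\
odroid.example.vpn:
  NetBird IP: 100.125.107.139
  Status: Connected
  -- detail --
  Connection type: Relayed
  Relay server address: rels://door1.az.example.net:443
  Last connection update: 7 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 296 B/332 B
"""


def parse_peers(status_text: str) -> dict:
    """Group 'key: value' lines under the bare 'name:' line that precedes them."""
    peers = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # blank lines, the '-- detail --' separator, the transfer line
        key, _, value = line.partition(":")
        value = value.strip()
        if not value:
            current = peers.setdefault(key, {})  # a bare 'name:' line starts a peer
        elif current is not None:
            current[key] = value
    return peers


def relayed_without_handshake(peers: dict) -> list:
    """Names of peers that report a relayed connection but no handshake yet."""
    return [
        name for name, fields in peers.items()
        if fields.get("Connection type") == "Relayed"
        and fields.get("Last WireGuard handshake") == "-"
    ]


print(relayed_without_handshake(parse_peers(STATUS)))  # ['odroid.example.vpn']
```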

What makes me wonder is the connection status to the management host.

OS: linux/amd64
Daemon version: 0.30.1
CLI version: 0.30.1
Management: Connected to https://door1.az.example.net:443
Signal: Connected to http://door1.az.example.net:10000
Relays: 
  [stun:door1.az.example.net:3478] is Available
  [turn:door1.az.example.net:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
  [rels://door1.az.example.net:443] is Available
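
To pull out only the failing entries from the Relays section, a short Python sketch like this one can help. The regular expression is written against the three lines shown above and is an assumption about the format, not something taken from the NetBird source; unavailable_relays is a hypothetical helper name.

```python
import re

# Matches lines such as "[turn:host:3478?transport=udp] is Unavailable, reason: ..."
RELAY_LINE = re.compile(
    r"^\s*\[(?P<url>[^\]]+)\] is (?P<state>Available|Unavailable)"
    r"(?:, reason: (?P<reason>.*))?$"
)

RELAYS = """\
Relays:
  [stun:door1.az.example.net:3478] is Available
  [turn:door1.az.example.net:3478?transport=udp] is Unavailable, reason: allocate: attribute not found
  [rels://door1.az.example.net:443] is Available
"""


def unavailable_relays(status_text: str) -> list:
    """Return (url, reason) pairs for every entry reported as Unavailable."""
    result = []
    for line in status_text.splitlines():
        match = RELAY_LINE.match(line)
        if match and match.group("state") == "Unavailable":
            result.append((match.group("url"), match.group("reason")))
    return result


for url, reason in unavailable_relays(RELAYS):
    print(f"{url}: {reason}")
```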

I have a couple of other peers. They are all connected via P2P and all works flawlessly.

The management host is an MS Azure VM.

$mgmt: docker exec netbird-management-1 /go/bin/netbird-mgmt --version
netbird-mgmt version 0.30.1

Relay appears to be the latest version, too. (No version option available)

docker logs netbird-relay-1 -f
2024-10-14T06:40:48Z INFO relay/cmd/root.go:149: server will be available on: rels://door1.az.example.net:443
2024-10-14T06:40:48Z INFO relay/cmd/root.go:124: running metrics server: :9090/metrics
2024-10-14T06:40:48Z INFO relay/server/listener/ws/listener.go:39: WS server listening address: :33080
2024-10-14T06:40:50Z INFO [peer_id: sha-xx+DGnBKca4vsfSnTA=] relay/server/relay.go:120: peer connected from: 172.21.0.1:38154
2024-10-14T06:40:50Z INFO [peer_id: sha-xx=] relay/server/relay.go:120: peer connected from: 172.21.0.1:38164
2024-10-14T06:40:50Z INFO [peer_id: sha-xx/eIdZqPbFv/xx=] relay/server/relay.go:120: peer connected from: 172.21.0.1:38180
2024-10-14T06:40:50Z INFO [peer_id: sha-xx/xx=] relay/server/relay.go:120: peer connected from: 172.21.0.1:38194
2024-10-14T06:53:39Z INFO [peer_id: sha-xx/xx+xx=] relay/server/relay.go:120: peer connected from: 172.21.0.1:37250

Caddy reverse proxy

It all runs behind a caddy reverse proxy.

door1.az.example.net: {
    # World accessible GRPC endpoint for all access nodes and clients
    handle /management.ManagementService/* {
        reverse_proxy h2c://127.0.0.1:33073
    }
    # World accessible relay endpoint
    handle /relay* {
        reverse_proxy http://127.0.0.1:33080 # Relay
    }

    # Protected management routes
    handle {
        @denied not client_ip x.x.x.x
        abort @denied

        reverse_proxy /api/* http://127.0.0.1:33073 # Management
        reverse_proxy /* http://127.0.0.1:8080 # Dashboard single-page app
        header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        header Referrer-Policy no-referrer
        header Permissions-Policy "geolocation=(), microphone=()"
    }
    log {
	    output file /var/log/caddy/door1.az.example.net.log
    }
}

Questions

How can I investigate further:

  • why P2P stopped working?
  • why relayed connections are failing?

Is the status reported on the management host, [turn:door1.az.example.net:3478?transport=udp] is Unavailable, reason: allocate: attribute not found, something I should worry about?
The firewall is open and the TURN server is listening.

Any help is much appreciated.

@marcportabellaclotet-mt

Back to the STUN topic: is it planned to remove the STUN requirement in the future, to make the deployment simpler? Will the relay service be able to manage P2P discovery by itself?

@rudradevpal

@wehagy can you please let me know where I can find the latest OpenWrt packages?

@wehagy

wehagy commented Oct 21, 2024

@wehagy can you please let me know where I can find the latest OpenWrt packages?

@rudradevpal you can find the most up-to-date NetBird packages for OpenWrt in the snapshot version of OpenWrt. I haven't tested it and might be wrong, but you can probably download the latest .ipk package of netbird from the OpenWrt snapshot repository and install it on the stable version of OpenWrt.

@benniekiss
Contributor

I'm experiencing issues proxying the relay service with Caddy, although it works with nginx, and I was wondering if anyone had configuration advice.

The relay service is using self-signed certificates, so the Caddyfile looks like this:

...
        handle /relay* {
                reverse_proxy {RELAY_ADDRESS}:{RELAY_PORT} {
                        transport http {
                                tls
                                tls_insecure_skip_verify
                                read_timeout 86400s
                                write_timeout 86400s
                                keepalive_interval 75s
                        }
                }
        }

And I'm getting this consistently in my logs -- both caddy and the relay service

{"level":"error","ts":1731175600.2423942,"logger":"http.handlers.reverse_proxy","msg":"reading from backend","error":"stream error: stream ID 5; CANCEL"}

{"level":"error","ts":1731175600.2424295,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"localhost:10000","duration":0.011144689,"request":{"remote_ip":"10.10.10.1","remote_port":"26566","client_ip":"10.10.10.1","proto":"HTTP/2.0","method":"POST","host":"{DOMAIN}:443","uri":"/signalexchange.SignalExchange/ConnectStream","headers":{"User-Agent":["grpc-go/1.64.1"],"Te":["trailers"],"X-Forwarded-For":["10.10.10.1"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["{DOMAIN}:443"],"X-Wiretrustee-Peer-Id":["{ID}"],"Accept-Encoding":["gzip"],"Content-Type":["application/grpc"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":""}},"error":"reading: stream error: stream ID 5; CANCEL"}

2024-11-09T13:06:53-05:00 ERRO [peer_id: ] relay/server/peer.go:61: failed to read message: failed to get reader: failed to read frame header: EOF

2024-11-09T13:06:53-05:00 DEBG [peer_id: ] relay/server/relay.go:137: relay connection closed
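
When reading Caddy's JSON error entries it helps to pull out the upstream and the request URI, to see which route an error belongs to. Here is a minimal Python sketch; LOG_LINE is a trimmed copy of the second Caddy entry above (most request headers removed), and summarize is a hypothetical helper name.

```python
import json

# Trimmed copy of the "aborting with incomplete response" entry quoted above.
LOG_LINE = (
    '{"level":"error","ts":1731175600.2424295,'
    '"logger":"http.handlers.reverse_proxy",'
    '"msg":"aborting with incomplete response",'
    '"upstream":"localhost:10000",'
    '"request":{"method":"POST",'
    '"uri":"/signalexchange.SignalExchange/ConnectStream"},'
    '"error":"reading: stream error: stream ID 5; CANCEL"}'
)


def summarize(line: str) -> dict:
    """Extract the fields that identify which proxied route produced the error."""
    entry = json.loads(line)
    request = entry.get("request", {})
    return {
        "upstream": entry.get("upstream"),
        "uri": request.get("uri"),
        "error": entry.get("error"),
    }


summary = summarize(LOG_LINE)
print(summary["upstream"], summary["uri"])
```

For the entry pasted above this reports upstream localhost:10000 and the /signalexchange.SignalExchange/ConnectStream URI, so that particular Caddy error comes from a Signal request rather than from a /relay request.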
