Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow WebSocket and native clients on the same socket #362

Closed
Neverlord opened this issue Jun 11, 2023 · 5 comments
Closed

Allow WebSocket and native clients on the same socket #362

Neverlord opened this issue Jun 11, 2023 · 5 comments

Comments

@Neverlord
Copy link
Member

This goes back to a discussion with @ckreibich. Having a single port for Broker would make it easier to use the WebSocket API for cluster management.

@Neverlord Neverlord self-assigned this Jun 11, 2023
@Neverlord Neverlord moved this to In Progress in Zeek 6.1 Jul 6, 2023
@Neverlord Neverlord removed this from Zeek 6.1 Oct 2, 2023
@ckreibich
Copy link
Member

There's debate about whether this feature is really needed. Original motivation was to avoid needing to open up multiple ports to support either transport.

@ckreibich
Copy link
Member

The above was a quick note during an update meeting. Here's where the idea came from.

There's no plan for the new Broker Python bindings (#380) to support anything but the WebSocket transport. Supporting the binary protocol would require linking bindings with the Broker library, or reimplementing the binary transport — neither of which is attractive. This implies that users of the bindings speak to other endpoints via WebSockets. In Zeek this requires an explicit Broker::listen_websocket().

One of the pain points folks running large clusters mentioned during Management framework research was the administrative overhead of opening up ports to their clusters. ZeekControl uses the current Broker bindings for Control framework event I/O with all nodes in the cluster. So at least during an interim phase where we have the new bindings but the management framework isn't yet ready, cluster operators using ZeekControl would have to open an additional port on every node, and we'd need to adjust the cluster code to listen on the additional port.

It seems easier to do away with the distinction at the API level and leave it to the implementation to sort it out. I agree that it's a bit unusal, but it's easy to do: binary goop vs an HTTP handshake at start of (post-TLS) payload. It implies a certain loss of control — let's say what if you had expected an endpoint to use the binary protocol and now something shows up speaking WebSockets — but that's unlikely to begin with and can be countered with appropriate logging. Pub/sub is already independent of individual connections.

Thoughts, given the above?

@awelzel
Copy link
Contributor

awelzel commented Feb 1, 2024

Would much prefer recommending and pushing for an architecture where there's just a single WebSocket port open for the whole cluster (served by the manager - or, if failover is actually a topic, listen on the proxies with an nginx loadbalancer in front). And define "internal cluster communication" running separately over the existing ports. For example, this would allow to run WebSocket connections over SSL and internal cluster communication non-encrypted (less overhead and tcpdumping broker traffic appears still one of the most effective ways to troubleshoot cluster communication in production). Firewall rules can also be more specific: External hosts may connect to the WebSocket port, but can not fuzz around with the binary protocol.

One of the pain points folks running large clusters mentioned during Management framework research was the administrative overhead of opening up ports to their clusters.

I'm not in that space, but yes, I'm sure it's annoying to have another port open. It should be a one-time thing though. We've changed the complete port cluster port range with zeek/zeekctl#41. This must have affected people, but I haven't heard uproar - though maybe the motivation for the change was good enough.

ZeekControl would have to open an additional port on every node, and we'd need to adjust the cluster code to listen on the additional port.

I have not fully thought this through or looked at what it does, but if we change ZeekControl to use WebSockets, maybe the eventing it does can be threaded through the manager instead of connecting to individual nodes (which has caused issues like #312 for large clusters)

do: binary goop vs an HTTP handshake at start of (post-TLS) payload.

If this is implemented, maybe at least we should use ALPN as a mechanism rather than magic signature matching on payload. Though that won't work with plaintext.

My opinion - will not further comment given other opinions exist :-)

@ckreibich
Copy link
Member

All good thoughts, thanks Arne. I just had another thought: I think it's actually okay to do nothing. When the new bindings come out, we'll obviously deprecate the current ones. But the deprecation window is up to us (the closest would be deprecating with 7.0 and pulling with 7.1, which arguably is quite short), and while everyone migrates to the new ones, we can simply keep using the old ones for zeekctl, which in itself is approaching deprecation. The downside is that we can't pull the old bindings from Broker until we're ready to move on from zeekctl, but I don't think that's the end of the world — it's not like we currently spend much if any effort maintaining them. Plus, it increases the incentive to get the management framework ready for production use.

@ckreibich
Copy link
Member

Btw — I agree with the centralization idea (one point to connect into the cluster). The management framework takes a few steps in that direction. When you launch a cluster via it, workers currently don't listen, because nothing needs to connect to them — at least so far.

Regarding the TLS/policy arguments, also duly noted. I just looked at this a bit and Zeek's Broker use will need some tweaking to allow different per-port security policies ... looks like a Broker endpoint gets a single configuration (with certs etc), and can then listen on a port with the traditional protocol and/or another one with WebSockets. It matches how we currently define the certs etc as globals in the script layer, not tied to the individual listens.

Anyway — I'm closing this as nothing to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants