-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Envoy core changes for reverse connections #37368
base: main
Are you sure you want to change the base?
Conversation
Hi @basundhara-c, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
00a138f
to
e8d32a5
Compare
Commit Message: This commit collates the envoy core changes for reverse connections. Additional Description: Risk Level: Testing: Docs Changes: Release Notes: Platform Specific Features: [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):] Signed-off-by: Basundhara Chakrabarty <[email protected]> Co-authored-by: Arun Vasudevan <[email protected]> Co-authored-by: Tejas Sangol <[email protected]> Co-authored-by: Aditya Jaltade <[email protected]>
e8d32a5
to
457e1e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really glad you got this working! Tossed in a few high level comments but I'm going to assign a first pass reviewer for the rest
envoy/event/dispatcher.h
Outdated
/** | ||
* @return the cluster manager pointer. | ||
*/ | ||
virtual Upstream::ClusterManager* getClusterManager() PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think cluster manager APIs belong in the dispatcher - I think it's worth finding a more clean way to get the cluster manager you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@basundhara-c, could you pass the singleton ClusterManager()
from server when creating the RCThreadLocalRegistry
in your extension, so that the RCmanager
has the thread_local_cluster()
api, and we don't need to use this worker_.dispatcher().set/getClusterManager()
?
envoy/event/dispatcher.h
Outdated
* Provides filters access to connection handler to save outgoing connections as | ||
* incoming connections for reverse tunnels | ||
*/ | ||
virtual void setConnectionHandler(Network::ConnectionHandler* connection_handler) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see what this function has to do with the dispatcher - did you stash this here as a convenient way to get the handler local to your thread? I think you want to look into the thread local state (tls) getters and setters.
envoy/network/connection_handler.h
Outdated
@@ -19,6 +21,26 @@ | |||
namespace Envoy { | |||
namespace Network { | |||
|
|||
// The thread local registry. | |||
class LocalRevConnRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you should be able to do this PR with minimal APIs in core Envoy, but instead in your extension directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alyssawilk The main set of changes is to support the creation of the ReverseConnectionManager and ReverseConnectionHandler thread-locally as implemented using a thread-local registry. The ReverseConnectionManager and ReverseConnectionHandler need to come up right after the workers are created because they are involved in accepting reverse connections from other initiating envoy instances even if the local envoy is not initiating any reverse connections, as long as the local envoy has the reverse connections bootstrap extension enabled.
So, currently, we check if the reverse connection bootstrap extension is enabled by checking if the singleton is present, and if so, right after workers are created, we post to each worker's dispatcher a functor to create the Thread Local Registries (creating the ReverseConnectionManager and ReverseConnectionHandler). This thread local registry is then parked with the Connection Handler and is accessed later wherever we need access to the ReverseConnectionManager and ReverseConnectionHandler like here,to initiate reverse conns, here in the reverse conn filter, which accepts reverse conns and in multiple other places where only the thread local dispatcher is otherwise available to us. Therefore, I have defined abstract classes for LocalRevConnRegistry, RevConnRegistry, etc so that these two entities can be stored in the form of a ThreadLocalRegistry and accessed later. We briefly touched upon this in this comment and on slack. In summary:
- We have created two thread local entities which get initialized right after the workers are created. They need to be parked somewhere so that they can be accessed later in code paths where only the thread local dispatcher is available. We have parked them with the thread local connection handler and therefore added the abstract classes under Envoy::Network.
- If the above is not the best approach, what would you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thread local registry is then parked with the Connection Handler and is accessed later wherever we need access to the ReverseConnectionManager and ReverseConnectionHandler like here,to initiate reverse conns, here in the reverse conn filter, which accepts reverse conns and in multiple other places where only the thread local dispatcher is otherwise available to us
For filters we can get the slot from serverFactoryContext from Server::Configuration::FactoryContext& context
during the filter chain creation, could you try to see if this can avoid the setConnectionHandler
api changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@botengyao I have implemented this in the last set of commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @basundhara-c, very interesting feature!
Here is a first pass to kick off the process.
envoy/event/dispatcher.h
Outdated
/** | ||
* @return the cluster manager pointer. | ||
*/ | ||
virtual Upstream::ClusterManager* getClusterManager() PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@basundhara-c, could you pass the singleton ClusterManager()
from server when creating the RCThreadLocalRegistry
in your extension, so that the RCmanager
has the thread_local_cluster()
api, and we don't need to use this worker_.dispatcher().set/getClusterManager()
?
envoy/network/connection_handler.h
Outdated
@@ -19,6 +21,26 @@ | |||
namespace Envoy { | |||
namespace Network { | |||
|
|||
// The thread local registry. | |||
class LocalRevConnRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thread local registry is then parked with the Connection Handler and is accessed later wherever we need access to the ReverseConnectionManager and ReverseConnectionHandler like here,to initiate reverse conns, here in the reverse conn filter, which accepts reverse conns and in multiple other places where only the thread local dispatcher is otherwise available to us
For filters we can get the slot from serverFactoryContext from Server::Configuration::FactoryContext& context
during the filter chain creation, could you try to see if this can avoid the setConnectionHandler
api changes?
"Thread local rverse conn registry should not be null."); | ||
} | ||
|
||
void ConnectionHandlerImpl::saveUpstreamConnection(Network::ConnectionSocketPtr&& upstream_socket, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to store the upstream socket to a dedicated REVESR_CLUSTER
cluster so that it can be reused when public Envoy wants to establish connections to the on-perm one. And we also wan to save upstream connection here is just for PING and other handshake processes.
The request flow is like:
- on-prem Envoy -> public Envoy
- Then the public Envoy stores the downstream socket to the
REVESR_CLUSTER
, - and also initialize the listener to do
RPING
- And then the service behind the public Envoy can send request -> normal listener -> filster-chain / router -> the stored socket in
REVERSE_CLUSTER
.
Am I understanding correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@botengyao yes, most of the above is correct, the request flow is as follows:
Onprem -> Cloud envoy
- Onprem initiates reverse connections; it creates http connections to cloud envoy and sends a reverse connection initiation request through it.
- This request is intercepted by the reverse conn http filter and cloud envoy stores the sockets with the ReverseConnectionHandler. This ReverseConnectionHandler periodically sends RPINGs over all such cached sockets.
- On cloud envoy, a REVERSE_CONNECTION cluster type is defined and is used for all requests that need to be sent over a reverse connection. When a request arrives and this REVERSE_CONNECTION cluster is picked for the route, we interface with the ReverseConnectionHandler described above, get the cached socket and send the request over it.
@@ -263,6 +278,10 @@ void ConnectionImpl::setDetectedCloseType(DetectedCloseType close_type) { | |||
} | |||
|
|||
void ConnectionImpl::closeSocket(ConnectionEvent close_type) { | |||
if (connection_reused_ || !ConnectionImpl::ioHandle().isOpen()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When connection_reused_
is always true, how do we close the socket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@botengyao apologies for the late reply, we don't, in this case, since reverse connections are a very small number of long lived connections. However, we might be able to mark sockets dead in the Listener filter which will be passed these sockets (#37821), will test it out and update it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could need a red_button to clean up the resources even connection_reused_
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, the changes in the connection socket with the customized connection reuse check seem hacky. It could be cleaner to create a dedicated close sequence only for the reused connection? Besides, the close can also be initialized by the peer, what will happen in this case? The socket will not close but the remote is closed, will we enter a dead loop when we find we cannot read/write data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@botengyao I am trying to find a cleaner way of running the close sequence.
@botengyao thanks a lot for the suggestion on obtaining the slot from the context! I am trying that out along with an attempt to move the code in extensions to contrib as much as possible and will be sharing the changes shortly! |
Signed-off-by: Basundhara Chakrabarty <[email protected]>
Signed-off-by: Basundhara Chakrabarty <[email protected]>
…er to the RCManager 2. Deleting unwanted APIs and ReverseConnectionManager and Handler header files Signed-off-by: Basundhara Chakrabarty <[email protected]>
Signed-off-by: Basundhara Chakrabarty <[email protected]>
Signed-off-by: Basundhara Chakrabarty <[email protected]>
@botengyao we have added a bunch of changes according to your suggestion, namely:
|
@botengyao can you take a look today? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Here is another pass, and some changes need at least unit tests to improve the coverage.
/wait
config.name())); | ||
} | ||
// Reverse connection listener should not bind to port. | ||
bind_to_port_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to add the invalid config check here, and we can reject it if the config contains bind_to_port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@botengyao The bind_to_port_ field is set to true by default in shouldBindToPort(config). The idea here is to set bind_to_port_ to false for any listener with a reverse connection config. Are you suggesting that we make it mandatory for bind_to_port to be explicitly set to false in the listener config for reverse connection listener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I don't see there is reverse_connection_listener_config
in the v3::Listener
proto, is it missing? A dedicated sub-config for reverse connection is needed and if some conditions are needed we can update and validate the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reverse_connection_listener_config was indeed missing, my apologies! Added it in listener.proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an indicator that the test is missing, could you add a test to cover the config, please?
…eck for reverse conn cluster Signed-off-by: Basundhara Chakrabarty <[email protected]>
Signed-off-by: Basundhara Chakrabarty <[email protected]>
…ptr and some minor nits Signed-off-by: Basundhara Chakrabarty <[email protected]>
Signed-off-by: Basundhara Chakrabarty <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better! Thanks for the contribution and here is another pass.
/wait
config.name())); | ||
} | ||
// Reverse connection listener should not bind to port. | ||
bind_to_port_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I don't see there is reverse_connection_listener_config
in the v3::Listener
proto, is it missing? A dedicated sub-config for reverse connection is needed and if some conditions are needed we can update and validate the config.
@@ -263,6 +278,10 @@ void ConnectionImpl::setDetectedCloseType(DetectedCloseType close_type) { | |||
} | |||
|
|||
void ConnectionImpl::closeSocket(ConnectionEvent close_type) { | |||
if (connection_reused_ || !ConnectionImpl::ioHandle().isOpen()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, the changes in the connection socket with the customized connection reuse check seem hacky. It could be cleaner to create a dedicated close sequence only for the reused connection? Besides, the close can also be initialized by the peer, what will happen in this case? The socket will not close but the remote is closed, will we enter a dead loop when we find we cannot read/write data?
source/server/server.cc
Outdated
@@ -765,6 +766,7 @@ absl::Status InstanceBase::initializeOrThrow(Network::Address::InstanceConstShar | |||
ASSERT(config_.clusterManager()); | |||
xds_manager_ = | |||
std::make_unique<Config::XdsManagerImpl>(*config_.clusterManager(), validation_context_); | |||
listener_manager_->setClusterManagerForWorkers(config_.clusterManager()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is no setClusterManagerForWorkers()
anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out! Fixed some issues caused due to missed code including removal of this and addition of the reverse_connection_listener_config field in listener.proto
@botengyao
…n in server Signed-off-by: Basundhara Chakrabarty <[email protected]>
…verse connections Signed-off-by: Basundhara Chakrabarty <[email protected]>
CC @envoyproxy/runtime-guard-changes: FYI only for changes made to |
Signed-off-by: Basundhara Chakrabarty <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @phlax, do you have insights why the ci doesn't work for this PR?
It is showing Run # Check if the merge commit SHA is not null Merge commit information not found for pull request 37368.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the dedicated effort! Looks good in high level except the runtime guard and connection close sequence.
/wait
config.name())); | ||
} | ||
// Reverse connection listener should not bind to port. | ||
bind_to_port_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an indicator that the test is missing, could you add a test to cover the config, please?
@@ -1000,10 +1000,12 @@ void DownstreamFilterManager::sendLocalReply( | |||
// route refreshment in the response filter chain. | |||
cb->route(nullptr); | |||
} | |||
|
|||
|
|||
bool reverse_conn_force_local_reply = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We perfer latching the runtime value to avoid the cost to get it every time sending a local reply.
// We only prepare a local reply to execute later if we're actively | ||
// invoking filters to avoid re-entrant in filters. | ||
if (state_.filter_call_state_ & FilterCallState::IsDecodingMask) { | ||
if (!reverse_conn_force_local_reply && state_.filter_call_state_ & FilterCallState::IsDecodingMask) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually a runtime guard has a 6 month deletion policy, and it is usually used to guard the data plane behavior change. I think here you want a parameter to tell if this is going to be a local reply on a reverse connection. Could you point me where you detect this use case, and I assume it is in your HTTP filter extension.
|
/wait (Boteng has unresolved comments) |
Commit Message: This commit collates the envoy core changes for reverse connections, described in this github issue. A detailed description of reverse connections concepts and workflows is provided in both the github issue and in the examples section.
Additional Description: This PR involves several working components, that are added as part of the following extensions:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]
Signed-off-by: Basundhara Chakrabarty [email protected]
Co-authored-by: Arun Vasudevan [email protected]
Co-authored-by: Tejas Sangol [email protected]
Co-authored-by: Aditya Jaltade [email protected]