[Access] Implement keepalive routine with ping-ponging to ws connection in ws controller #6757
base: master
Changes from 14 commits
81ddee5
808b54b
6c5ab5d
eec15e5
fd567aa
098c10d
917bbde
438b130
86cdb35
ec4e247
eae6bbf
4e2d35c
9971188
6cd2841
c90d75f
afc8648
040a949
357dc2f
276ea7e
077c543
21259ce
1f5728d
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -4,6 +4,8 @@ import ( | |||||||
"context" | ||||||||
"encoding/json" | ||||||||
"fmt" | ||||||||
"sync" | ||||||||
"time" | ||||||||
|
||||||||
"github.com/google/uuid" | ||||||||
"github.com/gorilla/websocket" | ||||||||
|
@@ -16,13 +18,31 @@ import ( | |||||||
"github.com/onflow/flow-go/utils/concurrentmap" | ||||||||
) | ||||||||
|
||||||||
const ( | ||||||||
// PingPeriod defines the interval at which ping messages are sent to the client. | ||||||||
// This value must be less than pongWait. | ||||||||
PingPeriod = (PongWait * 9) / 10 | ||||||||
|
||||||||
// PongWait specifies the maximum time to wait for a pong message from the peer. | ||||||||
Is this accurate?
Suggested change
|
||||||||
PongWait = 10 * time.Second | ||||||||
Wouldn't it be better to place it in |
||||||||
|
||||||||
// WriteWait specifies the maximum duration allowed to write a message to the peer. | ||||||||
There is a good explanation of what this means in the code. Can you elaborate some more here? Readers will mostly look at the definitions to understand how to set or modify the values. |
||||||||
WriteWait = 10 * time.Second | ||||||||
) | ||||||||
|
||||||||
type Controller struct { | ||||||||
logger zerolog.Logger | ||||||||
config Config | ||||||||
conn *websocket.Conn | ||||||||
communicationChannel chan interface{} | ||||||||
logger zerolog.Logger | ||||||||
config Config | ||||||||
conn *websocket.Conn | ||||||||
|
||||||||
communicationChannel chan interface{} // Channel for sending messages to the client. | ||||||||
errorChannel chan error // Channel for reporting errors. | ||||||||
|
||||||||
dataProviders *concurrentmap.Map[uuid.UUID, dp.DataProvider] | ||||||||
dataProvidersFactory *dp.Factory | ||||||||
|
||||||||
shutdownOnce sync.Once // Ensures shutdown is only called once | ||||||||
shutdown bool // Indicates if the controller is shutting down. | ||||||||
} | ||||||||
|
||||||||
func NewWebSocketController( | ||||||||
|
@@ -37,67 +57,166 @@ func NewWebSocketController( | |||||||
config: config, | ||||||||
conn: conn, | ||||||||
communicationChannel: make(chan interface{}), //TODO: should it be buffered chan? | ||||||||
errorChannel: make(chan error, 1), // Buffered error channel to hold one error. | ||||||||
dataProviders: concurrentmap.New[uuid.UUID, dp.DataProvider](), | ||||||||
dataProvidersFactory: dp.NewDataProviderFactory(logger, streamApi, streamConfig), | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
// HandleConnection manages the WebSocket connection, adding context and error handling. | ||||||||
// HandleConnection manages the lifecycle of a WebSocket connection, | ||||||||
// including setup, message processing, and graceful shutdown. | ||||||||
// | ||||||||
// Parameters: | ||||||||
// - ctx: The context for controlling cancellation and timeouts. | ||||||||
func (c *Controller) HandleConnection(ctx context.Context) { | ||||||||
//TODO: configure the connection with ping-pong and deadlines | ||||||||
defer close(c.errorChannel) | ||||||||
// configuring the connection with appropriate read/write deadlines and handlers. | ||||||||
err := c.configureConnection() | ||||||||
if err != nil { | ||||||||
// TODO: add error handling here | ||||||||
c.logger.Error().Err(err).Msg("error configuring connection") | ||||||||
c.shutdownConnection() | ||||||||
return | ||||||||
} | ||||||||
|
||||||||
//TODO: spin up a response limit tracker routine | ||||||||
go c.readMessagesFromClient(ctx) | ||||||||
c.writeMessagesToClient(ctx) | ||||||||
|
||||||||
// for track all goroutines and error handling | ||||||||
var wg sync.WaitGroup | ||||||||
What do you think about using `errgroup`? It would look something like:
    g, ctx := errgroup.WithContext(ctx)
    g.Go(func() error {
        return c.readMessagesFromClient(ctx)
    })
    ...
    err := g.Wait()
    if err != nil {
        c.shutdownConnection()
    }
I'd leave this decision to be made in #6642, as the error handling and the way the routines are started might change.
Why wait? Is this code being introduced in another PR? |
||||||||
|
||||||||
c.startProcess(&wg, ctx, c.readMessagesFromClient) | ||||||||
c.startProcess(&wg, ctx, c.keepalive) | ||||||||
c.startProcess(&wg, ctx, c.writeMessagesToClient) | ||||||||
|
||||||||
// Wait for context cancellation or errors from goroutines. | ||||||||
select { | ||||||||
case err := <-c.errorChannel: | ||||||||
c.logger.Error().Err(err).Msg("error detected in one of the goroutines") | ||||||||
//TODO: add error handling here | ||||||||
c.shutdownConnection() | ||||||||
case <-ctx.Done(): | ||||||||
// Context canceled, shut down gracefully | ||||||||
c.shutdownConnection() | ||||||||
} | ||||||||
|
||||||||
// Ensure all goroutines finish execution. | ||||||||
wg.Wait() | ||||||||
} | ||||||||
|
||||||||
// startProcess is a helper function to start a goroutine for a given process | ||||||||
// and ensure it is tracked via a sync.WaitGroup. | ||||||||
// | ||||||||
// Parameters: | ||||||||
// - wg: The wait group to track goroutines. | ||||||||
// - ctx: The context for cancellation. | ||||||||
// - process: The function to run in a new goroutine. | ||||||||
// | ||||||||
// No errors are expected during normal operation. | ||||||||
func (c *Controller) startProcess(wg *sync.WaitGroup, ctx context.Context, process func(context.Context) error) { | ||||||||
wg.Add(1) | ||||||||
|
||||||||
go func() { | ||||||||
defer wg.Done() | ||||||||
|
||||||||
err := process(ctx) | ||||||||
if err != nil { | ||||||||
// Check if shutdown has already been called, to avoid multiple shutdowns | ||||||||
if c.shutdown { | ||||||||
This is a data race, isn't it? I'm thinking of the following situation:
If we need this, we have to use an atomic variable here. However, I don't understand why we need it. Can you elaborate on it? |
||||||||
c.logger.Warn().Err(err).Msg("error detected after shutdown initiated, ignoring") | ||||||||
return | ||||||||
} | ||||||||
|
||||||||
c.errorChannel <- err | ||||||||
} | ||||||||
}() | ||||||||
} | ||||||||
|
||||||||
// configureConnection sets up the WebSocket connection with a read deadline | ||||||||
// and a handler for receiving pong messages from the client. | ||||||||
// | ||||||||
// The function does the following: | ||||||||
// 1. Sets an initial read deadline to ensure the server doesn't wait indefinitely | ||||||||
// for a pong message from the client. If no message is received within the | ||||||||
// specified `pongWait` duration, the connection will be closed. | ||||||||
// 2. Establishes a Pong handler that resets the read deadline every time a pong | ||||||||
// message is received from the client, allowing the server to continue waiting | ||||||||
// for further pong messages within the new deadline. | ||||||||
func (c *Controller) configureConnection() error { | ||||||||
// Set the initial read deadline for the first pong message | ||||||||
// The Pong handler itself only resets the read deadline after receiving a Pong. | ||||||||
// It doesn't set an initial deadline. The initial read deadline is crucial to prevent the server from waiting | ||||||||
// forever if the client doesn't send Pongs. | ||||||||
if err := c.conn.SetReadDeadline(time.Now().Add(PongWait)); err != nil { | ||||||||
return fmt.Errorf("failed to set the initial read deadline: %w", err) | ||||||||
} | ||||||||
// Establish a Pong handler which sets the handler for pong messages received from the peer. | ||||||||
c.conn.SetPongHandler(func(string) error { | ||||||||
return c.conn.SetReadDeadline(time.Now().Add(PongWait)) | ||||||||
}) | ||||||||
|
||||||||
return nil | ||||||||
} | ||||||||
|
||||||||
// writeMessagesToClient reads a messages from communication channel and passes them on to a client WebSocket connection. | ||||||||
// The communication channel is filled by data providers. Besides, the response limit tracker is involved in | ||||||||
// write message regulation | ||||||||
func (c *Controller) writeMessagesToClient(ctx context.Context) { | ||||||||
//TODO: can it run forever? maybe we should cancel the ctx in the reader routine | ||||||||
// | ||||||||
// No errors are expected during normal operation. | ||||||||
func (c *Controller) writeMessagesToClient(ctx context.Context) error { | ||||||||
for { | ||||||||
select { | ||||||||
case <-ctx.Done(): | ||||||||
return | ||||||||
return nil | ||||||||
UlyanaAndrukhiv marked this conversation as resolved.
|
||||||||
case msg := <-c.communicationChannel: | ||||||||
// TODO: handle 'response per second' limits | ||||||||
|
||||||||
// Specifies a timeout for the write operation. If the write | ||||||||
// isn't completed within this duration, it fails with a timeout error. | ||||||||
// SetWriteDeadline ensures the write operation does not block indefinitely | ||||||||
// if the client is slow or unresponsive. This prevents resource exhaustion | ||||||||
// and allows the server to gracefully handle timeouts for delayed writes. | ||||||||
if err := c.conn.SetWriteDeadline(time.Now().Add(WriteWait)); err != nil { | ||||||||
c.logger.Error().Err(err).Msg("failed to set the write deadline") | ||||||||
return err | ||||||||
} | ||||||||
err := c.conn.WriteJSON(msg) | ||||||||
if err != nil { | ||||||||
c.logger.Error().Err(err).Msg("error writing to connection") | ||||||||
return err | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
// readMessagesFromClient continuously reads messages from a client WebSocket connection, | ||||||||
// processes each message, and handles actions based on the message type. | ||||||||
func (c *Controller) readMessagesFromClient(ctx context.Context) { | ||||||||
defer c.shutdownConnection() | ||||||||
|
||||||||
// | ||||||||
// No errors are expected during normal operation. | ||||||||
func (c *Controller) readMessagesFromClient(ctx context.Context) error { | ||||||||
for { | ||||||||
select { | ||||||||
case <-ctx.Done(): | ||||||||
c.logger.Info().Msg("context canceled, stopping read message loop") | ||||||||
return | ||||||||
return nil | ||||||||
default: | ||||||||
msg, err := c.readMessage() | ||||||||
if err != nil { | ||||||||
if websocket.IsCloseError(err, websocket.CloseNormalClosure, websocket.CloseAbnormalClosure) { | ||||||||
return | ||||||||
return nil | ||||||||
} | ||||||||
c.logger.Warn().Err(err).Msg("error reading message from client") | ||||||||
return | ||||||||
return err | ||||||||
} | ||||||||
|
||||||||
baseMsg, validatedMsg, err := c.parseAndValidateMessage(msg) | ||||||||
if err != nil { | ||||||||
c.logger.Debug().Err(err).Msg("error parsing and validating client message") | ||||||||
return | ||||||||
return err | ||||||||
} | ||||||||
|
||||||||
if err := c.handleAction(ctx, validatedMsg); err != nil { | ||||||||
c.logger.Warn().Err(err).Str("action", baseMsg.Action).Msg("error handling action") | ||||||||
return err | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
@@ -193,20 +312,61 @@ func (c *Controller) handleListSubscriptions(ctx context.Context, msg models.Lis | |||||||
} | ||||||||
|
||||||||
func (c *Controller) shutdownConnection() { | ||||||||
defer close(c.communicationChannel) | ||||||||
defer func(conn *websocket.Conn) { | ||||||||
if err := c.conn.Close(); err != nil { | ||||||||
c.logger.Error().Err(err).Msg("error closing connection") | ||||||||
c.shutdownOnce.Do(func() { | ||||||||
|
||||||||
c.shutdown = true | ||||||||
|
||||||||
defer close(c.communicationChannel) | ||||||||
UlyanaAndrukhiv marked this conversation as resolved.
|
||||||||
defer func(conn *websocket.Conn) { | ||||||||
if err := c.conn.Close(); err != nil { | ||||||||
c.logger.Error().Err(err).Msg("error closing connection") | ||||||||
} | ||||||||
}(c.conn) | ||||||||
|
||||||||
err := c.dataProviders.ForEach(func(_ uuid.UUID, dp dp.DataProvider) error { | ||||||||
dp.Close() | ||||||||
return nil | ||||||||
}) | ||||||||
if err != nil { | ||||||||
c.logger.Error().Err(err).Msg("error closing data provider") | ||||||||
} | ||||||||
}(c.conn) | ||||||||
|
||||||||
err := c.dataProviders.ForEach(func(_ uuid.UUID, dp dp.DataProvider) error { | ||||||||
dp.Close() | ||||||||
return nil | ||||||||
c.dataProviders.Clear() | ||||||||
}) | ||||||||
if err != nil { | ||||||||
c.logger.Error().Err(err).Msg("error closing data provider") | ||||||||
} | ||||||||
|
||||||||
// keepalive sends a ping message periodically to keep the WebSocket connection alive | ||||||||
// and avoid timeouts. | ||||||||
// | ||||||||
// No errors are expected during normal operation. | ||||||||
func (c *Controller) keepalive(ctx context.Context) error { | ||||||||
pingTicker := time.NewTicker(PingPeriod) | ||||||||
defer pingTicker.Stop() | ||||||||
|
||||||||
for { | ||||||||
select { | ||||||||
case <-ctx.Done(): | ||||||||
return nil | ||||||||
case <-pingTicker.C: | ||||||||
if err := c.sendPing(); err != nil { | ||||||||
// Log error and exit the loop on failure | ||||||||
c.logger.Error().Err(err).Msg("failed to send ping") | ||||||||
This will get noisy in the logs at error level.
Suggested change
|
||||||||
return err | ||||||||
We should stop the keepalive only if CloseErr was sent to the connection. However, I guess I will handle it in #6642, as it should be clear by that time. |
||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
// sendPing sends a periodic ping message to the WebSocket client to keep the connection alive. | ||||||||
// | ||||||||
// No errors are expected during normal operation. | ||||||||
func (c *Controller) sendPing() error { | ||||||||
What does this abstraction do? Can't we get rid of it and use this code directly in the keepalive routine? |
||||||||
if err := c.conn.SetWriteDeadline(time.Now().Add(WriteWait)); err != nil { | ||||||||
return fmt.Errorf("failed to set the write deadline for ping: %w", err) | ||||||||
} | ||||||||
|
||||||||
c.dataProviders.Clear() | ||||||||
if err := c.conn.WriteMessage(websocket.PingMessage, nil); err != nil { | ||||||||
UlyanaAndrukhiv marked this conversation as resolved.
|
||||||||
return fmt.Errorf("failed to write ping message: %w", err) | ||||||||
} | ||||||||
|
||||||||
return nil | ||||||||
} |
Why less? Intuitively I would have thought it would need to be larger. Can you elaborate more in this comment?
I guess Ulyana took it from here: https://github.com/gorilla/websocket/blob/v1.5.3/examples/chat/client.go#L23.
I believe it's because ping and pong share the same timer.
Let's consider a case where `pongWait` is smaller than `pingPeriod`, and we'll see why this configuration is problematic.
Parameters:
- `pongWait` = 30s
- `pingPeriod` = 40s
At t=0:
The server sends a ping message to the client.
At t=30s:
The `pongWait` expires because the server hasn't received a pong (or any message) from the client. The server assumes the connection is dead and closes it.
At t=40s:
The server sends its second ping, but the connection is already closed due to the timeout at t=30s.
Yeah, but in that case, the server should have cleaned up the ping service when the connection was closed, so the second ping would never happen.