Cluster ha improvements #1769
Conversation
The "in transit" message block is buffered until it is acknowledged by the receiving node. If the receiving node crashed, the buffer is resent when the connection is re-established. If the subscriber reconnected to another node, the resent messages are forwarded to that new node.
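The resend-until-acknowledged behaviour described above can be sketched as an ack-gated in-flight buffer. This is an illustrative Python sketch, not VerneMQ's actual Erlang implementation; the class and method names (`InFlightBuffer`, `resend`, the `transport` callback) are hypothetical.

```python
from collections import OrderedDict

class InFlightBuffer:
    """Keeps outgoing message blocks until the peer acknowledges them."""

    def __init__(self):
        self._pending = OrderedDict()  # seq -> message block, in send order
        self._next_seq = 0

    def send(self, block, transport):
        """Buffer the block, then hand it to the transport with its seq."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = block
        transport(seq, block)
        return seq

    def ack(self, seq):
        """Receiver confirmed all blocks up to and including seq."""
        for s in list(self._pending):
            if s <= seq:
                del self._pending[s]

    def resend(self, transport):
        """On (re)connect, replay everything still unacknowledged, in order."""
        for seq, block in self._pending.items():
            transport(seq, block)
```

Because `resend` simply replays the pending map, the same logic works whether the connection is re-established to the original node or to the node the subscriber reconnected to.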
Messages from the queue that are published to the subscriber were saved in a backup queue but released (= deleted from persistent storage) before they were delivered and acknowledged by the client. Now the messages are only released when the acknowledgement is received.
@MarcVanOevelen thanks... what a great effort! :) If you feel I need any more specific information for an effective review, let me know. (I don't think so, for now at least.)
@ioolkos Thanks! It took me quite some time to get familiar with the codebase, but I must say it is very well structured. Indeed, such changes are very tricky given the many features and use cases. Our use case is focused on a small number of publishers/subscribers. I still have to assess the inevitable performance cost of those changes. The release smoketest fails now, but I suppose this is caused by the rebar3_cuttlefish dependency. I also see that a test is failing which did not happen when I ran them locally. If you need more info, please let me know.
%% Note that currently the backup is kept only in memory, a future improvement
%% would be to persist the backup on disk to also protect against message loss
%% in case the transmitting node crashes.
Maybe we should put this in a ticket @ioolkos so it doesn't get lost in the code comments
Remove excess blank lines
Co-authored-by: Dairon Medina Caro <[email protected]>
Common Test suites and Dialyzer pass for me locally.
@MarcVanOevelen no in-depth review yet, but I like your code/commenting style, very much aligned with existing code. It'd be great to have an idea of the performance impact, as you already mentioned. We also need to protect any RAM-grabbing component (buffer/queue/backup) or at least ensure that it will hit an upstream config limit (haven't yet looked into the details of that point, though).
@MarcVanOevelen I'll try to run some basic performance comparison over the weekend. Anything specific you'd want me to test?
I have successfully completed the HA behavior tests in our production system.
Since performance tests are very dependent on the test environment (resources, ...), we can learn most from relative tests.
@MarcVanOevelen great, thanks. Hopefully no power outages anymore :)
Did a basic test to get a feeling for latencies. This is a quick test with 50 publishers on node 1 and 1 subscriber on node 2, sending 1500 100-byte messages in total per second (QoS 1). EDIT: I also noticed that CPU on both nodes stays very high after the test finishes with the 1769 branch. This needs to be investigated too.
@MarcVanOevelen the loop in |
This avoids high CPU load due to needless looping.
Nice catch!
@ioolkos performance test comparison results are in the attached PDF.
@MarcVanOevelen thanks... can you leave me your e-mail address at the contact email here: https://vernemq.com/community.html
@ioolkos and here are the HA test results.
Proposed Changes
These changes fix missing messages with QoS 1 in an HA VerneMQ cluster,
as reported in this issue:
Message loss on node fault with QoS1 (#1700)
The following problems were discovered and addressed:
When the broker receives a message from a publisher, it was written into the subscriber(s) queue(s)
using an asynchronous function call. This caused the ack to be sent before the message was written to persistent storage,
leaving a small time window in which a crash of the broker causes the acknowledged message to be lost.
Solved by using a synchronous call.
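The difference between the old and fixed behaviour can be sketched as follows. This is a hypothetical Python sketch of the ordering problem, not the actual Erlang code; `DurableQueue` and `handle_publish_fixed` are illustrative names, and a plain list stands in for persistent storage.

```python
class DurableQueue:
    def __init__(self):
        self.store = []     # stands in for persistent storage
        self.deferred = []  # writes scheduled but not yet performed

    def enqueue_async(self, msg):
        # Old behaviour: the write is only scheduled; the caller returns
        # (and acks the publisher) before the message is durable. A crash
        # before the deferred write runs loses an already-acked message.
        self.deferred.append(msg)

    def enqueue_sync(self, msg):
        # Fixed behaviour: the write completes before this call returns.
        self.store.append(msg)

def handle_publish_fixed(q, msg, send_ack):
    q.enqueue_sync(msg)  # message is persisted first...
    send_ack(msg)        # ...only then is the ack sent to the publisher
```

The synchronous call closes the window between ack and durable write, at the cost of added latency on the publish path, which is why the performance comparison discussed in this thread matters.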
When the broker receives messages for a subscriber connected to another node, the messages are sent via the dedicated inter-node TCP connection. A crash of the receiving node while a message is present in the TCP receive buffer,
acknowledged to the sender but not yet processed by the broker, will cause that message to be lost.
The publisher received an ack as soon as the message was delivered to the sender's TCP stack.
Solved by introducing an application-level handshake on the inter-node TCP connection:
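The receiver side of such a handshake can be sketched as follows: the acknowledgement is sent only after the broker has fully processed the message, not merely after it arrived in the TCP receive buffer. This is an illustrative sketch under assumed framing (`(seq, block)` tuples from a `recv` callback); none of these names are VerneMQ's.

```python
def receive_loop(recv, process, send_ack):
    """Process each framed message fully before acknowledging it."""
    while True:
        frame = recv()
        if frame is None:      # connection closed
            break
        seq, block = frame
        process(block)         # enqueue/persist on this node first...
        send_ack(seq)          # ...then acknowledge to the sending node
```

If the receiving node crashes between `recv` and `send_ack`, the sender never sees the ack, keeps the block buffered, and resends it on reconnect, which is exactly the guarantee the in-transit buffer needs.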
During HA testing, some crashes were observed when queue migration was triggered,
i.e. the RPC call to get the queue pid from the crashed node and ets:lookup.
Solved by handling the exceptions.
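The defensive pattern is simply to treat any failure of the remote lookup as "queue not found" rather than letting the exception crash the migration. A minimal Python sketch, where `remote_lookup` is a hypothetical stand-in for the RPC/ets call:

```python
def get_remote_queue_pid(remote_lookup, subscriber_id):
    """Return the remote queue pid, or None if the node is unreachable."""
    try:
        return remote_lookup(subscriber_id)
    except (ConnectionError, TimeoutError, KeyError):
        # Crashed node or missing table entry: treat as "not found" so
        # migration can fall back to handling the queue locally.
        return None
```

The exact exception set is an assumption for the sketch; the point is that a crashed peer mid-migration must degrade to a recoverable "not found" result, not an unhandled error.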
When the broker delivers messages to a connected subscriber, the messages from the queue are saved in a
backup queue and "mailed" to the FSM process that performs the interaction with the client (retries, acks, ...).
But the backup messages were released (= deleted from persistent storage) before they were actually acknowledged
by the subscriber.
Again, when the broker crashes, those messages are lost.
Solved by:
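The release-on-ack behaviour described above can be sketched like this. It is an illustrative Python model, assuming a dict as stand-in for persistent storage; `BackupQueue` and its methods are hypothetical names, not the PR's actual functions.

```python
class BackupQueue:
    def __init__(self, storage):
        self.storage = storage  # dict msg_id -> payload; stands in for disk
        self.backup = {}        # delivered but not yet acknowledged

    def deliver(self, msg_id, send):
        """Send a message to the client, keeping a backup until acked."""
        payload = self.storage[msg_id]
        self.backup[msg_id] = payload  # kept, NOT released, on delivery
        send(msg_id, payload)

    def on_ack(self, msg_id):
        """Only now is it safe to release the message from storage."""
        self.backup.pop(msg_id, None)
        self.storage.pop(msg_id, None)

    def redeliver(self, send):
        """Unacked backups survive and can be re-sent to the subscriber."""
        for msg_id, payload in self.backup.items():
            send(msg_id, payload)
```

The old behaviour corresponds to calling `on_ack` inside `deliver`; separating the two calls is what keeps an unacknowledged message recoverable.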
Types of Changes
Checklist
CODE_OF_CONDUCT.md document
Further Comments
These fixes do not yet cover all cases of message loss but already drastically reduce their occurrence:
without the fixes, about 1 in 3 HA test runs showed message loss.
The fixes above reduced this to about 1 lost message in 1000 test runs.