[Questions] Segments files of quorum are not clean #13586
-
We may need to make some adjustments to the 4.0 snapshotting strategy for low throughput queues with very large messages.
-
Thanks for your reply. I ran many tests; in the last one I may have sent fewer messages, but the result is always the same. To be more precise, the full use case is a workflow across 5 queues, with similar messages passing through all of them: 3 MB * 300 * 5 + WAL size, and the disk becomes full.
Perhaps the documentation could be improved to explain when snapshot-driven cleanup starts, especially if it only happens after a certain number of messages. Nothing about this is mentioned in the performance-tuning-large-messages section. I will try increasing the disk size from 8 GB to 16 GB on each of the 3 nodes and see whether the segments get cleaned up.
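The arithmetic above can be checked with a quick back-of-the-envelope sketch. It assumes (this is not confirmed in the thread) that each node keeps a full copy of every message in each queue's segment files and that no segment is ever removed. `wal_mb` is a hypothetical placeholder for the WAL size, which the post does not give.

```python
def estimated_disk_mb(message_mb, messages, queues, wal_mb=0):
    """Rough per-node disk usage in MB if no segment file is ever removed."""
    return message_mb * messages * queues + wal_mb


# Figures taken from the post: 3 MB messages, 300 of them, 5 queues.
segments_mb = estimated_disk_mb(message_mb=3, messages=300, queues=5)
print(f"segments only: {segments_mb} MB on a disk of roughly 8000 MB")
```

Under these assumptions the segment data alone takes more than half of an 8 GB volume before the WAL or any repeated test runs are counted.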
-
@YvesZelros there is a quorum queue guide section that explains what settings should be used to make QQs truncate segment files quicker. It is primarily relevant for workloads with large messages. Beyond that, making sure your consumers do not hold on to deliveries for minutes or hours is the change that helps the most. Segment files that have at least one message in Ready state will be retained. Segment files with fewer entries have a lower probability of containing just one or a few such messages.
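A toy model of the retention rule described above might look like the following. It is only an illustration of the stated rule (a segment is kept if it holds at least one live message), not the actual Ra implementation; the function names and the zero-based entry indexes are hypothetical.

```python
def segment_count(total_entries, segment_max_entries):
    """Number of segment files needed for total_entries (ceiling division)."""
    return -(-total_entries // segment_max_entries)


def retained_segments(total_entries, segment_max_entries, live_indexes):
    """Segment numbers that still hold at least one live (Ready) entry."""
    return sorted(
        {i // segment_max_entries for i in live_indexes if 0 <= i < total_entries}
    )


# 200 entries, of which only entries 5 and 130 are still live.
live = {5, 130}
for size in (4096, 64):
    kept = retained_segments(200, size, live)
    total = segment_count(200, size)
    print(f"segment_max_entries={size}: {len(kept)} of {total} segments retained")
```

With a large segment size, the two live messages pin the only segment that exists. With 64 entries per segment, the same two messages pin two of four segments, so the other two become eligible for removal in this model.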
-
This PR should make checkpointing more frequent for large message workloads. This PR will work best with a lower
-
Community Support Policy
RabbitMQ version used
4.0.7
Erlang version used
27.2.x
Operating system (distribution) used
rabbitmq:4.0.7-management
How is RabbitMQ deployed?
Community Docker image
rabbitmq-diagnostics status output
rabbitmq-diagnostics status
rabbitmq-diagnostics quorum_status --vhost xxxxxx segmentator-waiting-queue
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
RabbitMQ Cluster Operator.
Steps to reproduce the behavior in question
Send 200 messages of 3 MB each to a quorum queue in a vhost other than the default one.
advanced.config
No response
Application code
No response
Kubernetes deployment file
Cluster
Queue
Policy
What problem are you trying to solve?
Quorum queue segment files are not cleaned up; disk usage grows up to 8 GB.
The checkpoints and snapshots directories are empty.
Message sizes are between 2 and 4 MB, and raft.segment_max_entries is set to 64.
Queue status
All segments are still present, even after we purge the queue and send one new message.
Question:
Content of ~/mnesia/rabbit@xxxxxx-rabbitmq-server-0.xxxxxx-rabbitmq-nodes.xxx-5ufiakh/quorum/rabbit@xxxxxx-rabbitmq-server-0.xxxxxx-rabbitmq-nodes.xxx-5ufiakh/XXXXXZ4YGMB40JV80
Content of queue config
{id =>
{xxxxx_other,
'rabbit@xxxxxx-rabbitmq-server-0.xxxxxx-rabbitmq-nodes.xxx-5ufiakh'},
machine =>
{module,rabbit_fifo,
#{name => xxxxx_other,max_length => undefined,
max_bytes => undefined,
queue_resource => {resource,<<"xxxxxx">>,queue,<<"other">>},
created => 1742512049028,dead_letter_handler => undefined,
become_leader_handler =>
{rabbit_quorum_queue,become_leader,
[{resource,<<"xxxxxx">>,queue,<<"other">>}]},
overflow_strategy => reject_publish,delivery_limit => 20,
expires => undefined,msg_ttl => undefined,
single_active_consumer_on => false}},
membership => voter,friendly_name => "queue 'other' in vhost 'xxxxxx'",
cluster_name => xxxxx_other,uid => <<"XXXXXZ4YGMB40JV80">>,
initial_members =>
[{xxxxx_other,
'rabbit@xxxxxx-rabbitmq-server-0.xxxxxx-rabbitmq-nodes.xxx-5ufiakh'},
{xxxxx_other,
'rabbit@xxxxxx-rabbitmq-server-2.xxxxxx-rabbitmq-nodes.xxx-5ufiakh'},
{xxxxx_other,
'rabbit@xxxxxx-rabbitmq-server-1.xxxxxx-rabbitmq-nodes.xxx-5ufiakh'}],
log_init_args =>
#{max_checkpoints => 3,min_checkpoint_interval => 8192,
snapshot_interval => 8192,uid => <<"XXXXXZ4YGMB40JV80">>},
metrics_key => {resource,<<"xxxxxx">>,queue,<<"other">>},
ra_event_formatter =>
{rabbit_quorum_queue,format_ra_event,
[{resource,<<"xxxxxx">>,queue,<<"other">>}]},
tick_timeout => 5000,broadcast_time => 100,
install_snap_rpc_timeout => 120000,await_condition_timeout => 30000}
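The log_init_args above show min_checkpoint_interval and snapshot_interval both set to 8192. As a rough sketch, assuming these intervals are counted in Raft log entries and that each published message adds about one entry (both are assumptions, not statements from this thread; a real log also contains other commands such as settlements), a queue that has received only a few hundred messages is still far from the first interval boundary. The helper name is hypothetical.

```python
def entries_until_interval(entries_written, interval=8192):
    """Log entries still needed to reach the next multiple of interval."""
    return interval - (entries_written % interval)


# 200 published messages, as in the reproduction steps.
remaining = entries_until_interval(200)
print(f"about {remaining} more log entries before the first interval is reached")
```

If that reading is right, it would be consistent with the empty checkpoints and snapshots directories reported for a 200-message test.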