You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The problem reproduced on 29th and process-identifier-assignment+process-identifier-status group was failing until transactional_id_expiration_ms was reduced to 5 seconds and then increased again.
node-3 had for over a day errors of failing log messages with tx::errc::not_coordinator, even when transaction list showed that it should be the coordinator; but system remained to work, I assume because the application kept on running and coordinator remained on another broker.
node-1 / broker 0
Nov 29 09:03:36 demo-redpanda-1 rpk[471715]: INFO 2024-11-29 09:03:36,185 [shard 2:main] storage - disk_log_impl.cc:2475 - Removing "/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/10_5913/19-10-v1.log" (remove_prefix_full_segments, {offset_tracker:{term:10, base_offset:19, committed_offset:26, dirty_offset:26}, compacted_segment=0, finished_self_compaction=0, finished_windowed_compaction=0, generation=17, reader={/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/10_5913/19-10-v1.log, (2373 bytes)}, writer=nullptr, cache={cache_size=8, dirty tracker: {min: -9223372036854775808, max: -9223372036854775808}}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/10_5913/19-10-v1.base_index, offsets:19, index:{header_bitflags:0, base_offset:19, max_offset:26, base_timestamp:{timestamp: 1732179784596}, max_timestamp:{timestamp: 1732266206467}, batch_timestamps_are_monotonic:1, with_offset:true, non_data_timestamps:0, broker_timestamp:{{timestamp: 1732266206476}}, num_compactible_records_appended:{7}, index(1,1,1)}, step:32768, needs_persistence:0}})
Nov 29 09:08:53 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:53,785 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:54 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:54,106 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:54 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:54,427 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:54 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:54,747 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:55 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:55,068 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:55 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:55,388 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
Nov 29 09:08:55 demo-redpanda-1 rpk[471715]: WARN 2024-11-29 09:08:55,709 [shard 0:main] tx - rm_partition_frontend.cc:164 - got error rpc::errc:4 on remote begin tx
node-2 / broker 1
Nov 29 04:40:06 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:06,975 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=142} sequence: 0
Nov 29 04:40:06 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:06,976 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=142} sequence: 0
Nov 29 04:40:06 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:06,976 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=142} sequence: 0
Nov 29 04:40:06 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:06,977 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=142} sequence: 0
Nov 29 04:40:06 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:06,977 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=142} sequence: 0
Nov 29 04:40:07 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:40:07,923 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=142} is fenced of by {producer_identity: id=13020, epoch=143}
Nov 29 04:41:08 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:08,976 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=144} sequence: 0
Nov 29 04:41:08 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:08,976 [shard 1:main] tx - tx_gateway_frontend.cc:578 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] found transaction {id: demo@demo-core-2@core@default+incidents+Augmentation, status: ongoing, pid: {producer_identity: id=13020, epoch=144}, last_pid: {producer_identity: id=-1,
epoch=-1}, etag: 37, seq: 0, partitions: } to abort
Nov 29 04:41:08 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:08,995 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=144} is fenced of by {producer_identity: id=13020, epoch=145}
Nov 29 04:41:09 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:09,175 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=144} sequence: 0
Nov 29 04:41:09 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:09,176 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=144} sequence: 0
Nov 29 04:41:09 demo-repanda-2 rpk[516933]: message repeated 2 times: [ INFO 2024-11-29 04:41:09,176 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=144} sequence: 0]
Nov 29 04:41:09 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:09,178 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=144} sequence: 0
Nov 29 04:41:10 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:41:10,117 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=144} is fenced of by {producer_identity: id=13020, epoch=145}
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,177 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,177 [shard 1:main] tx - tx_gateway_frontend.cc:578 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] found transaction {id: demo@demo-core-2@core@default+incidents+Augmentation, status: ongoing, pid: {producer_identity: id=13020, epoch=146}, last_pid: {producer_identity: id=-1,
epoch=-1}, etag: 37, seq: 0, partitions: } to abort
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,191 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=146} is fenced of by {producer_identity: id=13020, epoch=147}
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,373 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,373 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,374 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,374 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,375 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:11,375 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=146} sequence: 0
Nov 29 04:42:12 demo-repanda-2 rpk[516933]: INFO 2024-11-29 04:42:12,310 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=146} is fenced of by {producer_identity: id=13020, epoch=147}
[...]
Nov 29 11:00:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:00:11,535 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:59833 (applying protocol, Connection reset by peer)
Nov 29 11:00:11 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:00:11,535 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:59840 (applying protocol, Connection reset by peer)
Nov 29 11:02:34 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:02:34,644 [shard 2:main] storage - disk_log_impl.cc:2475 - Removing "/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/9_5913/35-10-v1.log" (remove_prefix_full_segments, {offset_tracker:{term:10, base_offset:35, committed_offset:36, dirty_offset:36}, compacted_segment=0, finished_self_compaction=0, finished_windowed_compaction=0, generation=5, reader={/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/9_5913/35-10-v1.log, (466 bytes)}, writer=nullptr, cache={cache_size=0, dirty tracker: {min: -9223372036854775808, max: -9223372036854775808}}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/cache-assignment-pending-demo/9_5913/35-10-v1.base_index, offsets:35, index:{header_bitflags:0, base_offset:35, max_offset:36, base_timestamp:{timestamp: 1732273351093}, max_timestamp:{timestamp: 1732273351093}, batch_timestamps_are_monotonic:1, with_offset:true, non_data_timestamps:0, broker_timestamp:{{timestamp: 1732273351108}}, num_compactible_records_appended:{1}, index(1,1,1)}, step:32768, needs_persistence:0}})
Nov 29 11:05:45 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:05:45,444 [shard 3:main] storage-gc - segment.cc:631 - Marking compaction index /var/lib/redpanda/data/kafka/__consumer_offsets/2_19/90331319-33-v1.compaction_index as incomplete
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,218 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45350 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45248 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45260 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45214 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45208 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45200 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45186 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45166 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45154 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,219 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45144 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45104 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45032 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45076 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45018 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45134 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45008 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:45002 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:44960 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 0:main] kafka - server.cc:167 - Disconnected 10.200.118.1:44990 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 3:main] kafka - server.cc:167 - Disconnected 10.200.118.1:44936 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:44934 (applying protocol, Connection reset by peer)
Nov 29 11:15:57 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:15:57,220 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:60214 (applying protocol, Connection reset by peer)
Nov 29 11:16:12 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:12,364 [shard 2:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-1@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=13010, epoch=91} sequence: 1
Nov 29 11:16:12 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:12,364 [shard 2:main] tx - tx_gateway_frontend.cc:578 - [tx_id=demo@demo-core-1@core@default+metadata-entry+Augmentation] found transaction {id: demo@demo-core-1@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=13010, epoch=91}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 39, seq: 1, partitions: } to abort
Nov 29 11:16:12 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:12,464 [shard 2:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-1@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=13010, epoch=91} sequence: 1
Nov 29 11:16:16 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:16,902 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:60980 (applying protocol, Connection reset by peer)
Nov 29 11:16:16 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:16,902 [shard 1:main] kafka - server.cc:167 - Disconnected 10.200.118.1:60943 (applying protocol, Connection reset by peer)
Nov 29 11:16:16 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:16:16,902 [shard 2:main] kafka - server.cc:167 - Disconnected 10.200.118.1:60946 (applying protocol, Connection reset by peer)
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,680 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,680 [shard 1:main] tx - tx_gateway_frontend.cc:578 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] found transaction {id: demo@demo-core-2@core@default+incidents+Augmentation, status: ongoing, pid: {producer_identity: id=13020, epoch=148}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 37, seq: 0, partitions: } to abort
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,689 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=148} is fenced of by {producer_identity: id=13020, epoch=149}
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,866 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,867 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: message repeated 4 times: [ INFO 2024-11-29 11:17:03,867 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0]
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,868 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: message repeated 3 times: [ INFO 2024-11-29 11:17:03,868 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0]
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,877 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,878 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,878 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:03 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:03,879 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=148} sequence: 0
Nov 29 11:17:04 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:17:04,782 [shard 1:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] producer {producer_identity: id=13020, epoch=148} is fenced of by {producer_identity: id=13020, epoch=149}
Nov 29 11:18:05 demo-repanda-2 rpk[516933]: INFO 2024-11-29 11:18:05,871 [shard 1:main] tx - tx_gateway_frontend.cc:513 - [tx_id=demo@demo-core-2@core@default+incidents+Augmentation] trying to abort transaction. pid: {producer_identity: id=13020, epoch=150} sequence: 0
node-3 / broker 2
Nov 27 02:44:45 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:45,756 [shard 3:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 5524273, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 3000 ms, written: unset}, flushed:
false
Nov 27 02:44:45 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:45,833 [shard 0:main] rpc - transport.cc:214 - RPC timeout (100 ms) to {host: 10.200.230.21, port: 33145}, method: node_status_rpc::node_status, correlation id: 5009658, 1 in flight, time since: {init: 100 ms, enqueue: 100 ms, memory_reserved: 100 ms dispatch: 100 ms, written: unset}, flushed
: false
Nov 27 02:44:45 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:45,846 [shard 2:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3818598, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 291 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,034 [shard 0:main] rpc - transport.cc:214 - RPC timeout (100 ms) to {host: 10.200.230.21, port: 33145}, method: node_status_rpc::node_status, correlation id: 5009659, 1 in flight, time since: {init: 100 ms, enqueue: 100 ms, memory_reserved: 100 ms dispatch: 100 ms, written: unset}, flushed
: false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,034 [shard 1:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3980079, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 299 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,054 [shard 0:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3983955, 11 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 599 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,056 [shard 3:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 5524274, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 3000 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,147 [shard 2:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3818599, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 291 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,234 [shard 0:main] rpc - transport.cc:214 - RPC timeout (100 ms) to {host: 10.200.230.21, port: 33145}, method: node_status_rpc::node_status, correlation id: 5009660, 1 in flight, time since: {init: 100 ms, enqueue: 100 ms, memory_reserved: 100 ms dispatch: 100 ms, written: unset}, flushed
: false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,334 [shard 1:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3980080, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 299 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,354 [shard 0:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3983956, 11 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 599 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,356 [shard 3:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 5524275, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 3000 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,435 [shard 0:main] rpc - transport.cc:214 - RPC timeout (100 ms) to {host: 10.200.230.21, port: 33145}, method: node_status_rpc::node_status, correlation id: 5009661, 1 in flight, time since: {init: 100 ms, enqueue: 100 ms, memory_reserved: 100 ms dispatch: 100 ms, written: unset}, flushed
: false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,448 [shard 2:main] rpc - transport.cc:214 - RPC timeout (3000 ms) to {host: 10.200.230.21, port: 33145}, method: raftgen::heartbeat_v2, correlation id: 3818600, 10 in flight, time since: {init: 3000 ms, enqueue: 3000 ms, memory_reserved: 3000 ms dispatch: 291 ms, written: unset}, flushed:
false
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2005, epoch=0} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2005, epoch=0} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2035, epoch=0} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2035, epoch=0} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2048, epoch=2} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2048, epoch=2} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2084, epoch=0} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2084, epoch=0} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2105, epoch=0} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2105, epoch=0} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2131, epoch=0} transaction
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: WARN 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3190 - producer {producer_identity: id=2131, epoch=0} transaction expiration result: tx::errc::not_coordinator
Nov 27 02:44:46 demo-redpanda-3 rpk[500671]: INFO 2024-11-27 02:44:46,534 [shard 3:main] tx - [N:process-identifier-assignment S:Empty G:173] group.cc:3181 - attempting expiration of producer: {producer_identity: id=2144, epoch=0} transaction
There where a lot of errors; number of errors per date / hour:
ubuntu@demo-redpanda-3:~$ cat /var/log/syslog.1 | grep "not_coordinator" | cut -b 1-9 | uniq -c
7900 Nov 25 02
12008 Nov 25 03
3951 Nov 27 02
12960 Nov 27 03
12265 Nov 27 04
11160 Nov 27 05
11160 Nov 27 06
11160 Nov 27 07
11160 Nov 27 08
11160 Nov 27 09
11160 Nov 27 10
11160 Nov 27 11
11160 Nov 27 12
11160 Nov 27 13
11160 Nov 27 14
11160 Nov 27 15
11160 Nov 27 16
11160 Nov 27 17
11160 Nov 27 18
11160 Nov 27 19
11160 Nov 27 20
11160 Nov 27 21
11160 Nov 27 22
11160 Nov 27 23
11160 Nov 28 00
11160 Nov 28 01
11160 Nov 28 02
11160 Nov 28 03
11129 Nov 28 04
11160 Nov 28 05
11160 Nov 28 06
11160 Nov 28 07
11160 Nov 28 08
11160 Nov 28 09
11160 Nov 28 10
11160 Nov 28 11
11160 Nov 28 12
11160 Nov 28 13
11160 Nov 28 14
11160 Nov 28 15
11160 Nov 28 16
11160 Nov 28 17
11160 Nov 28 18
11160 Nov 28 19
11160 Nov 28 20
11160 Nov 28 21
11160 Nov 28 22
11160 Nov 28 23
11160 Nov 29 00
11160 Nov 29 01
11160 Nov 29 02
11160 Nov 29 03
11160 Nov 29 04
11160 Nov 29 05
11160 Nov 29 06
11160 Nov 29 07
11160 Nov 29 08
11160 Nov 29 09
11160 Nov 29 10
1085 Nov 29 11
Version & Environment
OS: Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1054-kvm x86_64)
What went wrong?
process-identifier-assignment+process-identifier-status
group was failing untiltransactional_id_expiration_ms
was reduced to 5 seconds and then increased again.tx::errc::not_coordinator
, even when transaction list showed that it should be the coordinator; but system remained to work, I assume because the application kept on running and coordinator remained on another broker.node-1 / broker 0
node-2 / broker 1
node-3 / broker 2
There where a lot of errors; number of errors per date / hour:
The text was updated successfully, but these errors were encountered: