tx/group compaction fixes #24637
base: dev
Conversation
/dt
CI test results on builds #60037, #60049, #60058
/ci-repeat 3 (retry command for build #60049)
A bug in 24.2.0 resulted in a situation where tx_fence batches were retained _after_ compaction while their corresponding data/commit/abort batches were compacted away. This applied only to group transactions that used tx_fence to begin the transaction.

Historical context: tx_fence was historically the fence batch type that began both group transactions and regular data partition transactions. That changed starting 24.2.0, where a dedicated fence batch type (group_tx_fence) was used for group transaction fencing.

After this buggy compaction, these uncleaned tx_fence batches are accounted as open transactions when computing max_collectible_offset, thus blocking further compaction after an upgrade to 24.2.x.

We just ignore tx_fence batches going forward; the rationale is as follows.

- Firstly, they are not in use starting 24.2 (in favor of the dedicated group_tx_fence), so anyone starting group transactions from 24.2 shouldn't see any conflicts.
- For sealed transactions, commit/abort/data batches were already removed if the compaction ran, so ignoring tx_fence should be the right thing to do in such cases without any conflicts/correctness issues.
- Hypothetically, if the compaction didn't run, it is still ok to ignore those batches because in group transactions committed transactions are atomically rewritten as a separate raft_data batch along with the commit marker, which is applied in the stm (so no state is lost).
- Any group transaction using tx_fence likely belonged to 24.1.x, which is at least 6 months old at the time of writing, so it is reasonable to assume all such transactions are already sealed, especially since we started enforcing a max transaction timeout of 15 mins.
- The only case where it could theoretically be a problem is during an upgrade from 24.1.x with an open transaction: if the cluster upgrades to 24.2.x (with the fix) while the transaction remains open throughout the upgrade, the transaction would then be considered aborted (if leadership is assumed on a 24.2.x broker). This is a highly unlikely scenario, but the suggestion is to stop all running group transaction (kstreams) applications when doing the upgrade.

Note: this only affects group transactions, so regular transactions that do not do offset commits as part of a transaction are safe.
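To make the effect concrete, here is a toy model of the open-transaction accounting described above (Python, purely illustrative: the batch type strings, the function shape, and the exact max_collectible_offset arithmetic are simplifying assumptions, not Redpanda internals):

```python
# Toy model: how a stale tx_fence pins the collectible offset, and why
# ignoring tx_fence batches unblocks compaction. All names and semantics
# here are illustrative assumptions, not the actual Redpanda implementation.
def max_collectible_offset(batches, log_end_offset, ignore_tx_fence=True):
    open_begin = {}  # producer id -> begin offset of its open transaction
    for b in batches:
        if b["type"] == "tx_fence":
            if ignore_tx_fence:
                # The fix: legacy pre-24.2 group fences are skipped, so a fence
                # whose data/commit/abort batches were compacted away no longer
                # counts as an open transaction.
                continue
            open_begin.setdefault(b["pid"], b["offset"])
        elif b["type"] == "group_tx_fence":
            open_begin.setdefault(b["pid"], b["offset"])
        elif b["type"] in ("group_commit_tx", "group_abort_tx"):
            open_begin.pop(b["pid"], None)
    if not open_begin:
        return log_end_offset
    # Compaction cannot advance past the earliest open transaction.
    return min(open_begin.values()) - 1

# A log shaped like the buggy 24.2.0 compaction output: the tx_fence
# survived, but its corresponding data/commit batches did not.
log = [{"type": "tx_fence", "pid": 7, "offset": 100}]
print(max_collectible_offset(log, 5000, ignore_tx_fence=False))  # 99: compaction stuck
print(max_collectible_offset(log, 5000, ignore_tx_fence=True))   # 5000: unblocked
```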
This will result in hanging transactions and subsequent blocking of compaction.
.. for a given partition, to be hooked up with REST API in the next commit.
/v1/debug/producers/{namespace}/{topic}/{partition} .. includes low level debug information about producers for idempotency/transactional state.
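For illustration, hitting this endpoint from a script or test could look roughly like the sketch below (hypothetical helper: only the path comes from the commit message above; the default admin port and the response handling are assumptions):

```python
import requests

# Hypothetical helper; the endpoint path is taken from the commit message
# above, everything else (default admin port, response handling) is assumed.
def get_partition_producers(host, namespace, topic, partition, admin_port=9644):
    url = f"http://{host}:{admin_port}/v1/debug/producers/{namespace}/{topic}/{partition}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    # The body carries low level idempotency/transactional state for the
    # partition's producers (producer id/epoch, transaction info, etc.).
    return resp.json()

# e.g. get_partition_producers("localhost", "kafka", "my-topic", 0)
```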
@@ -185,6 +186,8 @@ class group_manager {
described_group describe_group(const model::ntp&, const kafka::group_id&);
partition_response describe_partition_producers(const model::ntp&);
nit: partition_response is confusing as it has no context, can we create an alias for this type, e.g. partition_producers?
}
response.error_code = kafka::error_code::none;
// snapshot the list of groups attached to this partition
fragmented_vector<std::pair<group_id, group_ptr>> groups;
chunked_vector ?
absl::btree_map<model::producer_identity, model::offset>
    producer_to_begin;
const all_txs_t& inflight_transactions() const { return _all_txs; }
This part seems not relevant to the current commit?
"[{}] group: {}, producer: {}, begin: {}", | ||
_raft->ntp(), |
nit: can we add more context to this log line? i.e. it is not clear what "begin" refers to.
if (pid_str.empty() || epoch_str.empty() || sequence_str.empty()) {
    throw ss::httpd::bad_param_exception(
            "invalid producer_id/epoch, should be >= 0");
nit: the error message is misaligned
auto& tx = state.transaction;
int64_t start_offset = -1;
if (tx && tx->begin_offset >= model::offset{0}) {
    start_offset = partition->get_offset_translator_state()
is it possible that tx->begin_offset doesn't exist anymore / was removed by retention?
Skimmed the PR to be up-to-date with the changes. Hope the observations are useful.
@@ -2974,4 +2974,23 @@ ss::future<tx::errc> tx_gateway_frontend::do_delete_partition_from_tx(
    co_return tx::errc::none;
}
ss::future<tx::errc> tx_gateway_frontend::unsafe_abort_group_transaction(
can you add a comment describing what is "unsafe" about this operation? what invariants will break? what semantic behavior breaks?
},
{
    "name": "sequence",
    "in": "int",
typo? should be "in": "query",
}
if (pid_str.empty() || epoch_str.empty() || sequence_str.empty()) {
    throw ss::httpd::bad_param_exception(
This should happen out of the box btw. This seems to be called implicitly when handling requests https://github.com/redpanda-data/seastar/blob/09a59a23ff2740a2fa591b0e65d978ca83d2b9e3/include/seastar/http/handlers.hh#L76
auto sequence_str = request->get_query_param("sequence");
if (group_id.empty()) {
    throw ss::httpd::bad_param_exception("group_id cannot be empty");
Can this actually happen, or is it guaranteed by the router that this is not empty? nit: anyway, the right error here would be 404.
@@ -1849,3 +1849,15 @@ def get_debug_bundle_file(self, filename: str, node: MaybeNode = None):
def delete_debug_bundle_file(self, filename: str, node: MaybeNode = None):
    path = f"debug/bundle/file/{filename}"
    return self._request("DELETE", path, node=node)

def unsafe_abort_group_transaction(self, group_id: str, pid: int,
nit: def unsafe_abort_group_transaction(self, group_id: str, *, pid: int,
to force the caller to specify the param names; too many ints that are easy to reorder.
dedicated fence batch type (group_tx_fence) was used for group
transaction fencing.

After this buggy compaction, these uncleaned tx_fence batches are
Is the buggy compaction fixed now? Otherwise, shouldn't we just fix the buggy compaction instead?
model::producer_identity{header.producer_id, header.producer_epoch},
header.base_offset);
model::record_batch_header, kafka::group_tx::offsets_metadata) {
// Transaction boundaries are determined by fence/commit or abort
This change doesn't seem to be explained by the commit message?
@@ -92,6 +92,46 @@
        ]
    }
]
},
{
"path": "/v1/transaction/{group_id}/unsafe_abort_group_transaction",
Nit: this looks a bit inconsistent, because group_id can be easily confused with transactional_id (when comparing with other endpoints)
cluster::get_producers_request{ntp, timeout});
if (result.error_code != cluster::tx::errc::none) {
    throw ss::httpd::server_error_exception(fmt::format(
        "Error {} processing partition state for ntp: {}",
nit: "processing partition state" sounds a bit ambiguous, maybe "getting producers for ntp:" instead?
const model::ntp ntp = parse_ntp_from_request(req->param);
auto timeout = std::chrono::duration_cast<model::timeout_clock::duration>(
    10s);
auto result = co_await _tx_gateway_frontend.local().get_producers(
should we check here if the frontend is initialized (as is done for the other endpoint)?
pid = model::producer_id{parsed_pid};
} catch (const boost::bad_lexical_cast& e) {
    throw ss::httpd::bad_param_exception(
        fmt::format("invalid producer_id, should be >= 0: {}", e));
nit: I guess printing the value would be more useful than printing the exception?
auto group_ntp = mapper.ntp_for(kafka::group_id{group_id});
if (!group_ntp) {
    throw ss::httpd::server_error_exception(
        "consumer_offsets topic now found");
typo now -> not
producers.producers.push(std::move(producer_state));
co_await ss::coroutine::maybe_yield();
}
co_return ss::json::json_return_type(std::move(producers));
ss::json::stream_range_as_array?
return _inflight_requests;
}
const request_queue& fnished_requests() const { return _finished_requests; }
typo: fnished -> finished
Check individual commit messages for details.
Main changes
Backports Required
Release Notes
Bug Fixes