[CORE-8160] storage: add chunked compaction routine #24423
Conversation
Force-pushed from 1d4f546 to 2ca689c
Can you explain what "chunked compaction" is? When would sliding window fail to index segments, and why do we care?
Added more detail to cover letter to address these points.
Pretty much looks good! No major complaints about structure, just some naming suggestions. Nice work!
Also could probably use some ducktape testing, though IIRC you mentioned a separate PR for stress testing compaction
That PR is merged; I'm going to parameterize it in order to test chunked compaction and assert on some added metrics. Will have updates to this.
```cpp
co_await map.reset();
auto read_holder = co_await seg->read_lock();
auto start_offset_inclusive = model::next_offset(last_indexed_offset);
auto rdr = internal::create_segment_full_reader(
```
Recreating this full segment reader for each round of chunked compaction is a bummer.
Not sure if we have any abstractions to get around this. log_reader::reset_config() gave me some hope that the segment's lease/lock could be reused, but it doesn't seem to allow us to reset with a start_offset lower than what has currently been read.
For context, we have to do this because in the chunked_compaction_reducer, once we fail to index an offset for a record in a batch, we break out of the loop and have to re-read that batch in the next round, using that offset as the start, inclusively.
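To make the round structure concrete, here is a minimal, self-contained sketch of the control flow being discussed, with plain standard containers standing in for the real segment, key-offset map, and segment reader; all names below are illustrative stand-ins rather than the actual storage internals.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct record {
    std::string key;
    long offset;
};

// Stand-in for "create a reader at start_offset_inclusive and run the
// map-building reducer until the bounded map fills up"; returns the last
// offset it managed to index. Assumes max_entries >= 1.
long index_until_full(
  const std::vector<record>& segment,
  long start_offset_inclusive,
  std::size_t max_entries,
  std::map<std::string, long>& map) {
    long last_indexed = start_offset_inclusive - 1;
    for (const auto& r : segment) {
        if (r.offset < start_offset_inclusive) {
            continue; // skip records already indexed in earlier rounds
        }
        if (map.size() >= max_entries && map.count(r.key) == 0) {
            break; // map is full: stop here, the next round resumes at this offset
        }
        auto [it, inserted] = map.try_emplace(r.key, r.offset);
        if (!inserted) {
            it->second = std::max(it->second, r.offset); // keep latest offset per key
        }
        last_indexed = r.offset;
    }
    return last_indexed;
}

void chunked_index(const std::vector<record>& segment, std::size_t max_entries) {
    if (segment.empty()) {
        return;
    }
    long last_indexed_offset = segment.front().offset - 1;
    while (last_indexed_offset < segment.back().offset) {
        std::map<std::string, long> map; // the real code does co_await map.reset()
        // A fresh reader has to be created each round: the record at
        // last_indexed_offset + 1 was read but not indexed last round, and the
        // existing reader cannot be rewound back to it.
        long start_offset_inclusive = last_indexed_offset + 1;
        last_indexed_offset = index_until_full(
          segment, start_offset_inclusive, max_entries, map);
        // ... one round of compaction would run here using the partial `map` ...
    }
}
```

The point the comment is making is the last part: each round begins at last_indexed_offset + 1, so the reader over the segment is rebuilt from scratch and the batch that was only partially indexed ends up being read twice.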
Leaving comment to rehighlight this point, in case @andrwng or anyone else has any ideas or comments on the cost of this repeated operation/possible tools at our disposal here.
well we have the readers cache, but i dunno if it is useful in this context.
since this code is new and not used often, i think we should favor simplicity, unless of course something would be worse than just 'not optimal'.
Force-pushed from 2ca689c to 4d14fd5
Force-pushed from 4d14fd5 to fc57dff
Force push to:
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/59620#0193b7c9-24ab-4e30-86c0-486e483807a5
/ci-repeat 5
/ci-repeat 5
Retry command for Build#59673: please wait until all jobs are finished before running the slash command
Possibly a bad interaction between partition movement and …; seemingly unrelated to the compaction changes.
CI test results:
test results on build#59673
test results on build#59782
test results on build#60281
test results on build#60366
test results on build#60412
test results on build#60610
test results on build#61021
Force-pushed from fc57dff to fe0991e
Force push to:
Retry command for Build#59782: please wait until all jobs are finished before running the slash command
/ci-repeat
```cpp
ss::future<ss::stop_iteration>
map_building_reducer::operator()(model::record_batch batch) {
    bool fully_indexed_batch = true;
    auto b = co_await decompress_batch(std::move(batch));
```
optimization: don't call `_map->put()` for records in non-compactible batch types; they would just be a waste of map space?
did you do that right above this line?
And better define it for `simple_key_offset_map`.
Optionally provide a starting offset from which the reader's `min_offset` value is assigned (otherwise, the `base_offset()` of the `segment` is used).
Uses the `map_building_reducer` to perform a linear read of a `segment` and index its keys and offsets, starting from a provided offset.
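As a rough, self-contained model of that contract (not the actual `map_building_reducer`; the batch and record types below are simplified placeholders), a batch-at-a-time consumer indexes keys into a memory-bounded map, skips records below the provided starting offset, and stops once the map can accept no more keys:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class stop_iteration { no, yes };

struct record {
    std::string key;
    long offset;
};

struct record_batch {
    bool compactible; // e.g. data batches; configuration-type batches are not
    std::vector<record> records;
};

class map_building_consumer {
public:
    map_building_consumer(
      std::map<std::string, long>& map, std::size_t max_entries, long min_offset)
      : _map(map)
      , _max_entries(max_entries)
      , _min_offset(min_offset) {}

    // Called once per batch, in offset order. Returns stop_iteration::yes once
    // the bounded map can no longer accept new keys, leaving the batch only
    // partially indexed.
    stop_iteration operator()(const record_batch& b) {
        for (const auto& r : b.records) {
            if (r.offset < _min_offset) {
                continue; // below the requested starting offset
            }
            // Records in non-compactible batch types are not worth map space,
            // but their offsets still count as indexed.
            if (b.compactible) {
                if (_map.size() >= _max_entries && _map.count(r.key) == 0) {
                    return stop_iteration::yes; // map is full, stop here
                }
                auto [it, inserted] = _map.try_emplace(r.key, r.offset);
                if (!inserted) {
                    it->second = std::max(it->second, r.offset);
                }
            }
            _last_indexed_offset = r.offset;
        }
        return stop_iteration::no;
    }

    long last_indexed_offset() const { return _last_indexed_offset; }

private:
    std::map<std::string, long>& _map;
    std::size_t _max_entries;
    long _min_offset;
    long _last_indexed_offset{-1};
};
```

A caller would feed batches in offset order and then use last_indexed_offset() to decide where the next chunked round should start.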
Force-pushed from 71d6a22 to fc4b9dd
Force push to:
Retry command for Build#60366: please wait until all jobs are finished before running the slash command
Force-pushed from fc4b9dd to 1acfc4e
Force push to:
Nice
In the case that zero segments were able to be indexed for a round of sliding window compaction, chunked compaction must be performed. This implementation uses some of the current abstractions from the compaction utilities to perform several rounds of sliding window compaction with a partially indexed map, created by reading the un-indexed segment in a linear fashion. This implementation is sub-optimal for a number of reasons; namely, segment indexes are read and rewritten each time a round of chunked compaction is performed. These intermediate states are then used for the next round of chunked compaction. In the future, there may be a more optimal way to perform these steps using less IO by holding more information in memory before flushing the final results to disk, rather than flushing every intermediate stage.
GTest `ASSERT_*` macros cannot be used in non-`void` returning functions. Add `RPTEST_EXPECT_EQ` to provide flexibility in testing for non-`void` functions.
To move away from hardcoded boost asserts and provide compatibility in a GTest environment.
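As background on the GTest limitation being worked around (the actual `RPTEST_EXPECT_EQ` macro is not reproduced here): `ASSERT_*` expands to a bare `return;` on failure, which only compiles in functions returning `void`, while `EXPECT_*` records the failure and lets execution continue. A minimal sketch:

```cpp
#include <gtest/gtest.h>

// OK: ASSERT_* may be used here because the function returns void.
void check_even(int v) {
    ASSERT_EQ(v % 2, 0);
}

// Would not compile with ASSERT_EQ: on failure the macro expands to `return;`,
// which is ill-formed in a function returning bool. An EXPECT-style check
// records the failure and lets the caller decide how to bail out.
bool try_check_even(int v) {
    EXPECT_EQ(v % 2, 0);
    return v % 2 == 0;
}

TEST(assert_vs_expect, non_void_helpers) {
    check_even(4);
    EXPECT_TRUE(try_check_even(4));
}
```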
This would previously overshoot the `size_bytes` provided to it by filling with `elements_per_fragment()` at least once. In the lower limit, when `required_entries` is less than `elements_per_fragment()`, we should be taking the minimum of the two values and pushing back that number of objects to the `entries` container.
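A small, self-contained illustration of the clamping described above, using a generic vector-of-vectors in place of the actual fragmented container; the function and variable names are illustrative only:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reserve `required_entries` slots in fragment-sized chunks. The last chunk
// is clamped with std::min, so that when required_entries is smaller than
// elements_per_fragment we no longer overshoot by a whole fragment.
std::vector<std::vector<int>> make_fragments(
  std::size_t required_entries, std::size_t elements_per_fragment) {
    std::vector<std::vector<int>> entries;
    while (required_entries > 0) {
        std::size_t n = std::min(required_entries, elements_per_fragment);
        entries.emplace_back(n); // push back exactly n objects, not a full fragment
        required_entries -= n;
    }
    return entries;
}
```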
In order to test the chunked compaction routine, parameterize the existing compaction test suite with `storage_compaction_key_map_memory_kb`. By limiting this value, we can force compaction to go down the chunked compaction path, and verify the log using the existing utilities after compaction settles. Some added asserts are used to verify chunked compaction is taken or not taken as a code path, depending on the memory constraints specified.
Force-pushed from 1acfc4e to 0de4571
Force push to:
Unfortunate that …
hehe. we don't need to do that in the context of this PR, or is it related to chunked compaction?
The potential race mentioned was related to the added chunked compaction code, yes.
Add a new function `map_building_reducer::maybe_index_record_in_map()` to avoid a possibly dangling reference in a continuation. While this code was "technically" safe due to the fact that `key_offset_map::put()` didn't have any scheduling points in it, this refactor avoids the problem entirely by moving all defined stack variables into a new coroutine function.
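As a general illustration of the hazard this avoids (generic Seastar-style code; `put_async()` below is a placeholder rather than `key_offset_map::put()`): a reference to a stack variable handed to an asynchronous call can dangle once the enclosing function returns, whereas a coroutine keeps its locals alive in the coroutine frame across `co_await`.

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

#include <string>
#include <string_view>

namespace ss = seastar;

// Placeholder asynchronous operation that takes its argument by reference.
ss::future<> put_async(const std::string& key);

// Hazardous: `key` lives on this function's stack, and the function returns
// as soon as the future is constructed. If put_async() ever suspends before
// it is done with `key`, the reference dangles. (The original code was only
// "technically" safe because put() had no scheduling points.)
ss::future<> index_record_risky(std::string_view raw) {
    std::string key(raw);
    return put_async(key);
}

// Safe: in a coroutine, `key` lives in the coroutine frame, which stays alive
// across the co_await, so the reference remains valid for the whole call.
ss::future<> index_record_safe(std::string_view raw) {
    std::string key(raw);
    co_await put_async(key);
}
```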
Push to:
This PR deals with the case in which zero segments were indexed for a round of sliding window compaction. This can happen for segments with a large number of unique keys, given the memory constraints imposed on our key-offset hash map by `storage_compaction_key_map_memory` (128MiB by default). This has historically not come about often, and may also be naturally alleviated by deduplicating or partially indexing the problem segment during future rounds of compaction (provided there is a steady ingress rate to the partition, and that keys in the problem segment are present in newer segments in the log), but added here is a routine that can handle this corner case when it arises.
Instead of throwing and logging an error when zero segments are indexed, we will now fall back to a "chunked" compaction routine.
This implementation uses some of the current abstractions from the compaction utilities to perform several rounds (chunks) of sliding window compaction with a partially indexed map created from the un-indexed segment by reading it in a linear fashion.
This implementation is sub-optimal for a number of reasons; primarily, segment indexes are read and rewritten each time a round of chunked compaction is performed. These intermediate states are then used for the next round of chunked compaction.
In the future, there may be a more optimal way to perform these steps using less IO by holding more information in memory before flushing the final results to disk, instead of flushing every intermediate stage. However, the case in which chunked compaction is required seems to be infrequent enough that merely having the implementation is valuable.
Backports Required
Release Notes
Improvements