[CORE-8160] storage: add chunked compaction routine #24423

Merged (18 commits) on Jan 22, 2025

WillemKauf
Contributor

@WillemKauf WillemKauf commented Dec 3, 2024

This PR deals with the case in which zero segments were indexed for a round of sliding window compaction. This can happen for segments with a large number of unique keys, given the memory constraints imposed on our key-offset hash map by storage_compaction_key_map_memory (128MiB by default).

Historically this has been rare, and it may also be alleviated naturally during future rounds of compaction by deduplicating or partially indexing the problem segment (provided there is a steady ingress rate to the partition, and that keys in the problem segment are present in newer segments of the log). Still, this PR adds a routine that can handle this corner case when it arises.
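As an illustration of why a segment can fail to be indexed at all, the following minimal sketch assumes a key map bounded by a simple key count (a stand-in for the byte budget from storage_compaction_key_map_memory) and assumes a segment only counts as indexed once every one of its keys fits. `bounded_key_offset_map`, `record`, and `index_whole_segment` are hypothetical names, not Redpanda's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a memory-bounded key -> offset map. The real map is
// sized in bytes; here the budget is simply a maximum number of distinct keys.
class bounded_key_offset_map {
public:
    explicit bounded_key_offset_map(std::size_t max_keys)
      : _max_keys(max_keys) {}

    // Returns false when a *new* key would exceed the budget. Updating a key
    // that is already present always succeeds, since it needs no extra space.
    bool put(const std::string& key, int64_t offset) {
        auto it = _map.find(key);
        if (it != _map.end()) {
            it->second = offset;
            return true;
        }
        if (_map.size() >= _max_keys) {
            return false;
        }
        _map.emplace(key, offset);
        return true;
    }

    std::size_t size() const { return _map.size(); }

private:
    std::size_t _max_keys;
    std::unordered_map<std::string, int64_t> _map;
};

struct record {
    std::string key;
    int64_t offset;
};

// Assumption for this sketch: a segment is "indexed" only if all records fit.
bool index_whole_segment(
  bounded_key_offset_map& map, const std::vector<record>& segment) {
    for (const auto& r : segment) {
        if (!map.put(r.key, r.offset)) {
            return false;
        }
    }
    return true;
}
```

For example, with a budget of 4 keys, a segment containing 10 unique keys can never be fully indexed, however often it is retried; that is the case the fallback routine targets.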

Instead of throwing and logging an error when zero segments are indexed, we will now fall back to a "chunked" compaction routine.

This implementation uses some of the current abstractions from the compaction utilities to perform several rounds (chunks) of sliding window compaction, each with a partially indexed map built by reading the unindexed segment linearly.

This implementation is suboptimal for a number of reasons, primarily because segment indexes are read and rewritten each time a round of chunked compaction is performed. These intermediate states are then used for the next round of chunked compaction.
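The chunked routine described above can be simulated in memory. This is a sketch under simplifying assumptions rather than the PR's implementation: segments are `std::vector`s, the map is bounded by a key count, a "rewrite" replaces the vector in place, and only the newest segment is chunk-indexed. `index_chunk`, `deduplicate`, and `chunked_compact` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct record {
    std::string key;
    int64_t offset;
};
using segment = std::vector<record>;
using key_map = std::unordered_map<std::string, int64_t>;

// Index records of `seg` with offset >= start_offset until `max_keys` distinct
// keys are held. Returns the first offset whose key did not fit (the inclusive
// start of the next chunk), or -1 once the segment is exhausted.
int64_t index_chunk(
  const segment& seg, int64_t start_offset, std::size_t max_keys, key_map& map) {
    for (const auto& r : seg) {
        if (r.offset < start_offset) {
            continue;
        }
        auto it = map.find(r.key);
        if (it != map.end()) {
            it->second = r.offset;  // newer occurrence of an indexed key
        } else if (map.size() < max_keys) {
            map.emplace(r.key, r.offset);
        } else {
            return r.offset;  // map is full
        }
    }
    return -1;
}

// Drop every record whose key was indexed at a strictly newer offset.
void deduplicate(segment& seg, const key_map& map) {
    segment kept;
    for (const auto& r : seg) {
        auto it = map.find(r.key);
        if (it == map.end() || it->second <= r.offset) {
            kept.push_back(r);
        }
    }
    seg = std::move(kept);  // stands in for rewriting the segment on disk
}

// Chunk-index segments.back() and deduplicate every segment against each
// chunk. Returns the number of rounds (chunks) performed.
int chunked_compact(std::vector<segment>& segments, std::size_t max_keys) {
    assert(max_keys > 0 && !segments.empty());  // otherwise no progress
    int rounds = 0;
    int64_t start = segments.back().empty() ? -1
                                            : segments.back().front().offset;
    while (start != -1) {
        key_map map;  // reset for each round
        const int64_t next = index_chunk(segments.back(), start, max_keys, map);
        for (auto& seg : segments) {
            deduplicate(seg, map);  // intermediate state, "flushed" every round
        }
        ++rounds;
        start = next;
    }
    return rounds;
}
```

Each loop iteration resets the map, indexes the next chunk starting at the first offset that did not fit, and rewrites every segment. That per-round rewrite is where the extra IO comes from.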

In the future, there may be a more optimal way to perform these steps using less IO by holding more information in memory and flushing only the final results to disk, instead of flushing every intermediate stage. However, cases that require chunked compaction appear to be infrequent enough that simply having an implementation is valuable.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Adds a chunked compaction routine to local storage, which is used as a fallback in the case that we fail to index a single segment during sliding window compaction.

@WillemKauf WillemKauf requested a review from andrwng December 3, 2024 21:35
@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from 1d4f546 to 2ca689c December 3, 2024 21:41
@dotnwat
Member

dotnwat commented Dec 4, 2024

> In the case that zero segments were indexed for a round of sliding window compaction, we will now fall back to a chunked compaction routine.

Can you explain what "chunked compaction" is? When would sliding window fail to index segments, and why do we care?

@WillemKauf
Contributor Author

> Can you explain what "chunked compaction" is? When would sliding window fail to index segments, and why do we care?

Added more detail to cover letter to address these points.

Contributor

@andrwng andrwng left a comment

Pretty much looks good! No major complaints about structure, just some naming suggestions. Nice work!

Also could probably use some ducktape testing, though IIRC you mentioned a separate PR for stress testing compaction

src/v/storage/compaction_reducers.cc (outdated, resolved)
src/v/storage/compaction_reducers.h (outdated, resolved)
src/v/storage/segment_deduplication_utils.h (outdated, resolved)
src/v/storage/segment_deduplication_utils.h (outdated, resolved)
src/v/storage/tests/compaction_e2e_test.cc (resolved)
@WillemKauf
Contributor Author

WillemKauf commented Dec 10, 2024

> Also could probably use some ducktape testing, though IIRC you mentioned a separate PR for stress testing compaction

That PR is merged; I'm going to parameterize it to test chunked compaction and assert on some added metrics.

Will have updates to this tomorrow soon (TM).

```cpp
// Review context (excerpt; the call below is truncated in the review view).
co_await map.reset();
auto read_holder = co_await seg->read_lock();
auto start_offset_inclusive = model::next_offset(last_indexed_offset);
auto rdr = internal::create_segment_full_reader(
```
Contributor Author

Recreating this full segment reader for each round of chunked compaction is a bummer.

Not sure if we have any abstractions to get around this. log_reader::reset_config() gave me some hope that the segment's lease/lock could be reused, but it doesn't seem to allow resetting to a start_offset lower than what has already been read.

For context, we have to do this because in the chunked_compaction_reducer, once we fail to index an offset for a record in a batch, we break out of the loop and have to re-read that batch in the next round, using that offset as the inclusive start offset.
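The break-and-re-read behavior can be sketched at batch granularity. This assumes each batch's records have consecutive offsets starting at `base_offset`, that a chunk always starts on a batch boundary, and that at least one whole batch fits in the map (otherwise no round would make progress). `batch`, `chunk_result`, and `index_chunk_by_batch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical batch: record i has offset base_offset + i.
struct batch {
    int64_t base_offset;
    std::vector<std::string> keys;
    int64_t last_offset() const {
        return base_offset + static_cast<int64_t>(keys.size()) - 1;
    }
};

struct chunk_result {
    int64_t last_indexed_offset;  // last offset of the last fully indexed batch
    bool reached_end;             // true if every remaining batch was indexed
};

// Index whole batches from start_offset_inclusive. On the first record that
// does not fit, stop: that batch is not fully indexed and is re-read next
// round. Records of the partial batch that did fit stay in the map; they are
// real offsets, so deduplicating against them is still correct.
chunk_result index_chunk_by_batch(
  const std::vector<batch>& batches,
  int64_t start_offset_inclusive,
  std::size_t max_keys,
  std::unordered_map<std::string, int64_t>& map) {
    assert(max_keys > 0);
    int64_t last_indexed_offset = start_offset_inclusive - 1;
    for (const auto& b : batches) {
        if (b.last_offset() < start_offset_inclusive) {
            continue;  // covered by an earlier round
        }
        bool fully_indexed_batch = true;
        for (std::size_t i = 0; i < b.keys.size(); ++i) {
            const int64_t offset = b.base_offset + static_cast<int64_t>(i);
            auto it = map.find(b.keys[i]);
            if (it != map.end()) {
                it->second = offset;
            } else if (map.size() < max_keys) {
                map.emplace(b.keys[i], offset);
            } else {
                fully_indexed_batch = false;
                break;
            }
        }
        if (!fully_indexed_batch) {
            return {last_indexed_offset, false};
        }
        last_indexed_offset = b.last_offset();
    }
    return {last_indexed_offset, true};
}
```

A caller would clear the map between rounds and start the next one at `last_indexed_offset + 1`, the analogue of `model::next_offset(last_indexed_offset)` in the excerpt above, until `reached_end` is true. The partially indexed batch is therefore read twice, which is the repeated work discussed in this thread.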

Contributor Author

Leaving comment to rehighlight this point, in case @andrwng or anyone else has any ideas or comments on the cost of this repeated operation/possible tools at our disposal here.

Member

> Leaving comment to rehighlight this point, in case @andrwng or anyone else has any ideas or comments on the cost of this repeated operation/possible tools at our disposal here.

well we have the readers cache, but i dunno if it is useful in this context.

Member

since this code is new and not used often, i think we should favor simplicity, unless of course something would be worse than just 'not optimal'.

@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from 2ca689c to 4d14fd5 December 11, 2024 20:58
@WillemKauf WillemKauf requested a review from a team as a code owner December 11, 2024 20:58
@WillemKauf WillemKauf removed the request for review from a team December 11, 2024 20:59
@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from 4d14fd5 to fc57dff December 11, 2024 21:02
@WillemKauf
Contributor Author

WillemKauf commented Dec 11, 2024

Force push to:

  • Rebase to upstream/dev
  • Add a new condition to reset the compaction sliding window offset in chunked_sliding_window_compaction (very important).
  • Add cluster config setting storage_compaction_key_map_memory_override_for_tests.
  • Parameterize log_compaction_test.py in order to test chunked compaction.
  • Address code review comments by adding documentation and renaming some objects/functions.
  • Fix logic in key_offset_map::initialize().
  • Add chunked_compaction_runs metric to storage::probe.

@WillemKauf WillemKauf requested a review from andrwng December 11, 2024 21:04
@WillemKauf
Contributor Author

/ci-repeat 5
release
skip-units
skip-redpanda-build
dt-repeat=100
tests/rptest/tests/log_compaction_test.py

@WillemKauf
Contributor Author

/ci-repeat 5
release
skip-units
skip-redpanda-build
dt-repeat=100
tests/rptest/tests/log_compaction_test.py

@vbotbuildovich
Collaborator

Retry command for Build#59673

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/log_compaction_test.py::LogCompactionTest.compaction_stress_test@{"cleanup_policy":"compact,delete","key_set_cardinality":100,"storage_compaction_key_map_memory_kb":131072}

@WillemKauf
Contributor Author

WillemKauf commented Dec 12, 2024

https://ci-artifacts.dev.vectorized.cloud/redpanda/59673/0193bc0b-0365-4203-802b-c969372ea7ac/vbuild/ducktape/results/final/report.html

raise RuntimeError( RuntimeError: KgoVerifierProducer-0-139757941539360 possible idempotency bug: ProduceStatus<103424 102400 1024 1 0 0 0 41112 8055.5/12348.5/20587>

time="2024-12-12T19:30:07Z" level=warning msg="Produced at unexpected offset 3508 (expected 2493) on partition 0"

Possibly bad interaction between partition movement and KgoVerifierProducer?

Seemingly unrelated to compaction changes.

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 12, 2024

CI test results

test results on build#59673

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=100.storage_compaction_key_map_memory_kb=131072 | ducktape | https://buildkite.com/redpanda/redpanda/builds/59673#0193bc0b-0365-4203-802b-c969372ea7ac | FLAKY | 99/100 |

test results on build#59782

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/59782#0193caa2-8bfd-4e36-a47d-8909582cb230 | FAIL | 0/6 |
| rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/59782#0193caa2-8bfe-4aad-ab48-5c49355e8883 | FLAKY | 3/6 |

test results on build#60281

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60281#01942e92-2894-4362-8c2b-78b171fbbf6c | FLAKY | 5/6 |

test results on build#60366

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rm_stm_tests_rpunit.rm_stm_tests_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/60366#01944251-4af6-47ee-9668-c08bca99fde8 | FLAKY | 1/2 |
| rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#019442ab-5efa-48d0-b625-d105d2cb0754 | FLAKY | 4/6 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c79-4e6b-8108-65f7221292b3 | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c79-4e6b-8108-65f7221292b3 | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c79-4e6b-8108-65f7221292b3 | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c79-4e6b-8108-65f7221292b3 | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=False.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c76-468a-a4a7-505c7f86ed9b | FAIL | 0/1 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=True.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/60366#01944298-4c77-454b-81d5-9dc77f2e830e | FAIL | 0/1 |

test results on build#60412

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| gtest_raft_rpunit.gtest_raft_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/60412#0194464a-8713-461f-9266-2b856bb3e64c | FLAKY | 1/2 |
| rm_stm_tests_rpunit.rm_stm_tests_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/60412#0194464a-8712-4e92-a539-0a7a888c1396 | FLAKY | 1/2 |
| rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli | ducktape | https://buildkite.com/redpanda/redpanda/builds/60412#01944698-37cf-4e97-94dc-0cde25b5944f | FLAKY | 3/6 |

test results on build#60610

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_creating_and_listing_migrations | ducktape | https://buildkite.com/redpanda/redpanda/builds/60610#01945237-dece-4bb9-8f6e-9e4f0f382a68 | FLAKY | 5/6 |
| rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli | ducktape | https://buildkite.com/redpanda/redpanda/builds/60610#01945237-decd-48e4-bf4d-510671564a4b | FLAKY | 4/6 |

test results on build#61021

| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/61021#01948c46-ca60-44eb-aed7-06adefda2e23 | FLAKY | 1/2 |
| rptest.tests.internal_topic_protection_test.InternalTopicProtectionLargeClusterTest.test_consumer_offset_topic | ducktape | https://buildkite.com/redpanda/redpanda/builds/61021#01948c46-ca5f-47ad-81ea-74f0ad25add3 | FLAKY | 1/2 |

@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from fc57dff to fe0991e December 15, 2024 12:44
@WillemKauf
Contributor Author

Force push to:

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 15, 2024

Retry command for Build#59782

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

@dotnwat
Member

dotnwat commented Jan 3, 2025

/ci-repeat

```cpp
// Review context (excerpt; the function body is truncated in the review view).
ss::future<ss::stop_iteration>
map_building_reducer::operator()(model::record_batch batch) {
    bool fully_indexed_batch = true;
    auto b = co_await decompress_batch(std::move(batch));
```
Contributor Author

Optimization: don't call _map->put() for records in non-compactible batch types, since they would just waste map space?

Member

did you do that right above this line?

Contributor Author

And better define it for `simple_key_offset_map`.

Optionally provide a starting offset from which the reader's
`min_offset` value is assigned (otherwise, the `base_offset()` of
the `segment` is used).

Uses the `map_building_reducer` to perform a linear read of a `segment`
and index its keys and offsets, starting from a provided offset.
@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch 2 times, most recently from 71d6a22 to fc4b9dd January 7, 2025 19:41
@WillemKauf
Contributor Author

Force push to:

  • Skip indexing non-compactible batches in map_building_reducer. There is no point in indexing records in non-compactible batches, since whether they are kept after compaction does not depend on the map.
  • Parameterize random_node_operations_test.py to use chunked compaction
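A minimal sketch of the first bullet, assuming only two hypothetical batch kinds. The real set of compactible batch types is decided by Redpanda; `batch_type`, `is_compactible`, and `build_map` here are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical batch kinds for this sketch only.
enum class batch_type { data, control };

bool is_compactible(batch_type t) { return t == batch_type::data; }

struct batch {
    batch_type type;
    int64_t base_offset;             // record i has offset base_offset + i
    std::vector<std::string> keys;
};

// Index keys of compactible batches only. Returns how many records were put.
std::size_t build_map(
  const std::vector<batch>& batches,
  std::unordered_map<std::string, int64_t>& map) {
    std::size_t indexed = 0;
    for (const auto& b : batches) {
        if (!is_compactible(b.type)) {
            // Kept after compaction regardless of the map, so indexing its
            // keys would only consume map capacity.
            continue;
        }
        for (std::size_t i = 0; i < b.keys.size(); ++i) {
            map[b.keys[i]] = b.base_offset + static_cast<int64_t>(i);
            ++indexed;
        }
    }
    return indexed;
}
```

Keys from non-compactible batches never reach the map, so the whole budget goes to keys that compaction can actually deduplicate.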

@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 7, 2025

Retry command for Build#60366

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":true,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":true,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":2,"enable_failures":true,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":false,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":false,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":false}
tests/rptest/tests/random_node_operations_test.py::RandomNodeOperationsTest.test_node_operations@{"cloud_storage_type":1,"enable_failures":true,"mixed_versions":true,"with_chunked_compaction":true,"with_iceberg":false,"with_tiered_storage":true}

@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from fc4b9dd to 1acfc4e January 8, 2025 14:20
@WillemKauf
Contributor Author

WillemKauf commented Jan 8, 2025

Force push to:

  • Remove the chunked compaction metric check from random_node_operations_test, which could hit an assert (we cannot guarantee that all nodes are live when the metric is summed).

@WillemKauf
Contributor Author

Nice

In the case that zero segments were indexed for a round of sliding
window compaction, chunked compaction must be performed.

This implementation uses some of the current abstractions from the compaction
utilities to perform several rounds of sliding window compaction with a
partially indexed map created by reading the unindexed segment linearly.

This implementation is suboptimal for a number of reasons, namely
that segment indexes are read and rewritten each time a round of chunked
compaction is performed. These intermediate states are then used for the
next round of chunked compaction.

In the future, there may be a more optimal way to perform these steps
using less IO by holding more information in memory and flushing only
the final results to disk, instead of every intermediate stage.

GTest `ASSERT_*` macros cannot be used in functions that return a
non-`void` type.

Add `RPTEST_EXPECT_EQ` to provide flexibility in testing for non-`void`
functions.
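Why the fatal macros are a problem can be shown with toy macros. GoogleTest documents that fatal assertions (`ASSERT_*`) may only be used in `void`-returning functions, because on failure they return from the current function without a value. The sketch below assumes nothing about how `RPTEST_EXPECT_EQ` is actually defined; `TOY_ASSERT_EQ`, `TOY_EXPECT_EQ`, and `g_failures` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdio>

static int g_failures = 0;  // hypothetical failure counter for this sketch

// Toy fatal assertion: like GTest's ASSERT_*, it leaves the current function
// with a bare `return;`, so it only compiles inside void functions.
#define TOY_ASSERT_EQ(a, b)  \
    do {                     \
        if (!((a) == (b))) { \
            ++g_failures;    \
            return;          \
        }                    \
    } while (0)

// Toy non-fatal check: records the failure and keeps going, so it can be used
// in a function with any return type. Not the real RPTEST_EXPECT_EQ.
#define TOY_EXPECT_EQ(a, b)                                            \
    do {                                                               \
        if (!((a) == (b))) {                                           \
            ++g_failures;                                              \
            std::fprintf(stderr, "check failed: %s == %s\n", #a, #b); \
        }                                                              \
    } while (0)

void check_is_one(int x) { TOY_ASSERT_EQ(x, 1); }

int doubled_if_even(int x) {
    // TOY_ASSERT_EQ(x % 2, 0) would not compile here: `return;` in an int function.
    TOY_EXPECT_EQ(x % 2, 0);
    return x * 2;
}
```

The non-fatal form lets a helper that computes a value still report a mismatch, at the cost of continuing past the failed check.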

To move away from hardcoded boost asserts and provide
compatibility in a GTest environment.

This would previously overshoot the `size_bytes` provided to it
by always filling at least one full fragment of `elements_per_fragment()` entries.

In the lower limit, when `required_entries` is less than `elements_per_fragment()`,
we should take the minimum of the two values and push back that
number of objects to the `entries` container.
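The overshoot described in this commit can be illustrated with a plain `std::vector` standing in for the fragmented container. This is a sketch, assuming entries are appended one fragment at a time; `entry`, `fill_whole_fragments`, and `fill_clamped` are hypothetical names, and `elements_per_fragment` is passed in as a number rather than queried from a container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry type; the real key_offset_map entry layout is not shown here.
struct entry {
    uint64_t key_hash = 0;
    int64_t offset = -1;
};

// "Before": always appends whole fragments, so even a tiny budget allocates
// at least one full fragment of elements_per_fragment entries.
std::size_t fill_whole_fragments(
  std::vector<entry>& entries,
  std::size_t required_entries,
  std::size_t elements_per_fragment) {
    assert(elements_per_fragment > 0);
    do {
        for (std::size_t i = 0; i < elements_per_fragment; ++i) {
            entries.emplace_back();
        }
    } while (entries.size() < required_entries);
    return entries.size();
}

// "After": each fill is clamped to what is still required, so the container
// ends up with exactly required_entries entries.
std::size_t fill_clamped(
  std::vector<entry>& entries,
  std::size_t required_entries,
  std::size_t elements_per_fragment) {
    assert(elements_per_fragment > 0);
    while (entries.size() < required_entries) {
        const std::size_t n = std::min(
          elements_per_fragment, required_entries - entries.size());
        for (std::size_t i = 0; i < n; ++i) {
            entries.emplace_back();
        }
    }
    return entries.size();
}
```

With 64 entries per fragment, asking for 10 entries previously produced a full fragment of 64, while clamping each fill to `min(elements_per_fragment, remaining)` produces exactly the requested count.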

In order to test the chunked compaction routine, parameterize the existing
compaction test suite with `storage_compaction_key_map_memory_kb`.

By limiting this value, we can force compaction down the chunked compaction
path, and verify the log using the existing utilities after compaction settles.

Some added asserts verify whether the chunked compaction code path is taken,
depending on the memory constraints specified.
@WillemKauf WillemKauf force-pushed the storage_chunked_compaction branch from 1acfc4e to 0de4571 January 10, 2025 20:15
@WillemKauf
Contributor Author

WillemKauf commented Jan 10, 2025

Force push to:

  • Obtain _segment_rewrite_lock in disk_log_impl::chunked_sliding_window_compact(). Without it, there is a possible race between truncation and chunked compaction, since we obtain and release the same seg->read_lock() multiple times in index_chunk_of_segment_for_map(), and additionally perform segment rewrites within disk_log_impl::rewrite_segment_with_offset_map().
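The effect of holding the rewrite lock across all rounds can be sketched with a toy, single-threaded lock, since Seastar code on a shard interleaves only at scheduling points. This assumes truncation has to take the same lock; `toy_mutex`, `toy_log`, and the callback standing in for other fibers are hypothetical, and a real fiber would wait on the lock rather than give up.

```cpp
#include <cassert>

// Toy stand-in for a non-reentrant async mutex: try_lock fails while another
// fiber holds it. Single-threaded on purpose, mirroring one Seastar shard.
struct toy_mutex {
    bool held = false;
    bool try_lock() {
        if (held) {
            return false;
        }
        held = true;
        return true;
    }
    void unlock() { held = false; }
};

struct toy_log {
    toy_mutex segment_rewrite_lock;  // name borrowed from the PR text
    int truncations_applied = 0;
    int truncations_deferred = 0;

    // Assumption: truncation must take the same lock before touching segments.
    void try_truncate() {
        if (!segment_rewrite_lock.try_lock()) {
            ++truncations_deferred;  // a real fiber would wait here instead
            return;
        }
        ++truncations_applied;
        segment_rewrite_lock.unlock();
    }

    // The rewrite lock is held across every round, even though the per-segment
    // read lock would be dropped and re-taken between rounds.
    // `at_scheduling_point` models other fibers that run between rounds.
    template<typename F>
    void chunked_compact(int rounds, F at_scheduling_point) {
        const bool locked = segment_rewrite_lock.try_lock();
        assert(locked && "sketch assumes no concurrent rewrite at entry");
        (void)locked;
        for (int i = 0; i < rounds; ++i) {
            // take read lock, index one chunk, rewrite segments, drop read lock
            at_scheduling_point();
        }
        segment_rewrite_lock.unlock();
    }
};
```

Every truncation attempted between rounds is deferred, and truncation only proceeds once chunked compaction has released the lock.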

Unfortunate that random_node_operations_test didn't catch this on a single CI run. We may need to hit this with a couple hundred ci-repeats.

@dotnwat
Member

dotnwat commented Jan 15, 2025

> Unfortunate that random_node_operations_test didn't catch this on a single CI run. We may need to hit this with a couple hundred ci-repeats.

hehe. we don't need to do that in the context of this PR or is it related to chunked compaction?

@WillemKauf
Contributor Author

> hehe. we don't need to do that in the context of this PR or is it related to chunked compaction?

The potential race mentioned was related to the added chunked compaction code, yes.

src/v/storage/compaction_reducers.h (resolved)
src/v/storage/compaction_reducers.cc (outdated, resolved)
src/v/storage/compaction_reducers.cc (outdated, resolved)
src/v/storage/disk_log_impl.cc (outdated, resolved)
src/v/storage/disk_log_impl.cc (resolved)
src/v/storage/disk_log_impl.cc (resolved)
src/v/storage/disk_log_impl.cc (resolved)
src/v/storage/key_offset_map.cc (resolved)
Add a new function `map_building_reducer::maybe_index_record_in_map()`
to avoid a possibly dangling reference in a continuation.

While this code was "technically" safe because `key_offset_map::put()`
didn't have any scheduling points in it, this refactor avoids the problem
entirely by moving all defined stack variables into a new coroutine function.
@WillemKauf
Contributor Author

Push to:

  • Add a new function map_building_reducer::maybe_index_record_in_map() to avoid a possibly dangling reference in a continuation.
  • Add more chunked compaction comments to disk_log_impl

@dotnwat dotnwat merged commit 88fb0d2 into redpanda-data:dev Jan 22, 2025
18 checks passed

5 participants