Some oversized alloc high partition count improvements #24578

Open
wants to merge 2 commits into
base: dev
from

Conversation

StephanDollberg
Member

Address some oversized allocations in niche cases such as super wide reads and super large topics.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@vbotbuildovich
Collaborator

Retry command for Build#59797

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

@vbotbuildovich
Collaborator

vbotbuildovich commented Dec 16, 2024

CI test results

test results on build#59797
test_id test_kind job_url test_status passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59797#0193cf52-9db0-4e8d-9da2-ed4efe9bf96a FAIL 0/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/59797#0193cf52-9db3-4053-97b0-48ee43c0e1b0 FLAKY 1/6
test results on build#59908
test_id test_kind job_url test_status passed
rptest.tests.archive_retention_test.CloudArchiveRetentionTest.test_delete.cloud_storage_type=CloudStorageType.ABS.retention_type=retention.ms ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FLAKY 5/6
rptest.tests.availability_test.AvailabilityTests.test_recovery_after_catastrophic_failure ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.cloud_storage_usage_test.CloudStorageUsageTest.test_cloud_storage_usage_reporting ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/6
rptest.tests.consumer_offsets_consistency_test.ConsumerOffsetsConsistencyTest.test_flipping_leadership ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.None.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.None.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.prepared.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FLAKY 1/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.out.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.None.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executed.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.prepared.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.prepared.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.out.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.out.stage.executing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.out.stage.prepared.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_topic_start_delta.cloud_storage_enabled=True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.e2e_shadow_indexing_test.EndToEndShadowIndexingTest.test_reset_from_cloud.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FLAKY 3/6
rptest.tests.flink_basic_test.FlinkBasicTests.test_transaction_workload ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.full_disk_test.FullDiskTest.test_full_disk_no_produce ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.leaders_info_api_test.LeadersInfoApiEndToEndTest.reset_leaders_info_end_to_end_test ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.leadership_transfer_test.LeadershipPinningTest.test_leadership_pinning_sanctions ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.leadership_transfer_test.MultiTopicAutomaticLeadershipBalancingTest.test_topic_aware_rebalance ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.node_pool_migration_test.NodePoolMigrationTest.test_migrating_redpanda_nodes_to_new_pool.balancing_mode=node_add.test_mode=TestMode.NO_TIRED_STORAGE.cleanup_policy=compact.delete ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da71-f3d5-4eae-b7c3-d73954385d1c FAIL 0/2
rptest.tests.node_resize_test.NodeResizeTest.test_node_resize ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_cancel_ongoing_movements ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_crashed_node ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation.delete_topic=False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_working_node.delete_topic=False.tick_interval=3600000 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_working_node.delete_topic=True.tick_interval=3600000 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_working_node.delete_topic=True.tick_interval=5000 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_flipping_decommission_recommission.node_is_alive=False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_node_is_not_allowed_to_join_after_restart.new_bootstrap=True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_recommissioning_do_not_stop_all_moves_node ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_recommissioning_node_finishes ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_recommissioning_one_of_decommissioned_nodes ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_recycle_all_nodes.auto_assign_node_id=False ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=1.restart=True.controller_snapshots=True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_add_partitions_with_inprogress_reassignments ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_partition_movement.partition_count=10 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_simple_end_to_end.partition_count=10.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.path ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_simple_end_to_end.partition_count=10.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.scaling_up_test.ScalingUpTest.test_adding_node_with_unavailable_node ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.scaling_up_test.ScalingUpTest.test_on_demand_rebalancing.partition_count=20 ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.shard_placement_test.ShardPlacementTest.test_core_count_change ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.shard_placement_test.ShardPlacementTest.test_node_join.disable_license=True ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.topic_creation_test.TopicRecreateTest.test_topic_recreation_while_producing.workload=ACKS_1.cleanup_policy=compact ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.topic_creation_test.TopicRecreateTest.test_topic_recreation_while_producing.workload=ACKS_ALL.cleanup_policy=compact ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae8-48e3-9716-533969d6f066 FAIL 0/1
rptest.tests.topic_creation_test.TopicRecreateTest.test_topic_recreation_while_producing.workload=IDEMPOTENT.cleanup_policy=compact ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.topic_creation_test.TopicRecreateTest.test_topic_recreation_while_producing.workload=IDEMPOTENT.cleanup_policy=delete ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae7-4193-8cf2-ee7f9c424663 FAIL 0/1
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=check_manifest_existence ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=no_check ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/1
rptest.transactions.stream_verifier_test.StreamVerifierTest.test_simple_produce_consume_txn_with_add_node ducktape https://buildkite.com/redpanda/redpanda/builds/59908#0193da6c-cae6-49e9-b165-813181b9148f FAIL 0/6
test results on build#59952
test_id test_kind job_url test_status passed
rptest.tests.cloud_retention_test.CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=None.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/59952#0193decd-b638-4c2b-a003-218bc17844ea FAIL 0/6
rptest.tests.controller_log_limiting_test.ControllerLogLimitMirrorMakerTests.test_mirror_maker_with_limits ducktape https://buildkite.com/redpanda/redpanda/builds/59952#0193decd-b639-401a-94ac-757bf83f1685 FLAKY 5/6
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/59952#0193decd-b63a-4cef-ab8e-61c1fc387a1c FLAKY 3/6
test results on build#60033
test_id test_kind job_url test_status passed
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60033#0193e79f-b77b-41d3-aa9e-fe5c70f35c3f FLAKY 4/6
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60033#0193e7a4-e334-4665-aab7-bc653b27ae63 FLAKY 5/6

: fetches_per_shard() {
fetches_per_shard.reserve(shards);
for (size_t i = 0; i < shards; i++) {
auto& fps = fetches_per_shard.emplace_back(start_time);
Member

IMO it's better to just change the shard_fetch ctor to take time & shard now, and then you can pass them both to emplace and avoid this weird thing where we fill in the shard after (should be no slower, may be faster). Then we don't leave shard uninit any more, which is good.

I believe it was me who did it like this, mostly just to avail myself of the "n copies" constructor but we aren't using that any more.
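
A minimal sketch of what that suggestion could look like. The types here (`shard_id`, `clock_type`, the stripped-down `shard_fetch`, and the helper `make_fetches_per_shard`) are hypothetical stand-ins for illustration; the real class in this PR carries more state and may differ in signature.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the real Redpanda types.
using shard_id = std::uint32_t;
using clock_type = std::chrono::steady_clock;

struct shard_fetch {
    // Both fields are set at construction, so `shard` is never left
    // uninitialized and there is no "fill in the shard afterwards" step.
    shard_fetch(shard_id s, clock_type::time_point start)
      : shard(s)
      , start_time(start) {}

    shard_id shard;
    clock_type::time_point start_time;
};

// Builds one shard_fetch per shard, passing shard and time to emplace_back.
std::vector<shard_fetch>
make_fetches_per_shard(std::size_t shards, clock_type::time_point start_time) {
    std::vector<shard_fetch> fetches_per_shard;
    fetches_per_shard.reserve(shards);
    for (std::size_t i = 0; i < shards; i++) {
        fetches_per_shard.emplace_back(static_cast<shard_id>(i), start_time);
    }
    return fetches_per_shard;
}
```

With a two-argument constructor the "n copies" constructor of the container is no longer usable, which matches the loop already shown in the diff.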

Member

(approving though so feel free to ignore)

Member Author

done

travisdowns
travisdowns previously approved these changes Dec 18, 2024
@travisdowns
Member

/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest
skip-redpanda-build

@StephanDollberg StephanDollberg force-pushed the stephan/more-oversized-high-part branch 3 times, most recently from 721745d to bd30a74 Compare December 18, 2024 11:34
travisdowns
travisdowns previously approved these changes Dec 18, 2024
@vbotbuildovich
Collaborator

Retry command for Build#59952

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_retention_test.py::CloudRetentionTest.test_cloud_retention@{"cloud_storage_type":2,"max_consume_rate_mb":null}

For super high partition topics at high partition density these can lead
to oversized allocs due to the use of `absl::flat_hash_map`.

Switch to chunked_hash_map.

`ntp_fetch_config` is 250 bytes. When doing super wide reads of
500-1000k partitions (quite niche) this goes above the oversized alloc
threshold. Switch to chunked_vector.

Switch responses too for uniformity.
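
To illustrate why a chunked container helps here, the following is a rough sketch and not Redpanda's actual `chunked_vector`. Only the 250-byte element size comes from the commit message. The 128 KiB threshold, the 32 KiB chunk size, and the names `fake_fetch_config` and `toy_chunked_vector` are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for ntp_fetch_config; only its 250-byte size is taken from the PR.
struct fake_fetch_config {
    std::array<char, 250> bytes;
};

// Assumed oversized-allocation threshold for this sketch (128 KiB).
constexpr std::size_t oversized_threshold = 128 * 1024;

// Size of the single contiguous buffer a std::vector<T> of capacity n needs.
template<typename T>
constexpr std::size_t contiguous_bytes(std::size_t n) {
    return n * sizeof(T);
}

// Toy chunked vector: elements live in fixed-capacity chunks, so the largest
// element buffer is bounded by chunk_bytes no matter how many elements exist.
// The index of chunk pointers is still contiguous, but costs only one pointer
// per chunk.
template<typename T, std::size_t chunk_bytes = 32 * 1024>
class toy_chunked_vector {
public:
    static constexpr std::size_t per_chunk = chunk_bytes / sizeof(T);
    static_assert(per_chunk > 0, "element larger than a chunk");

    void push_back(const T& v) {
        // Allocate a new chunk only when every existing chunk is full.
        if (_size == _chunks.size() * per_chunk) {
            auto chunk = std::make_unique<std::vector<T>>();
            chunk->reserve(per_chunk);
            _chunks.push_back(std::move(chunk));
        }
        _chunks.back()->push_back(v);
        ++_size;
    }

    T& operator[](std::size_t i) {
        return (*_chunks[i / per_chunk])[i % per_chunk];
    }

    std::size_t size() const { return _size; }

    // Largest element buffer this container ever allocates.
    static constexpr std::size_t max_chunk_alloc() {
        return per_chunk * sizeof(T);
    }

private:
    std::vector<std::unique_ptr<std::vector<T>>> _chunks;
    std::size_t _size = 0;
};
```

Under these assumptions, 1000 entries in a contiguous vector need 250,000 bytes in one allocation, which is above the assumed threshold, while the chunked layout never requests more than 32 KiB per element buffer. The element count of 1000 is used purely for illustration.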
@travisdowns travisdowns force-pushed the stephan/more-oversized-high-part branch from ff22b10 to 7c334c3 Compare December 21, 2024 03:30
@travisdowns
Member

(rebased to kick off CI again)

3 participants