Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-7743] cluster: tiered storage topic property update fixes #23545

Merged
merged 7 commits into from
Oct 7, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Sep 27, 2024

Relevant JIRA ticket: https://redpandadata.atlassian.net/browse/CORE-7743

Relevant historical threads: #9191, #23220 (comment), #5846.

This PR fixes some bugs that have been present in how redpanda handles AlterConfig and IncrementalAlterConfig API calls and the tiered storage topic properties redpanda.remote.read, redpanda.remote.write.

These bugs are well described in the above links, but the TL;DR is that a command like rpk topic alter-config t --set redpanda.remote.write=true --set redpanda.remote.read=false cannot properly describe both the addition of write permissions and the removal of read permissions due to the binary implementation of shadow_indexing_mode.

The topic_property layer is left untouched, but the incremental_topic_updates has had its property_update<shadow_indexing_mode> field deprecated (privatized in the struct, explicit getters/setters must be used to access it) and two new fields property_update<bool> remote_read/write added.

These fields will be used for topic property updates from now on, for clusters past v24.3 with the feature flag shadow_indexing_split_topic_property_update active. Unfortunately, we must keep the shadow_indexing related updates in the code to maintain backwards compatibility with older redpanda versions.

Another important change to note is that rpk topic alter-config t --delete redpanda.remote.write --delete redpanda.remote.read now behaves as expected, in resetting those topic level permissions to the cluster default. DescribeConfigs requests will also now accurately print the topic level permissions (these bugs were being chased in closed PR #23220. This current PR should be considered a replacement for that one).

Closes #5846, closes #4499, closes #11749

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Bug Fixes

  • Fixes a bug in which AlterConfig and IncrementalAlterConfig requests modifying redpanda.remote.read and redpanda.remote.write simultaneously were not being properly respected.
  • Fixes a bug in which the responses to DescribeConfigs requests would misrepresent the state of redpanda.remote.read and redpanda.remote.write.

@vbotbuildovich

This comment was marked as outdated.

@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch 2 times, most recently from 980b9d1 to b33245c Compare September 28, 2024 00:25
@vbotbuildovich

This comment was marked as outdated.

@WillemKauf WillemKauf changed the title [CORE-7743] cluster: tiered storage topic property fixes [CORE-7743] cluster: tiered storage topic update fixes Sep 28, 2024
@WillemKauf WillemKauf changed the title [CORE-7743] cluster: tiered storage topic update fixes [CORE-7743] cluster: tiered storage topic property update fixes Sep 28, 2024
Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up, it's looking good.

Do we want to backport this all the way? I think there's no technical reason why we couldn't, but I'm wondering if this should go into a patch release.

src/v/cluster/topic_table.cc Outdated Show resolved Hide resolved
src/v/cluster/topic_table.cc Outdated Show resolved Hide resolved
src/v/migrations/cloud_storage_config.cc Show resolved Hide resolved
src/go/rpk/pkg/cli/topic/config.go Outdated Show resolved Hide resolved
src/v/cluster/topic_table.cc Show resolved Hide resolved
src/v/cluster/tests/cluster_tests.cc Outdated Show resolved Hide resolved
src/v/kafka/server/tests/alter_config_test.cc Show resolved Hide resolved
src/v/kafka/server/tests/alter_config_test.cc Show resolved Hide resolved
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch from 9d34ec8 to 32eb837 Compare October 1, 2024 16:12
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch from 32eb837 to 0faad94 Compare October 1, 2024 19:51
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch 2 times, most recently from 1ecaefc to 7957f50 Compare October 2, 2024 18:08
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Add the new feature flag shadow_indexing_split_topic_property_update, which is used at the kafka layer in order to determine which code-path should be followed for the shadow_indexing topic property update- the old, bug prone code path, or the updated one.

src/v/cluster/types.h Outdated Show resolved Hide resolved
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch from 7957f50 to c50d1fd Compare October 3, 2024 14:06
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch 2 times, most recently from 9d107ae to 141c43b Compare October 3, 2024 15:55
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Make restart_node_and_await_stable_leader() in alter_topic_configuration_test.py deterministic by checking the cluster health overview in order to assert on controller stabilization.

Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some final minor things left but otherwise it looks good to me.

src/v/cluster/topic_table.cc Outdated Show resolved Hide resolved
tests/rptest/tests/alter_topic_configuration_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/alter_topic_configuration_test.py Outdated Show resolved Hide resolved
@@ -76,6 +76,8 @@ valid, but does not apply it.
// at the same time, so we issue first the set request for write,
// then the rest of the requests.
// See https://github.com/redpanda-data/redpanda/issues/9191
// TODO: Remove this once v24.2 is EOL.
// See https://github.com/redpanda-data/redpanda/pull/23545.
_, isRRR := setKVs["redpanda.remote.read"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't backport this PR because of the feature gate, but if we wanted to fix these bugs in older versions we could implement this request-splitting logic that rpk is doing inside redpanda in 24.1/24.2/23.3. It might be worth asking product if we want to do that.

To be clear, I think we should get this PR in because this seems like the right long-term fix, but since these bugs are important we might want to release a backportable fix for them as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't backport this PR because of the feature gate, but if we wanted to fix these bugs in older versions we could implement this request-splitting logic that rpk is doing inside redpanda in 24.1/24.2/23.3. It might be worth asking product if we want to do that.

I think it's a good idea, so that it can work with other tools that support setting arbitrary config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, these are pretty important bugs, it would be good to fix them in previous versions.

This should be done in a future PR, but for now my focus is on v24.3.

@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch 2 times, most recently from 84283da to feea5e4 Compare October 3, 2024 17:33
@WillemKauf WillemKauf requested a review from pgellert October 3, 2024 17:37
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch from feea5e4 to cfd0f59 Compare October 3, 2024 20:45
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Deflake race-y test in alter_topic_configuration_test.py - trim_logs() does not appear to be blocking, and was causing unreliable test outcomes when checking node logs for the deprecated log line.

@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch 2 times, most recently from f6e2b1c to be4a8c2 Compare October 4, 2024 13:07
@WillemKauf
Copy link
Contributor Author

Force push to:

  • Rebase to upstream dev (sorry, forgot to apply --keep-base) in order to add required license to alter_topic_configuration_test.py here.

To feature guard the new code path taken for the `shadow_indexing`
topic property, add it to the feature table.

Clusters past version `v24.3` will follow the new, bug free code path in
the `topic_table` for their `AlterConfig`/`IncrementalAlterConfig` requests,
while older clusters will follow the now deprecated code-path.

Also, update `partition_properties_stm` to use the new `release_version` enum.
In order to deprecate and better control call sites that use this
field, move it to a `private` field and add get/set functions.

Also modify use-sites and constructors that can no longer treat
`incremental_topic_updates` as an aggregate for initialization.
To deal with the bugs that have come about due to use of the `shadow_indexing`
field in the `AlterConfig`/`IncrementalAlterConfig` APIs, split the update field
into two seperate `property_update<bool>`s, `remote_read` and `remote_write`.

This fixes any bugs that were present and described in the JIRA thread here:
https://redpandadata.atlassian.net/browse/CORE-7743

For legacy reasons, the old `incremental_update` function for `shadow_indexing`
must still be available, in case of an older broker sending topic property
updates to an updated broker. However, updated clusters with the new feature flag
`shadow_indexing_split_topic_property_update` active will use the `remote_read` and
`remote_write` fields for their `incremental_update`s.
Adds `test_shadow_indexing_alter_configs` and
`test_shadow_indexing_incremental_alter_configs`.
Also add some `node` parameters to `KCL` related functions
in order to allow the issuing of Kafka requests to specific brokers.
Kafka ignores case when setting a boolean property:
https://github.com/apache/kafka/blob/master/clients/src/main/...
...java/org/apache/kafka/common/config/ConfigDef.java#L690-L710

Realign ourselves with Kafka by transforming the string value to
lowercase before parsing it in `parse_and_set_bool()`.
These workarounds should be removed in a future version of
`redpanda` (once `v24.2` is EOL), now that bug fixes are in
place in the core.

They are left here for now to ensure backwards compatibility
between an upgraded version of `rpk` and an un-upgraded `redpanda`
cluster.
@WillemKauf WillemKauf force-pushed the si_alter_config_fix branch from be4a8c2 to 540ae13 Compare October 4, 2024 14:35
@WillemKauf
Copy link
Contributor Author

WillemKauf commented Oct 4, 2024

Force push to:

  • Resolve merge conflicts with upstream dev
  • Use new release_version enum in feature_table

Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but let's get someone else to review this as well

@WillemKauf WillemKauf requested a review from dotnwat October 7, 2024 15:54
@@ -152,8 +167,17 @@ create_topic_properties_update(alter_configs_resource& resource) {
continue;
}
if (cfg.name == topic_property_remote_read) {
// Legacy update for shadow indexing field
{
if (shadow_indexing_split_update) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we get rid of 'shadow indexing' in the name? maybe just 'tiered-storage'?
the SI comes from the very first implementation which was called 'shadow indexing' and since then it was renamed everywhere. I don't mind if the name will stay though. It will be easier to connect it with the old flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the 'split' in the name focuses on the low level detail, it will probably be confusing for some people

Copy link
Contributor Author

@WillemKauf WillemKauf Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the topic property is still named shadow_indexing, I think it would be best if this rename was applied there as well in a future PR. What do you think?

Copy link
Contributor

@Lazin Lazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one minor nit regarding the name

@WillemKauf WillemKauf merged commit 58f68dd into redpanda-data:dev Oct 7, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants