
@rockwotj rockwotj commented Jan 2, 2026

Currently, a single broker being down causes ListBrokers calls to fail,
which is undesirable. To fix this, we omit some information from
ListBrokers responses. While this is a semantic breaking change, it is
not a wire or generated-code breaking change.

We also use an optional for node_id instead of -1, as that's much
cleaner. We don't break compatibility here FWIW.
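
For downstream consumers, the semantic change means a `Broker` returned by ListBrokers may no longer carry `build_info` or `admin_server`. A minimal sketch of presence-aware handling, assuming hypothetical Python stand-ins for the generated messages (the `version` field and the `describe` helper are illustrative, not part of this PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildInfo:
    # Hypothetical stand-in for the Broker.build_info message.
    version: str


@dataclass
class Broker:
    # Hypothetical stand-in for the generated Broker message.
    node_id: int
    # Left unset in ListBrokers responses; populated only by GetBroker.
    build_info: Optional[BuildInfo] = None


def describe(broker: Broker) -> str:
    """Render a broker without assuming build_info is populated."""
    if broker.build_info is None:
        return f"broker {broker.node_id} (build info not included)"
    return f"broker {broker.node_id} running {broker.build_info.version}"
```

Callers that need the omitted fields would follow up with a GetBroker call for the specific node.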

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

Improvements

  • Calling ListBrokers in the Admin v2 API no longer requires all brokers to be up.

Copilot AI review requested due to automatic review settings January 2, 2026 20:56
@rockwotj rockwotj requested review from a team and michael-redpanda as code owners January 2, 2026 20:56

Copilot AI left a comment


Pull request overview

This PR modifies the ListBrokers RPC to no longer require proxying requests to all brokers, preventing failures when individual brokers are down. The change makes node_id in GetBrokerRequest optional (using protobuf optional instead of the -1 sentinel value) and makes build_info and admin_server fields optional in the Broker message, with documentation noting they're only populated for GetBroker RPCs.

Key Changes:

  • Changed GetBrokerRequest.node_id from required to optional field
  • Made Broker.build_info and Broker.admin_server optional fields that are only populated for GetBroker calls
  • Modified list_brokers implementation to directly populate broker node IDs without proxying
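
The difference between the proxying and non-proxying shapes can be sketched as follows. This is an illustration only, not the code in broker.cc: `fetch`, `BrokerDown`, and both function names are hypothetical, and brokers are modeled as plain dictionaries.

```python
from typing import Callable, Dict, List


class BrokerDown(Exception):
    """Raised by the hypothetical per-broker proxy call when a node is unreachable."""


def list_brokers_proxying(
    node_ids: List[int], fetch: Callable[[int], Dict]
) -> List[Dict]:
    # Old shape: one proxied call per broker, so a single unreachable
    # broker raises and the whole listing fails.
    return [fetch(node_id) for node_id in node_ids]


def list_brokers_local(node_ids: List[int]) -> List[Dict]:
    # New shape: only the node ID, which is known locally, is populated,
    # so no other broker has to be reachable.
    return [{"node_id": node_id} for node_id in node_ids]
```

With three brokers and one of them down, the first function raises while the second still returns three entries.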

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| proto/redpanda/core/admin/v2/broker.proto | Changed node_id, build_info, and admin_server to optional fields with updated documentation |
| src/v/redpanda/admin/services/broker.cc | Updated get_broker to check for optional node_id and modified list_brokers to populate only node IDs without proxying |
| tests/rptest/clients/admin/proto/redpanda/core/admin/v2/broker_pb2.py | Regenerated Python protobuf code reflecting the optional field changes |
| tests/rptest/clients/admin/proto/redpanda/core/admin/v2/broker_pb2.pyi | Regenerated Python type stubs with new optional field signatures and methods |

Comments suppressed due to low confidence (1)

src/v/redpanda/admin/services/broker.cc:1

  • The comparison target != -1 is now redundant since node_id is optional and this code is only reached when has_node_id() is true. If the intent was to support -1 as a sentinel value for backward compatibility, the current implementation still doesn't handle it properly (it would treat -1 as a valid target). Either remove the -1 check entirely or add explicit handling for the -1 case.
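
One way to make the -1 handling explicit, as the suppressed comment suggests, is sketched below. It assumes, without confirmation from the PR, that both an unset node_id and the legacy -1 value mean "the broker that received the request"; `resolve_target` and `self_id` are hypothetical names.

```python
from typing import Optional


def resolve_target(requested: Optional[int], self_id: int) -> int:
    """Pick the broker a GetBroker request should be served by."""
    if requested is None:
        # Field unset: answer for the broker that received the request.
        return self_id
    if requested == -1:
        # Assumption: older clients sent -1 as a sentinel for "this broker".
        return self_id
    return requested
```

If -1 does not need to be supported, the second branch can be dropped so that only the presence check remains.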

@vbotbuildovich

Retry command for Build#78465

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/admin_api_auth_test.py::AdminApiAuthTest.test_admin_v2


vbotbuildovich commented Jan 2, 2026

CI test results

test results on build#78465
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MasterTestSuite | test_remote_partition_read_cached_index | | unit | https://buildkite.com/redpanda/redpanda/builds/78465#019b807f-d7d7-43b4-baf4-9c94d55609d7 | FAIL | 0/1 | | |
| AdminApiAuthTest | test_admin_v2 | null | integration | https://buildkite.com/redpanda/redpanda/builds/78465#019b8095-6dfe-4f12-ac5f-ea1db618d883 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AdminApiAuthTest&test_method=test_admin_v2 |

test results on build#78495
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AdminApiAuthTest | test_admin_v2 | null | integration | https://buildkite.com/redpanda/redpanda/builds/78495#019b8273-aff1-4c3d-b699-be73d7a972ab | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AdminApiAuthTest&test_method=test_admin_v2 |
| AdminApiAuthTest | test_admin_v2 | null | integration | https://buildkite.com/redpanda/redpanda/builds/78495#019b8278-6e86-47b6-8c6d-389c4c0be2f3 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AdminApiAuthTest&test_method=test_admin_v2 |
| EndToEndCloudTopicsTest | test_write | null | integration | https://buildkite.com/redpanda/redpanda/builds/78495#019b8278-6e81-4fb6-94c4-107373c1f917 | FLAKY | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=EndToEndCloudTopicsTest&test_method=test_write |

test results on build#78499
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MountUnmountIcebergTest | test_simple_remount | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/78499#019b8458-bc9e-4f5e-bcad-939e2a3e4264 | FLAKY | 9/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.1931, p0=0.8830, reject_threshold=0.0100. adj_baseline=0.4747, p1=0.0161, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount |

@rockwotj rockwotj force-pushed the broker_protos branch 2 times, most recently from 42510a7 to 5de3179 Compare January 3, 2026 05:49

vbotbuildovich commented Jan 3, 2026

Retry command for Build#78495

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/admin_api_auth_test.py::AdminApiAuthTest.test_admin_v2


@michael-redpanda michael-redpanda left a comment


lgtm - are we going to backport this? Do we have any downstream consumers using this today?
