Skip to content

Conversation

@WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Jan 5, 2026

A port of the existing shadow_indexing.json API to Protobuf for our AdminV2 migration project, performed 99% by Claude.

Only the Protobuf spec is committed here, handler implementation will follow after the spec is accepted.

Human thoughts upfront

Claude was pretty good at this, I thoroughly reviewed the changes and didn't find anything lost in translation between the Swagger spec and the Protobuf spec.

Claude commit message

Port all cloud storage admin endpoints from shadow_indexing.json to protobuf-based admin API v2. Includes 12 RPCs:

  • Cluster and topic recovery operations
  • Partition status and manifest retrieval
  • Lifecycle marker management
  • Cache trimming and anomaly detection
  • Metadata reset operations

All messages include comprehensive field documentation for OpenAPI spec generation.

Claude session output, in Markdown

https://gist.github.com/WillemKauf/cf785b135c2aa581a4ebae5ad3b8ecce

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings January 5, 2026 15:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports cloud storage admin endpoints from the existing Swagger-based API (shadow_indexing.json) to the new protobuf-based admin API v2. The implementation adds 12 RPCs covering cluster recovery, partition status retrieval, lifecycle marker management, cache operations, and metadata operations.

Key Changes:

  • Added comprehensive protobuf service definition for cloud storage status operations
  • All 12 RPCs include detailed field documentation for OpenAPI spec generation
  • Integrated with existing common protobuf types (e.g., TopicPartition)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
proto/redpanda/core/admin/v2/cloud_storage_status.proto New protobuf service definition with 12 RPCs and 30+ message types for cloud storage operations
proto/redpanda/core/admin/v2/BUILD Bazel build configuration for the new proto library and its dependencies

Port all cloud storage admin endpoints from shadow_indexing.json
to protobuf-based admin API v2. Includes 12 RPCs:
- Cluster and topic recovery operations
- Partition status and manifest retrieval
- Lifecycle marker management
- Cache trimming and anomaly detection
- Metadata reset operations

All messages include comprehensive field documentation for
OpenAPI spec generation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@WillemKauf WillemKauf force-pushed the cloud_storage_api_proto branch from bec1e41 to 79eb957 Compare January 5, 2026 15:49
@WillemKauf WillemKauf changed the title proto/admin: Add CloudStorageStatusService [CORE-15110] proto/admin: Add CloudStorageStatusService Jan 5, 2026
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intentional that the service implemetnation isn't included in the PR?

@WillemKauf
Copy link
Contributor Author

is it intentional that the service implemetnation isn't included in the PR?

Yes, we plan to propose the protos first, then implement the handlers for migration.

https://redpandadata.atlassian.net/browse/CORE-15102

dotnwat
dotnwat previously approved these changes Jan 5, 2026
@@ -0,0 +1,493 @@
// Copyright 2025 Redpanda Data, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chore: year

Comment on lines +211 to +212
// Retention milliseconds for recovered topics.
optional int64 retention_ms = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/question: Why not a google.protobuf.Duration for time based values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah n/m it replaces a topic property

Comment on lines +242 to +250
// TopicRecoveryStatus contains the status of a topic recovery operation.
message TopicRecoveryStatus {
// Current state of topic recovery.
string state = 1;
// Download counts per topic.
repeated TopicDownloadCounts topic_download_counts = 2;
// Original recovery request parameters.
optional RecoveryRequestParams request = 3;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Why is this its own message when it's only used in QueryAutomatedRecoveryResponse?

Comment on lines +248 to +249
// Original recovery request parameters.
optional RecoveryRequestParams request = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this optional in the response? Should it always be provided? When is it or is not provided?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall:

Can we do a little reorganization of the file, a la AIP-191

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't do a 1:1 conversion of v1 endpoints to v2 and instead treat topics/partitions as resources to be operated on.

Have some sort of Partition or Topic service that provides different services to control the ntp?

@rockwotj not sure if you have thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't do a 1:1 conversion of v1 endpoints to v2 and instead treat topics/partitions as resources to be operated on.

Appreciate the thorough review, I tasked Claude with a direct port of the existing endpoint, but I'm happy to make whatever improvements we would benefit from and include them in this migration. And obviously carry forward those improvements/learnings to future migrations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (well I) do not want a direct port. Let me port a couple as an example when I am back and we can go from there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (well I) do not want a direct port.

thanks, this is helpful context for these ports that I didn't realize we were also cleaning them up but +1.

message PartitionCloudStorageStatus {
// The partition's cloud storage mode (one of: disabled, write_only,
// read_only, full and read_replica).
string cloud_storage_mode = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum?

Comment on lines +272 to +278
// Delta in milliseconds since the last upload of the partition's manifest.
optional int64 ms_since_last_manifest_upload = 2;
// Delta in milliseconds since the last segment upload for the partition.
optional int64 ms_since_last_segment_upload = 3;
// Delta in milliseconds since the last manifest sync (only present for read
// replicas).
optional int64 ms_since_last_manifest_sync = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration?

Comment on lines +342 to +343
// Lifecycle status.
string status = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum?

Comment on lines +106 to +115
// DeleteCloudStorageLifecycle
//
// Forcibly drop a lifecycle marker for a topic, this may leave data behind
// in the tiered storage bucket.
rpc DeleteCloudStorageLifecycle(DeleteCloudStorageLifecycleRequest)
returns (DeleteCloudStorageLifecycleResponse) {
option (pbgen.rpc) = {
authz: SUPERUSER
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be treated internal and not documented?

Comment on lines +417 to +418
// Type of anomaly.
string type = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enum?

@dotnwat dotnwat dismissed their stale review January 9, 2026 16:19

cleaning up service api too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants