feat: add DevOps diagnostics API, CLI commands, and troubleshooting s…#572
Merged
ninan-nn merged 5 commits into alibaba:main on Mar 27, 2026
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6831e36b4f
ninan-nn
reviewed
Mar 25, 2026
Collaborator
Great job. Perhaps you could consider separating the different DevOps implementations into independent files at the implementation level, such as
95b6cac to ae6a45a
…kill

- Server: add diagnostics endpoints (logs, inspect, events, summary) returning text/plain for both Docker and Kubernetes backends
- Extract diagnostics methods into docker_diagnostics.py and k8s_diagnostics.py
- CLI: add `opensandbox devops` command group with correct OPEN-SANDBOX-API-KEY header
- Skill: add troubleshoot-sandbox skill with structured workflow and common failure pattern reference (OOM, CrashLoopBackOff, ImagePullBackOff, etc.)
- Docs: add devops-diagnostics-example.md with OOM kill walkthrough
- Update CLI README with devops commands and screenshots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
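The commit message above says the CLI calls the diagnostics endpoints with the OPEN-SANDBOX-API-KEY header and receives text/plain. A minimal sketch of what such a request might look like follows. The path shape is taken from the PR summary; the function name `build_diagnostics_request`, the base URL, and the `Accept` header are assumptions made for illustration and are not the PR's actual CLI code.

```python
from urllib.parse import quote
from urllib.request import Request

# The four diagnostics kinds named in the PR description.
DIAGNOSTIC_KINDS = ("logs", "inspect", "events", "summary")


def build_diagnostics_request(
    base_url: str, sandbox_id: str, kind: str, api_key: str
) -> Request:
    """Build (but do not send) a GET request for one diagnostics endpoint.

    Hypothetical helper: only the path shape and the header name come
    from the PR; everything else is an illustrative assumption.
    """
    if kind not in DIAGNOSTIC_KINDS:
        raise ValueError(f"unknown diagnostics kind: {kind!r}")
    # Percent-encode the id so it cannot inject extra path segments.
    path = f"/sandboxes/{quote(sandbox_id, safe='')}/diagnostics/{kind}"
    return Request(
        base_url.rstrip("/") + path,
        headers={
            # HTTP header names are case-insensitive; urllib stores this
            # key internally as "Open-sandbox-api-key".
            "OPEN-SANDBOX-API-KEY": api_key,
            "Accept": "text/plain",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Assumed local server address, for demonstration only.
    req = build_diagnostics_request(
        "http://localhost:8080", "sbx-123", "summary", "my-key"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body could then be decoded and printed unchanged, since the endpoints are described as returning plain text.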
- Reuse SandboxService instance from lifecycle.py instead of creating a duplicate (avoids double timer registration in Docker backend)
- Let HTTPException propagate in summary endpoint so 404 is returned correctly instead of being swallowed into a 200 with [error] text
- Remove try/except error swallowing in Docker and K8s diagnostics backends so failures surface as proper HTTP errors
- Fix Docker --since: convert relative duration to absolute Unix timestamp (Docker treats integer since as epoch, not relative seconds)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
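The `--since` fix described above converts a relative duration into an absolute Unix timestamp before it reaches Docker. One way that conversion could be sketched is shown here. The function name `since_to_epoch` and the accepted suffixes (`s`, `m`, `h`, `d`) are assumptions for illustration; the PR's actual parsing rules are not shown on this page.

```python
import re
import time
from typing import Optional

# Assumed duration suffixes; the real CLI may accept a different set.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def since_to_epoch(since: str, now: Optional[float] = None) -> int:
    """Convert a relative duration such as "10m" to an absolute epoch.

    Passing the bare integer 600 to Docker would mean "600 seconds after
    1970-01-01", so the duration is subtracted from the current time.
    """
    match = re.fullmatch(r"(\d+)([smhd])", since.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {since!r}")
    amount, unit = match.groups()
    # `now` is injectable so the function can be tested deterministically.
    current = time.time() if now is None else now
    return int(current) - int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    # With a fixed clock, "10m" maps to 600 seconds earlier.
    print(since_to_epoch("10m", now=1_700_000_000))  # → 1699999400
```

The resulting integer could then be handed to a Docker logs call as its `since` value, for example the `since` argument of docker-py's `Container.logs`, which accepts an integer epoch in seconds.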
4073d75 to 2e3e307
Move skill files from .claude/skills/ to skills/ at project root so they serve as distributable assets for all AI coding tools, not just Claude Code.

Add `osb skills install/list/uninstall` CLI commands supporting 6 targets: Claude Code, Cursor, Codex, GitHub Copilot, Windsurf, and Cline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator
Please make sure the CLI quality check passes:
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ninan-nn
approved these changes
Mar 27, 2026
Summary
Why: Community users lack self-service debugging capabilities when sandbox issues occur (creation failures, crashes, network problems). Currently, only operators can troubleshoot, by manually running `docker logs`, `kubectl describe`, etc. This PR enables users (especially AI agents) to diagnose problems independently through a Server DevOps API → CLI → Skill pipeline.

What:
- Server: diagnostics endpoints (`GET /sandboxes/{id}/diagnostics/{logs,inspect,events,summary}`); all endpoints return `text/plain`. Implemented for both Docker and Kubernetes backends.
- CLI: `opensandbox devops` command group (`logs`, `inspect`, `events`, `summary`) that calls the DevOps API and prints plain text output.
- Skill: `.claude/skills/opensandbox-troubleshoot.md` with a structured troubleshooting workflow and a quick-reference table covering 10 common failure scenarios (OOM, ImagePullBackOff, CrashLoopBackOff, etc.).

Testing
Breaking Changes
Checklist