
feat: add DevOps diagnostics API, CLI commands, and troubleshooting s… #572

Merged
ninan-nn merged 5 commits into alibaba:main from hellomypastor:feat/devops-diagnostics-api
Mar 27, 2026


@hellomypastor
Contributor

Summary

  • Why: Community users lack self-service debugging capabilities when sandbox issues occur (creation failures, crashes, network problems). Currently only operators can troubleshoot by manually running docker logs, kubectl describe, etc. This PR enables users (especially AI agents) to diagnose problems independently through a Server DevOps API → CLI → Skill pipeline.

  • What:

    • Server: Add DevOps diagnostics API (GET /sandboxes/{id}/diagnostics/{logs,inspect,events,summary}), all endpoints return text/plain. Implemented for both Docker and Kubernetes backends.
    • CLI: Add opensandbox devops command group (logs, inspect, events, summary) that calls the DevOps API and prints plain text output.
    • Skill: Add .claude/skills/opensandbox-troubleshoot.md with a structured troubleshooting workflow and a quick-reference table covering 10 common failure scenarios (OOM, ImagePullBackOff, CrashLoopBackOff, etc.).
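The Server → CLI pipeline above can be sketched as a small client: one function builds the `GET /sandboxes/{id}/diagnostics/{kind}` request, another fetches the `text/plain` body, and an argparse front end mirrors the `opensandbox devops` subcommands. This is a sketch under stated assumptions, not the PR's CLI: the base URL, the `devops-sketch` program name, and the `OPEN_SANDBOX_API_KEY` environment variable are hypothetical; only the route shape, the four view names, and the `OPEN-SANDBOX-API-KEY` header come from this PR.

```python
import argparse
import os
import urllib.parse
import urllib.request

# Hypothetical default; point this at your OpenSandbox server (a versioned
# path prefix may also be required; not confirmed by the PR).
DEFAULT_BASE_URL = "http://localhost:8080"

# The four diagnostics views named in the PR; each returns text/plain.
DIAGNOSTIC_KINDS = ("logs", "inspect", "events", "summary")


def build_diagnostics_request(sandbox_id, kind, api_key, base_url=DEFAULT_BASE_URL):
    """Build GET {base_url}/sandboxes/{sandbox_id}/diagnostics/{kind}."""
    if kind not in DIAGNOSTIC_KINDS:
        raise ValueError(f"unknown diagnostics kind: {kind!r}")
    sid = urllib.parse.quote(sandbox_id, safe="")
    url = f"{base_url.rstrip('/')}/sandboxes/{sid}/diagnostics/{kind}"
    # The commit message names OPEN-SANDBOX-API-KEY as the auth header.
    headers = {"OPEN-SANDBOX-API-KEY": api_key, "Accept": "text/plain"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_diagnostics(sandbox_id, kind, api_key, base_url=DEFAULT_BASE_URL, timeout=30.0):
    """Perform the request and decode the plain-text body."""
    req = build_diagnostics_request(sandbox_id, kind, api_key, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def build_parser():
    """Hypothetical CLI mirroring `opensandbox devops <kind> <sandbox_id>`."""
    parser = argparse.ArgumentParser(prog="devops-sketch")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    # Hypothetical environment variable name for the API key.
    parser.add_argument("--api-key", default=os.environ.get("OPEN_SANDBOX_API_KEY", ""))
    sub = parser.add_subparsers(dest="kind", required=True)
    for kind in DIAGNOSTIC_KINDS:
        sub.add_parser(kind, help=f"print {kind} for a sandbox").add_argument("sandbox_id")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(fetch_diagnostics(args.sandbox_id, args.kind, args.api_key, args.base_url))
```

For example, `main(["logs", "sb-123"])` would print that sandbox's logs, assuming a reachable server. Because every endpoint returns plain text, the client only decodes and prints; no JSON schema has to be kept in sync between server and CLI.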

Testing

  • e2e / manual verification
  • Unit tests (follow-up: add unit tests for diagnostics service methods)
  • Integration tests (follow-up: add integration tests with Docker backend)

Breaking Changes

  • None

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed) — troubleshooting skill serves as user-facing documentation
  • Added/updated tests (if needed) — follow-up PR
  • Security impact considered — sensitive env vars (SECRET/TOKEN/PASSWORD/KEY) are masked in inspect output
  • Backward compatibility considered — purely additive, no existing API or CLI behavior changed
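The masking rule in the security item can be illustrated on Docker-style `NAME=value` environment entries. This is a minimal sketch, assuming a case-insensitive substring match on the variable name and a fixed mask string; the PR's actual matching rule and mask text may differ, and `mask_env` is a hypothetical helper name.

```python
# Name fragments the PR lists as sensitive.
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "KEY")
MASK = "******"  # hypothetical mask string


def is_sensitive(name):
    """Case-insensitive substring match against the sensitive markers."""
    upper = name.upper()
    return any(marker in upper for marker in SENSITIVE_MARKERS)


def mask_env(entries):
    """Mask the values of sensitive variables in 'NAME=value' entries."""
    masked = []
    for entry in entries:
        name, sep, _value = entry.partition("=")
        if sep and is_sensitive(name):
            masked.append(f"{name}={MASK}")
        else:
            # Non-sensitive names, and malformed entries without '=', pass through.
            masked.append(entry)
    return masked
```

Substring matching over-masks some harmless names (anything containing `KEY`), which is usually the safer failure mode for output that end users and AI agents will read.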


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6831e36b4f


Collaborator

@ninan-nn ninan-nn left a comment


Wait for #569 to be merged first.

@Pangjiping
Collaborator

Great job. You could consider separating the DevOps implementations for each backend into independent files, such as docker_devops.py and kubernetes_devops.py.
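One way to follow this suggestion is a shared interface that each backend file implements, with a small registry choosing the backend by runtime. The sketch below is hypothetical: `DiagnosticsBackend`, the `tail` parameter, the summary layout, and the in-memory `FakeDiagnostics` stand-in are illustrative names, not the PR's code.

```python
from abc import ABC, abstractmethod


class DiagnosticsBackend(ABC):
    """Hypothetical interface each runtime backend implements; all methods return plain text."""

    @abstractmethod
    def logs(self, sandbox_id: str, tail: int = 100) -> str: ...

    @abstractmethod
    def inspect(self, sandbox_id: str) -> str: ...

    @abstractmethod
    def events(self, sandbox_id: str) -> str: ...

    def summary(self, sandbox_id: str) -> str:
        # Shared default: concatenate the other views under section headers.
        sections = [
            ("inspect", self.inspect(sandbox_id)),
            ("events", self.events(sandbox_id)),
            ("logs", self.logs(sandbox_id, tail=50)),
        ]
        return "\n\n".join(f"=== {title} ===\n{body}" for title, body in sections)


# A Docker file would define a DockerDiagnostics(DiagnosticsBackend) subclass and
# a Kubernetes file a K8sDiagnostics(DiagnosticsBackend) subclass.
class FakeDiagnostics(DiagnosticsBackend):
    """In-memory stand-in used to exercise the interface without Docker or Kubernetes."""

    def logs(self, sandbox_id, tail=100):
        return f"log lines for {sandbox_id} (tail={tail})"

    def inspect(self, sandbox_id):
        return f"state: running\nid: {sandbox_id}"

    def events(self, sandbox_id):
        return "no events"


BACKENDS = {"fake": FakeDiagnostics}


def get_backend(runtime: str) -> DiagnosticsBackend:
    """Look up and instantiate the backend registered for a runtime name."""
    try:
        return BACKENDS[runtime]()
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime!r}") from None
```

Keeping `summary` on the base class means both backends produce the same section layout, while each file only owns the runtime-specific calls.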

@hellomypastor hellomypastor force-pushed the feat/devops-diagnostics-api branch from 95b6cac to ae6a45a March 25, 2026 14:50
hellomypastor and others added 2 commits March 26, 2026 18:18
…kill

- Server: add diagnostics endpoints (logs, inspect, events, summary)
  returning text/plain for both Docker and Kubernetes backends
- Extract diagnostics methods into docker_diagnostics.py and k8s_diagnostics.py
- CLI: add `opensandbox devops` command group with correct OPEN-SANDBOX-API-KEY header
- Skill: add troubleshoot-sandbox skill with structured workflow and
  common failure pattern reference (OOM, CrashLoopBackOff, ImagePullBackOff, etc.)
- Docs: add devops-diagnostics-example.md with OOM kill walkthrough
- Update CLI README with devops commands and screenshots

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
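The "common failure pattern reference" in the skill can be approximated as a lookup from status strings in the diagnostics text to a scenario and a first check. This is a hypothetical three-row subset written for illustration; the real skill is a Markdown quick-reference table, and its exact rows and advice may differ.

```python
import re

# Hypothetical subset of a quick-reference table: (pattern, scenario, first check).
PATTERNS = [
    # Docker inspect prints "OOMKilled: false" for healthy containers, so
    # reject matches immediately followed by "false".
    (re.compile(r"\bOOMKilled\b(?!\W*[Ff]alse)"), "OOM",
     "raise the memory limit or reduce memory use"),
    (re.compile(r"ImagePullBackOff|ErrImagePull"), "ImagePullBackOff",
     "check the image name, tag, and registry credentials"),
    (re.compile(r"CrashLoopBackOff"), "CrashLoopBackOff",
     "read the container logs for the startup error"),
]


def classify(summary_text):
    """Return (scenario, hint) pairs whose pattern appears in the diagnostics text."""
    return [(name, hint) for rx, name, hint in PATTERNS if rx.search(summary_text)]
```

An agent could run this over the output of the `summary` endpoint and fall back to reading the full text when nothing matches.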
- Reuse SandboxService instance from lifecycle.py instead of creating
  a duplicate (avoids double timer registration in Docker backend)
- Let HTTPException propagate in summary endpoint so 404 is returned
  correctly instead of being swallowed into a 200 with [error] text
- Remove try/except error swallowing in Docker and K8s diagnostics
  backends so failures surface as proper HTTP errors
- Fix Docker --since: convert relative duration to absolute Unix
  timestamp (Docker treats integer since as epoch, not relative seconds)
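The summary-endpoint fix can be illustrated by contrasting an aggregator that converts every error into `[error]` text with one that lets a not-found error escape. This is a framework-free sketch: `SandboxNotFound` is a hypothetical stand-in for the `HTTPException` (404) the server raises, and `sections` maps a section title to a hypothetical fetch callable.

```python
class SandboxNotFound(Exception):
    """Hypothetical stand-in for the HTTPException(404) raised by the server."""


def summary_swallowing(sandbox_id, sections):
    # Anti-pattern removed by this commit: every error becomes "[error] ..." text,
    # so a missing sandbox still produces a 200 response.
    parts = []
    for title, fetch in sections.items():
        try:
            body = fetch(sandbox_id)
        except Exception as exc:
            body = f"[error] {exc}"
        parts.append(f"=== {title} ===\n{body}")
    return "\n\n".join(parts)


def build_summary(sandbox_id, sections):
    # Fixed behavior: no try/except, so SandboxNotFound (like HTTPException in the
    # real server) propagates and the framework can answer with 404.
    parts = [f"=== {title} ===\n{fetch(sandbox_id)}" for title, fetch in sections.items()]
    return "\n\n".join(parts)
```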

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
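The `--since` fix amounts to converting a relative duration into an absolute Unix timestamp before handing it to Docker. A minimal sketch, assuming a hypothetical `<integer><s|m|h|d>` duration syntax; the CLI's accepted formats may differ.

```python
import re
import time

# Hypothetical duration syntax: an integer followed by s, m, h, or d (e.g. "10m").
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION = re.compile(r"^\s*(\d+)\s*([smhd])\s*$")


def since_to_epoch(since, now=None):
    """Convert a relative duration such as "10m" into an absolute Unix timestamp.

    Docker interprets an integer `since` as seconds since the epoch, so passing
    600 would mean 1970-01-01T00:10:00Z rather than "ten minutes ago".
    """
    match = _DURATION.match(since)
    if not match:
        raise ValueError(f"invalid duration: {since!r}")
    seconds = int(match.group(1)) * _UNITS[match.group(2)]
    current = time.time() if now is None else now
    return int(current) - seconds
```

If the Docker backend uses the Docker SDK for Python, the result can be passed as `container.logs(since=since_to_epoch("10m"))`, since that method accepts an integer epoch for `since`.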
@hellomypastor hellomypastor force-pushed the feat/devops-diagnostics-api branch from 4073d75 to 2e3e307 March 26, 2026 10:18
ninan-nn previously approved these changes Mar 26, 2026
Collaborator

@ninan-nn ninan-nn left a comment


LGTM

Move skill files from .claude/skills/ to skills/ at project root so they
serve as distributable assets for all AI coding tools, not just Claude Code.

Add `osb skills install/list/uninstall` CLI commands supporting 6 targets:
Claude Code, Cursor, Codex, GitHub Copilot, Windsurf, and Cline.
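The install/list/uninstall commands can be sketched as copying a file from `skills/` into a per-tool directory. Every destination path below is a hypothetical placeholder; the paths `osb skills install` actually writes to for each of the six targets may differ, and the function names are illustrative.

```python
import shutil
from pathlib import Path

# Hypothetical per-tool destinations, relative to the project root.
TARGETS = {
    "claude": Path(".claude/skills"),
    "cursor": Path(".cursor/rules"),
    "codex": Path(".codex/skills"),
    "copilot": Path(".github/instructions"),
    "windsurf": Path(".windsurf/rules"),
    "cline": Path(".clinerules"),
}


def _target_dir(target, project_root):
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}; choose from {sorted(TARGETS)}")
    return Path(project_root) / TARGETS[target]


def install_skill(skill_file, target, project_root):
    """Copy one skill file into the target tool's directory and return its new path."""
    dest_dir = _target_dir(target, project_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(skill_file).name
    shutil.copyfile(skill_file, dest)
    return dest


def list_skills(target, project_root):
    """List installed Markdown skill files for a target (empty if none)."""
    d = _target_dir(target, project_root)
    return sorted(p.name for p in d.glob("*.md")) if d.is_dir() else []


def uninstall_skill(name, target, project_root):
    """Remove an installed skill file; return True if something was deleted."""
    p = _target_dir(target, project_root) / name
    if p.exists():
        p.unlink()
        return True
    return False
```

Keeping the canonical copy under `skills/` and treating each tool directory as a disposable install target is what lets one skill serve all six tools.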

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ninan-nn
Collaborator

ninan-nn commented Mar 27, 2026

Please make sure the CLI quality check passes :>

Pangjiping previously approved these changes Mar 27, 2026
Collaborator

@Pangjiping Pangjiping left a comment


LGTM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ninan-nn ninan-nn merged commit 3ff3c81 into alibaba:main Mar 27, 2026
24 of 26 checks passed

3 participants