-
Notifications
You must be signed in to change notification settings - Fork 0
Jrepp - Drain On Shutdown #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
User request: "prism-proxy needs to add drain functionality - when told to stop by the prism-admin it needs to go into a state where it is draining connections. first it should drain and then deny new front end connections, then it should wait for all backend work to be complete that was attached to front end work - the stop signal to the runners should also follow a similar style of shutdown, they should be told that the system is stopping, they will continue to process current requests but will not accept new requests, when all pending work is completed they should exit, when the proxy front end connections have exited and all pattern processes have exited the proxy should cleanly exit" Implemented comprehensive drain-on-shutdown behavior with coordinated proxy and pattern lifecycle: **Protobuf Changes**: - Extended lifecycle.proto with DrainRequest/DrainResponse messages - Added Drain RPC to LifecycleInterface service - Updated proxy_control_plane.proto to include DrainRequest in ProxyCommand **Proxy Implementation** (Rust): - Added DrainState enum (Running, Draining, Stopping) to ProxyServer - Implemented drain_and_shutdown() with 5-phase sequence: 1. Enter drain mode (reject new connections) 2. Signal pattern runners to drain 3. Wait for frontend connections to complete (with timeout) 4. Stop pattern runners 5. Shutdown gRPC server - Added active_connections atomic counter for connection tracking - Updated main.rs to use drain_and_shutdown with 30s default timeout **Pattern Manager** (Rust): - Added drain_all_patterns() and stop_all_patterns() methods - Implemented drain_pattern() to signal individual patterns - Added PatternClient::drain() for gRPC drain calls **Go Plugin SDK**: - Extended Plugin interface with Drain() method - Added DrainMetrics struct (drained_operations, aborted_operations) - Implemented Drain RPC in LifecycleService - Updated memstore and redis drivers with Drain() implementations **Pattern Runner**: - Added Drain() to KeyValuePluginAdapter - Delegates drain to underlying backend drivers - Proper error handling and logging **Router Changes**: - Made pattern_manager field public for drain coordination **Key Features**: - Zero data loss: All in-flight operations complete before shutdown - Configurable timeouts: Default 30s, environment variable override - Clear state transitions with logged indicators - Kubernetes-ready for graceful rolling updates All binaries compile successfully. Ready for local binary testing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Brings in GitHub merge queue support (#1) from main branch. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…Drain() User request: "address the priority actions" Fixed three categories of CI failures: 1. **Rust Formatting** (cargo fmt): - Fixed drain_all_patterns() function signature formatting - Fixed drain_pattern() method signature formatting - Fixed stop_all_patterns().await chain formatting - All formatting now passes cargo fmt --check 2. **Documentation Validation** (ADR-058 frontmatter): - Fixed id: 058 → adr-058 (lowercase format required) - Fixed status: accepted → Accepted (capitalization) - Added missing required fields: deciders, project_id, doc_uuid - Changed tags from array to YAML list format - All 126 documents now validate successfully 3. **NATS Driver Drain Implementation**: - Added Drain() method to NATSPattern - Delegates to built-in NATS connection drain during Stop() - Returns DrainMetrics with zero operations (handled by library) - NATS driver tests now pass Unit test results: - ✅ pkg/drivers/nats: All tests pass - ✅ patterns/consumer: All tests pass - ✅ patterns/producer: All tests pass Documentation validation: - ✅ 126 documents scanned - ✅ 381 links valid - ✅ 0 errors Next: Investigate acceptance test failures in CI 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…ule dependency - Add Drain() method to S3Driver to implement plugin.Plugin interface - Add replace directive for redis driver in producer pattern go.mod to use local version instead of cached remote version - Both changes fix compilation errors in acceptance tests and pattern tests Fixes the following CI test failures: - Test Producer Pattern - Test KeyValue Acceptance - Test Consumer Acceptance - Test Producer Acceptance - Test ClaimCheck Acceptance - Test Unified Acceptance User request: "fix issues with jrepp/drain-on-shutdown so the pr passes" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Resolved conflicts in generated documentation files (docs/404.html, docs/index.html, docs/sitemap.xml) by accepting deletion from main branch. Brings in latest changes from main including: - Mailbox pattern implementation (RFC-037) - prism-admin SQLite storage (ADR-054) - Control plane improvements - New SQLite driver User request: "resolve conflicts to merge this PR" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
- Add Drain() method to MemPubSub driver (pkg/drivers/memstore/pubsub.go) - Add Drain() method to SQLite driver (pkg/drivers/sqlite/sqlite.go) - Add replace directive for memstore in mailbox pattern go.mod These drivers were added in main and needed the Drain() method to implement the updated plugin.Plugin interface. All follow the same pattern as other synchronous drivers (return empty DrainMetrics since operations complete immediately). User request: "resolve conflicts to merge this PR" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Pattern Adapters: - ConsumerPluginAdapter: Allow in-flight message processing to complete - MulticastRegistryPluginAdapter: Allow atomic operations to complete Drivers: - PostgresPlugin: Connection pool drains automatically before shutdown - KafkaPlugin: Flush pending producer messages with timeout, track aborted ops All implementations now complete the plugin.Plugin interface with proper drain semantics to support graceful shutdown. Fixes CI test failures: - TestConsumerProcessBased/ConsumerProcess-NATS-MemStore - TestConsumerProcessBased/ConsumerProcess-NATS-MemStore-DLQ - Any tests using Postgres or Kafka drivers User request: "resolve conflicts to merge this PR" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Comprehensive guide for local Kubernetes setup with k3d as recommended modern installer. Key content: - Comparison of k3d vs kind/minikube/Docker Desktop/k0s - Complete prerequisites and installation - Quick start multi-node cluster creation - Full Prism architecture deployment (7 phases) - Backing services (Redis, NATS, Postgres, MinIO, Kafka) - Prism components (admin, proxy, pattern runners) - Ingress configuration with Nginx - Observability stack (Prometheus/Grafana) - Resource limits and HPA - Troubleshooting guide - CI/CD integration examples User request: "write a memo of how to make a local docker kubernetes installation with the service components, what is the best modern 'installer' for k8s?" Co-Authored-By: Claude <[email protected]>
Comprehensive ADR proposing Kubernetes Operator with CRDs for declarative, flexible Prism cluster management. Core features: - PrismCluster CRD (v1alpha1) defines entire stack in single YAML - Controller reconciliation with 7-phase loop (backends → admin → proxy → patterns → ingress → status) - Runtime flexibility: add/remove patterns, scale components, enable backends via kubectl patch - Self-healing with automatic recreation of failed components - Automatic dependency management (Redis before keyvalue-runner, NATS before consumer-runner) - Service discovery with automatic ConfigMap generation - Rich status reporting (phase, component health, conditions) Technology: - Kubebuilder framework (vs Operator SDK) - 8-week implementation plan - Complete Go controller code examples - Comparison with Helm, Manual YAML (MEMO-035), Kustomize MEMO-035 updated: - Added "Alternative: Kubernetes Operator Deployment" section - Operator vs Manual YAML comparison table - Quick operator example with runtime flexibility demos - When to use operator vs manual YAML guidance - Migration path from manual YAML to operator Documentation validated with all 130 docs passing. User request: "i want this installation to be flexible at install time and runtime, would it be possible to use a kind of CRD and a controller to orchestrate the components?" Co-Authored-By: Claude <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds documentation for Go tooling and CLI utilities through several Architecture Decision Records (ADRs). The PR establishes Go as the language of choice for Prism's developer tooling, CLI utilities, and data migration tools, complementing the existing Rust proxy and Python orchestration components.
Key Changes:
- Established Go as the standard language for CLI tools and utilities (ADR-012)
- Defined comprehensive error handling, concurrency, testing, configuration management, and logging patterns for Go code (ADRs 013-017)
Reviewed Changes
Copilot reviewed 17 out of 1125 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| docs/adr/adr-012/index.html | Documents the decision to use Go for tooling and CLI utilities |
| docs/adr/adr-013/index.html | Defines Go error handling strategy using wrapped errors |
| docs/adr/adr-014/index.html | Establishes Go concurrency patterns using fork-join model |
| docs/adr/adr-015/index.html | Outlines Go testing strategy with three-tier approach |
| docs/adr/adr-016/index.html | Specifies CLI and configuration management using Cobra and Viper |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This PR is very large (XL). Consider breaking it into smaller, more reviewable PRs. |
|
This PR has merge conflicts with the base branch. Please resolve them. |
User request: "look at all local branches for unmerged commits, create PRs if they are found by first merging origin/main and submitting the commit data"
This branch contains 9 unmerged commit(s). Conflicts resolved automatically with aggressive strategy.
Co-Authored-By: Claude [email protected]