Commit 43c128d

dcos: recovery (#81)
1 parent c0d3cf6 commit 43c128d

12 files changed: +321 -1349 lines changed

docs/change-units/anatomy-and-structure.md

Lines changed: 24 additions & 0 deletions
@@ -85,6 +85,30 @@ Controls whether the change runs within a transaction (default: `true`).
 
 **Important:** For non-transactional target systems (S3, Kafka, etc.), this flag has no effect.
 
+### `recovery` - Failure handling strategy
+Controls how Flamingock handles execution failures (default: `MANUAL_INTERVENTION`).
+
+```java
+// Default behavior (manual intervention)
+@ChangeUnit(id = "critical-change", order = "0001", author = "team")
+public class CriticalChange {
+    // Execution stops on failure, requires manual resolution
+}
+
+// Automatic retry
+@Recovery(strategy = RecoveryStrategy.ALWAYS_RETRY)
+@ChangeUnit(id = "idempotent-change", order = "0002", author = "team")
+public class IdempotentChange {
+    // Automatically retries on failure until successful
+}
+```
+
+**Recovery strategies:**
+- `MANUAL_INTERVENTION` (default): Stops execution on failure, requires CLI resolution
+- `ALWAYS_RETRY`: Automatically retries on subsequent executions until successful
+
+For detailed information on recovery strategies, see [Safety and Recovery](../safety-and-recovery/introduction.md).
+
 ## Required annotations
 
 ### `@TargetSystem` - System specification
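
As a reading aid for the `recovery` hunk above, the two strategies can be modelled as a small decision function: given that a change failed on its previous attempt, what does the next run do with it? This is a minimal sketch under stated assumptions, not Flamingock's implementation. The class `RecoverySketch`, the local `RecoveryStrategy` stand-in enum, the `NextAction` enum, and the `afterFailure` method are all hypothetical names introduced for illustration; only the two strategy names and their described behavior come from the diff.

```java
// Hypothetical model of the two documented recovery strategies.
// This is NOT Flamingock code; every type here is a local stand-in.
public class RecoverySketch {

    // Local stand-in mirroring the strategy names shown in the diff.
    enum RecoveryStrategy { MANUAL_INTERVENTION, ALWAYS_RETRY }

    // Hypothetical outcome type: what a later run does with a failed change.
    enum NextAction { RETRY, WAIT_FOR_MANUAL_RESOLUTION }

    // Decides how a later execution treats a change whose last attempt failed.
    static NextAction afterFailure(RecoveryStrategy strategy) {
        switch (strategy) {
            case ALWAYS_RETRY:
                // Documented as: retried on subsequent executions until successful.
                return NextAction.RETRY;
            case MANUAL_INTERVENTION:
            default:
                // Documented as: execution stops and requires CLI resolution.
                return NextAction.WAIT_FOR_MANUAL_RESOLUTION;
        }
    }

    public static void main(String[] args) {
        for (RecoveryStrategy strategy : RecoveryStrategy.values()) {
            System.out.println(strategy + " -> " + afterFailure(strategy));
        }
    }
}
```

The sketch treats `MANUAL_INTERVENTION` as the fall-through case, matching the documented default: a change is only retried automatically when it opts in, which fits the diff's pairing of `ALWAYS_RETRY` with a change named `idempotent-change`.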

docs/change-units/introduction.md

Lines changed: 3 additions & 1 deletion
@@ -62,7 +62,9 @@ rollback: "ALTER TABLE orders DROP COLUMN status;"
 
 ## Safety and recovery
 
-Flamingock prioritizes safety over automation. If execution fails or results are uncertain, Flamingock stops and requires manual intervention rather than risking data corruption. This ensures you always know the exact state of your systems.
+While ChangeUnit executions typically complete successfully, Flamingock provides configurable recovery strategies to handle any exceptional circumstances that may arise. If results are uncertain, Flamingock stops and requires manual intervention rather than risking data corruption, ensuring you always know the exact state of your systems.
+
+You can configure different recovery strategies based on your requirements. For complete details on failure handling and recovery workflows, see [Safety and Recovery](../safety-and-recovery/introduction.md).
 
 ## Next steps
 

docs/cli/cli.md

Lines changed: 3 additions & 32 deletions
@@ -5,14 +5,14 @@ sidebar_position: 999
 
 # Flamingock CLI
 
-Management tool for audit control and issue resolution in distributed system evolution.
+Command-line tool for audit management and maintenance operations.
 
 > **Beta Release**
 > This is the beta version of Flamingock CLI, providing essential management operations for audit control and issue resolution. A more comprehensive CLI with full migration execution capabilities is in development.
 
 ## Overview
 
-The Flamingock CLI is a lightweight management tool that helps you maintain consistency and resolve issues in your distributed system changes. When migrations fail or get interrupted, the CLI provides the operational control needed to investigate, understand, and resolve these issues.
+The Flamingock CLI provides operational commands for audit management and maintenance. Use these commands to view audit history, identify issues, and perform resolution operations.
 
 ## Installation
 

@@ -134,36 +134,7 @@ If the change was not applied or rolled back:
 flamingock audit fix -c user-migration-v2 -r ROLLED_BACK
 ```
 
-## Understanding Issues
-
-An issue occurs when a change didn't complete properly, such as:
-- The change was interrupted during execution (network failure, server restart, etc.)
-- The change failed but the failure wasn't properly recorded
-- The system crashed while applying the change
-
-### Resolution Process
-
-1. **Identify the issue**
-   ```bash
-   flamingock issue list
-   ```
-
-2. **Get detailed information**
-   ```bash
-   flamingock issue get -c <change-id> --guidance
-   ```
-
-3. **Verify actual state** in your target system (database, service, etc.)
-
-4. **Fix the audit state** based on your findings:
-   - If the change was successfully applied despite the audit failure:
-     ```bash
-     flamingock audit fix -c <change-id> -r APPLIED
-     ```
-   - If the change was not applied or you manually rolled it back:
-     ```bash
-     flamingock audit fix -c <change-id> -r ROLLED_BACK
-     ```
+For detailed workflows on issue resolution, see [Issue resolution](../safety-and-recovery/issue-resolution.md).
 
 ## Command Reference
 
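
The last step of the removed resolution workflow maps what you found in the target system to one of two `audit fix` invocations. The sketch below shows that mapping as a small helper that assembles the command's arguments. It is an illustration only: `AuditFixCommandSketch`, `VerifiedState`, and `auditFixCommand` are hypothetical names, and the helper does nothing beyond arranging the `-c` and `-r` flags exactly as they appear in the diff. It does not invoke the CLI.

```java
import java.util.List;

// Hypothetical helper; it only assembles arguments and never runs the CLI.
public class AuditFixCommandSketch {

    // The two resolution values that appear in the documented commands.
    enum VerifiedState { APPLIED, ROLLED_BACK }

    // Builds the argument list for `flamingock audit fix -c <change-id> -r <state>`.
    static List<String> auditFixCommand(String changeId, VerifiedState state) {
        if (changeId == null || changeId.isBlank()) {
            throw new IllegalArgumentException("changeId must not be blank");
        }
        return List.of("flamingock", "audit", "fix", "-c", changeId, "-r", state.name());
    }

    public static void main(String[] args) {
        // Example: the change was verified as not applied in the target system.
        List<String> command = auditFixCommand("user-migration-v2", VerifiedState.ROLLED_BACK);
        System.out.println(String.join(" ", command));
        // prints "flamingock audit fix -c user-migration-v2 -r ROLLED_BACK"
    }
}
```

Keeping the state as an enum rather than a free-form string means only the two values the diff documents can be produced, so a misspelled resolution value fails at compile time rather than at the command line.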

docs/recovery-and-safety/_category_.json

Lines changed: 0 additions & 8 deletions
This file was deleted.
