Trap snapshotting failure and allow AEM to recover #33

Open
mbloch1986 opened this issue Dec 11, 2018 · 1 comment

@mbloch1986
Contributor

If the offline snapshot fails and an AEM instance has already been stopped, the AEM service is not started again automatically.

https://github.com/shinesolutions/aem-stack-manager-cloud/blob/master/lambda/aem_offline_snapshot.py#L880

@cliffano changed the title from "Offline-Snapshot: Start AEM service if failed" to "Trap snapshotting failure and allow AEM to recover" on Jul 24, 2019
@cliffano
Contributor

When offline snapshotting fails because of a transient issue such as a connectivity error or a timeout, the whole process fails and leaves the environment in a state where AEM is stopped, which disrupts the entire AEM environment.

To be less disruptive, we should trap the snapshotting failure but still allow AEM to start again. This may result in a missing snapshot, e.g. an offline snapshot might contain the Author snapshot but not the paired Publish snapshot. However, that is a better outcome than the current behaviour, where AEM is left in a stopped state, causing one of the following:

  • a stopped Author Primary, which causes Author-Dispatcher to keep recycling
  • a stopped Author Standby, which risks losing the ability to promote the standby to primary
  • a stopped Publish, which causes the Publish and Publish-Dispatcher pair to be recycled
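The trap-and-recover behaviour proposed above could look roughly like the following. This is a minimal sketch, not the code in `aem_offline_snapshot.py`: it assumes the stop, snapshot and start steps can be wrapped as plain callables, and `snapshot_with_recovery`, `stop_aem`, `take_snapshot` and `start_aem` are hypothetical names. The `finally` clause guarantees AEM is started again whether the snapshot succeeds or fails, while the trapped error is returned rather than swallowed.

```python
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger(__name__)


def snapshot_with_recovery(
    stop_aem: Callable[[], None],
    take_snapshot: Callable[[], str],
    start_aem: Callable[[], None],
) -> Tuple[Optional[str], Optional[Exception]]:
    """Stop AEM, take an offline snapshot, then always start AEM again.

    Hypothetical helper. Returns (snapshot_id, None) on success, or
    (None, error) when the snapshot step failed, so the caller can
    still report the error.
    """
    stop_aem()
    try:
        return take_snapshot(), None
    except Exception as err:  # e.g. connectivity errors or timeouts
        # Trap the failure instead of aborting with AEM left stopped.
        logger.exception("Offline snapshot failed; AEM will be started anyway")
        return None, err
    finally:
        # Runs after both the success path and the trapped-failure path.
        start_aem()
```

An error raised by `start_aem` itself is deliberately not trapped here, since failing to restart AEM is exactly the state the issue wants to surface loudly.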

One key requirement when trapping this failure is that the error must still be visible and the event must still be sent to the SNS topic.
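To keep a trapped failure visible, the error can be logged and published to SNS after AEM has been restarted. The sketch below is hypothetical: `report_snapshot_failure`, the `OfflineSnapshotFailed` event name and the message fields are illustrative assumptions, not the project's actual notification format. `publish` is assumed to accept the same keyword arguments as boto3's SNS client `publish` method (`TopicArn`, `Subject`, `Message`), which keeps the function testable without AWS credentials.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def report_snapshot_failure(
    publish: Callable[..., Any],
    topic_arn: str,
    component: str,
    error: Optional[Exception],
) -> Optional[Dict[str, str]]:
    """Log and publish a trapped snapshot failure; do nothing on success.

    Hypothetical helper. `publish` is assumed to take boto3 SNS-style
    keyword arguments: TopicArn, Subject and Message.
    """
    if error is None:
        return None
    event = {
        "event": "OfflineSnapshotFailed",  # hypothetical event name
        "component": component,
        "error_type": type(error).__name__,
        "error": str(error),
    }
    # Keep the error visible in the logs even though it was trapped.
    logger.error("Offline snapshot failed for %s: %s", component, error)
    publish(
        TopicArn=topic_arn,
        # SNS subjects must be shorter than 100 characters.
        Subject=f"Offline snapshot failed: {component}"[:99],
        Message=json.dumps(event),
    )
    return event
```

Inside a Lambda, `publish` could be `boto3.client("sns").publish`, with the topic ARN taken from the stack's existing configuration.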
