When offline snapshotting fails due to a transient issue such as a connectivity error or a timeout, the whole process fails and leaves AEM stopped, which disrupts the whole AEM environment.
To be less disruptive, we should trap the snapshotting failure but still allow AEM to start again. This can result in a missing snapshot, e.g. an offline snapshot run might produce the Author snapshot but not its paired Publish snapshot. However, that is a better outcome than the current behaviour, where AEM is left in a stopped state, causing one of the following:
a stopped Author Primary which causes Author-Dispatcher to keep recycling
a stopped Author Standby which causes risk of losing the ability to promote a standby to primary
a stopped Publish which causes a pair of Publish and Publish-Dispatcher to be recycled
A key requirement when trapping this failure is that the error must still be visible and the event must still be sent to the SNS topic.
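The behaviour proposed above could look roughly like the sketch below. It is not the existing code in `aem_offline_snapshot.py`: the function name `snapshot_with_restart` and the four callables passed to it (`stop_aem`, `take_snapshot`, `start_aem`, `notify`) are hypothetical placeholders for whatever the Lambda actually uses to stop AEM, take the snapshot, start AEM and publish to the SNS topic.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def snapshot_with_restart(
    stop_aem: Callable[[], None],
    take_snapshot: Callable[[], str],
    start_aem: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Run an offline snapshot and always attempt to start AEM afterwards.

    All four callables are hypothetical hooks, not names from the real Lambda.
    Returns True if the snapshot succeeded, False if the failure was trapped.
    """
    succeeded = False
    try:
        stop_aem()
        snapshot_id = take_snapshot()
        notify(f"Offline snapshot succeeded: {snapshot_id}")
        succeeded = True
    except Exception as error:
        # Keep the failure visible: log the traceback and send the event on.
        logger.exception("Offline snapshot failed")
        notify(f"Offline snapshot failed: {error}")
    finally:
        # Runs on success and on failure, so AEM is not left stopped.
        start_aem()
    return succeeded
```

In the Lambda, `notify` would presumably wrap the SNS publish call, and the `False` return value lets the caller mark the overall run as failed even though AEM has been started again.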
If the Offline-Snapshot fails and an AEM instance is already stopped, the service does not get started again automatically.
https://github.com/shinesolutions/aem-stack-manager-cloud/blob/master/lambda/aem_offline_snapshot.py#L880