Summary
Valkey restore post-ready / Sentinel registration could fail because post-restore-sentinel.sh could not reliably recover target context or discover the current primary in restore jobs.
This issue tracks the addon-side stop points that were reproduced, narrowed with evidence, and then fixed in post-restore-sentinel.sh.
Reproduced stop points
1. Single-shot primary discovery was too rigid
post-ready step 1 could fail with:
could not find primary among data pods — Sentinel registration failed
2. Restore job context variables were absent
In post-ready jobs:
- DP_TARGET_POD_NAME was empty
- DP_TARGET_NAMESPACE was empty
3. DP_DB_HOST could be only pod.headless-service
The fallback path executed, but DP_DB_HOST did not always include namespace / cluster domain, so target context still could not be reconstructed.
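Stop points 2 and 3 both come down to how target context is derived. The following is a minimal sketch of one possible resolution order, not the code in post-restore-sentinel.sh. It assumes DP_DB_HOST follows the usual Kubernetes pod DNS layout (pod.headless-service, optionally followed by namespace and cluster domain) and that the job runs in the same namespace as the target. The names resolve_target_context, TARGET_POD, TARGET_NS and SA_NS_FILE are hypothetical.

```shell
#!/bin/sh
# Sketch only: function and variable names here are illustrative assumptions.

# Standard location of the mounted serviceaccount namespace in a pod.
# Overridable so the logic can be exercised outside a cluster.
SA_NS_FILE="${SA_NS_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/namespace}"

resolve_target_context() {
  TARGET_POD="${DP_TARGET_POD_NAME:-}"
  TARGET_NS="${DP_TARGET_NAMESPACE:-}"
  host="${DP_DB_HOST:-}"

  # Fallback 1: take the pod name from the first DNS label of DP_DB_HOST.
  if [ -z "$TARGET_POD" ] && [ -n "$host" ]; then
    TARGET_POD="${host%%.*}"
  fi

  # Fallback 1 (continued): the namespace is the third DNS label, which only
  # exists when the host has at least two dots (pod.service.namespace...).
  if [ -z "$TARGET_NS" ]; then
    case "$host" in
      *.*.*)
        rest="${host#*.}"        # drop the pod label
        rest="${rest#*.}"        # drop the headless-service label
        TARGET_NS="${rest%%.*}"  # keep the next label
        ;;
    esac
  fi

  # Fallback 2: host was only pod.headless-service, so use the namespace of
  # the serviceaccount the job itself runs under.
  if [ -z "$TARGET_NS" ] && [ -r "$SA_NS_FILE" ]; then
    TARGET_NS="$(cat "$SA_NS_FILE")"
  fi

  # Succeed only when both values were recovered.
  [ -n "$TARGET_POD" ] && [ -n "$TARGET_NS" ]
}
```

A caller would run resolve_target_context and stop with a clear error when it returns non-zero, rather than continuing with an empty pod name or namespace.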
Fix scope
The validated addon fix stays in:
addons/valkey-bk/dataprotection/post-restore-sentinel.sh
It includes three minimal changes:
- bounded retry for primary discovery
- fallback from DP_TARGET_* to DP_DB_HOST
- namespace fallback from the current serviceaccount namespace when DP_DB_HOST is only pod.headless-service
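The bounded retry for primary discovery could look roughly like the sketch below. It is an illustration under assumptions, not the code in post-restore-sentinel.sh: find_primary and is_primary are hypothetical names, the probe assumes valkey-cli is on the PATH and checks the role line of `info replication`, and authentication is left out for brevity.

```shell
#!/bin/sh
# Sketch only: helper names and the probe are illustrative assumptions.

# Succeeds when the given host reports itself as the primary.
is_primary() {
  valkey-cli -h "$1" -p "${VALKEY_PORT:-6379}" info replication 2>/dev/null \
    | tr -d '\r' \
    | grep -q '^role:master$'
}

# find_primary MAX_ATTEMPTS DELAY_SECONDS HOST...
# Prints the first host that answers as primary and returns 0.
# Returns 1 once MAX_ATTEMPTS full passes over the hosts have failed.
find_primary() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    for h in "$@"; do
      if is_primary "$h"; then
        printf '%s\n' "$h"
        return 0
      fi
    done
    echo "attempt $attempt/$max: no primary found yet" >&2
    # Sleep between passes, but not after the final one.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

The point of the bound is that a restore job waits for replication roles to settle but still fails with a clear error if no primary ever appears, instead of failing on the first probe or hanging indefinitely.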
Validation boundary
This issue covers the current Valkey restore post-ready / Sentinel registration stop point.
It does not claim that every restore scenario now passes end to end.
Validation evidence
On the targeted rerun after the third patch:
- postReady-0 = Completed
- postReady-1 = Completed
- restore reached phase=Completed
- type=PostReady -> True / Succeed
- logs contained:
  - resolved target context ...
  - current primary is ...
  - Post-restore Sentinel registration complete.