valkey restore post-ready: recover target context and sentinel registration in post-restore-sentinel.sh #2595

@weicao

Summary

The Valkey restore post-ready step (Sentinel registration) could fail because post-restore-sentinel.sh could not reliably recover the target context or discover the current primary inside restore jobs.

This issue tracks the addon-side stop points that were reproduced, narrowed with evidence, and then fixed in post-restore-sentinel.sh.

Reproduced stop points

1. Single-shot primary discovery was too rigid

post-ready step 1 could fail with:

  • could not find primary among data pods — Sentinel registration failed

2. Restore job context variables were absent

In post-ready jobs:

  • DP_TARGET_POD_NAME was empty
  • DP_TARGET_NAMESPACE was empty

3. DP_DB_HOST could contain only pod.headless-service

The fallback path executed, but DP_DB_HOST did not always include the namespace or cluster domain, so the target context still could not be reconstructed.
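To make the third stop point concrete, here is a minimal sketch of splitting a DP_DB_HOST-style value into pod and namespace. The function name parse_db_host and the sample host names are hypothetical, and the sketch assumes the usual pod.service.namespace.svc DNS layout; it is not the code in post-restore-sentinel.sh.

```shell
# Hypothetical helper, not taken from post-restore-sentinel.sh.
# Prints "<pod>,<namespace>"; the namespace field stays empty when the host
# does not carry a "<namespace>.svc" part.
parse_db_host() (
  host="$1"
  pod="${host%%.*}"          # first DNS label
  rest="${host#*.}"          # drop the pod label (unchanged if there is no dot)
  rest="${rest#*.}"          # drop the headless-service label
  candidate="${rest%%.*}"    # label that would be the namespace
  tail="${rest#*.}"          # whatever follows the candidate label
  namespace=""
  # Accept the candidate only when a "svc" label follows it directly.
  if [ "$rest" != "$tail" ]; then
    case "$tail" in
      svc|svc.*) namespace="$candidate" ;;
    esac
  fi
  printf '%s,%s\n' "$pod" "$namespace"
)
```

With a short value such as valkey-0.valkey-headless (an illustrative name) the namespace field comes back empty, which is the situation described above: the pod name is recoverable, the namespace is not.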

Fix scope

The validated addon fix stays in:

  • addons/valkey-bk/dataprotection/post-restore-sentinel.sh

It includes three minimal changes:

  1. bounded retry for primary discovery
  2. fallback from DP_TARGET_* to DP_DB_HOST
  3. namespace fallback from the current serviceaccount namespace when DP_DB_HOST is only pod.headless-service
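The three changes above could look roughly like the following sketch. The helper names retry_bounded and resolve_target, the SA_NAMESPACE_FILE override, and the error message are assumptions made for illustration; only the DP_* variable names come from this issue, and the default file path is the standard location where Kubernetes mounts the service account namespace.

```shell
# Hypothetical helpers; names and messages are illustrative only.

# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
retry_bounded() {
  _rb_attempts="$1"
  _rb_delay="$2"
  shift 2
  _rb_try=1
  while [ "$_rb_try" -le "$_rb_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$_rb_try" -lt "$_rb_attempts" ]; then
      sleep "$_rb_delay"
    fi
    _rb_try=$((_rb_try + 1))
  done
  return 1
}

# Print "<pod> <namespace>", preferring DP_TARGET_*, then DP_DB_HOST, then the
# service account namespace file. SA_NAMESPACE_FILE is a made-up test override.
resolve_target() (
  pod="${DP_TARGET_POD_NAME:-}"
  ns="${DP_TARGET_NAMESPACE:-}"
  host="${DP_DB_HOST:-}"
  sa_file="${SA_NAMESPACE_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/namespace}"

  if [ -z "$pod" ]; then
    pod="${host%%.*}"        # first DNS label of DP_DB_HOST
  fi
  if [ -z "$ns" ]; then
    # Assumes the pod.service.namespace.svc[.domain] layout.
    case "$host" in
      *.*.*.svc|*.*.*.svc.*)
        rest="${host#*.}"; rest="${rest#*.}"; ns="${rest%%.*}" ;;
    esac
  fi
  if [ -z "$ns" ] && [ -r "$sa_file" ]; then
    ns="$(cat "$sa_file")"
  fi
  if [ -z "$pod" ] || [ -z "$ns" ]; then
    echo "could not resolve target context" >&2
    exit 1                   # the ( ) body is a subshell, so this only leaves the function
  fi
  printf '%s %s\n' "$pod" "$ns"
)
```

A caller might wrap its own primary-discovery function as `retry_bounded 10 3 find_primary`, where find_primary and the attempt and delay values are placeholders rather than the values used in the fix.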

Validation boundary

This issue covers only the current Valkey restore post-ready / Sentinel registration stop point.
It does not claim that every restore scenario passes.

Validation evidence

On the targeted rerun after the third patch:

  • postReady-0 = Completed
  • postReady-1 = Completed
  • restore reached phase=Completed
  • type=PostReady -> True / Succeed
  • logs contained:
    • resolved target context ...
    • current primary is ...
    • Post-restore Sentinel registration complete.
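For repeat runs, the log evidence above could be checked mechanically. The sketch below assumes the job log has already been saved to a file; the function name check_post_ready_log is hypothetical, and it matches only the fixed prefixes quoted above, not the elided parts.

```shell
# Hypothetical check: succeed only if every expected marker appears in the log.
check_post_ready_log() (
  log_file="$1"
  for marker in \
    "resolved target context" \
    "current primary is" \
    "Post-restore Sentinel registration complete."
  do
    # -F matches the marker as a fixed string, -q suppresses output.
    grep -qF -- "$marker" "$log_file" || {
      printf 'missing log marker: %s\n' "$marker" >&2
      exit 1                 # the ( ) body is a subshell, so this only leaves the function
    }
  done
)
```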
