Draft fix for the OADP 4289 - to be tested before proposed upstream #331

Closed

mpryc wants to merge 43 commits into openshift:oadp-1.4 from mpryc:fix_for_oadp_4289

mpryc commented Jul 25, 2024 (edited by weshayutin)

Draft pull request
http://quay.io/migi/velero:fix_oadp_4289

openshift-ci Bot added the do-not-merge/work-in-progress label

openshift-ci Bot commented Jul 25, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Author

mpryc commented Jul 25, 2024

@sseago @shubham-pampattiwar @weshayutin could you comment on whether this is a proper fix?

openshift-ci Bot commented Jul 25, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

DOWNSTREAM_OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mpryc force-pushed the fix_for_oadp_4289 branch from c756d33 to def0083 Compare

July 25, 2024 19:00

sseago reviewed

pkg/restore/restore.go Outdated

sseago reviewed

pkg/restore/restore.go Outdated

Author

mpryc commented Aug 1, 2024

@sseago I've revisited that block of code and reworked it to be much simpler, without so many if-else statements. Also, checking whether the error is a conflict is unnecessary, since RetryOnConflict won't retry on other errors.

Could you comment on the current implementation?

mpryc force-pushed the fix_for_oadp_4289 branch from 3234e26 to 25a069f Compare

August 1, 2024 07:45

sseago reviewed

sseago left a comment

The updated workflow is a lot clearer -- only one patch attempt each time the func is called. The only problems I see now are 1) handling of the obj we're adding managed fields for prior to generating the patch, and 2) we shouldn't error out for NotFound errors, to replicate current behavior (the NotFound logic was added a few releases ago to handle an edge case that a user ran into). Comments inline.

pkg/restore/restore.go Outdated

               		} else {
-              			ctx.log.Infof("the managed fields for %s is patched", kube.NamespaceAndName(obj))
+              			withoutManagedFields = createdObj.DeepCopy()

sseago Aug 1, 2024

withManagedFields = createdObj
(I don't think we need deepcopy here, since this is the last use of createdObj)

Author

mpryc Aug 1, 2024

createdObj is used later, so I wonder whether we need an assignment so that createdObj is updated with managed fields in the case where we get a fresh one from the cluster?

sseago Aug 1, 2024

Oh. Actually, in that case, a simpler change. Ignore the suggestion to create withManagedFields and instead, above we can set createdObj to the result of resourceClient.Get and omit the else clause completely. We'll do the deepCopy outside of the if block.

Author

mpryc Aug 19, 2024

Should be fixed.

pkg/restore/restore.go

+              	if err != nil {
+              		ctx.log.Errorf("error patching managed fields %s: %v", kube.NamespaceAndName(obj), err)
+              		errs.Add(namespace, err)

sseago Aug 1, 2024

I don't think we need the log.Errorf here, since we already logged the error inside the retry func. Also, we need to add the if block around errs.Add and return here, since we don't error out if it's a NotFound error:

	if !apierrors.IsNotFound(err) {
		errs.Add(namespace, err)
		return warnings, errs, itemExists
	}

Author

mpryc Aug 19, 2024

For that I think we need to also include:

	if err != nil && !apierrors.IsNotFound(err) {
		errs.Add(namespace, err)
		return warnings, errs, itemExists
	}

Author

mpryc Aug 19, 2024

Should be fixed now.

allenxu404 and others added 20 commits

August 1, 2024 11:45


          Ping golang/distroless image to latest version (velero-io#6679)

c2a392e

Signed-off-by: allenxu404 <qix2@vmware.com>


          skip subresource in resource discovery (velero-io#6688)

28bdd2d

Signed-off-by: lou <alex1988@outlook.com>
Co-authored-by: lou <alex1988@outlook.com>


          fix issue 6753

e07aff8

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          Update restore controller logic for restore deletion (velero-io#6761)

008cf6f

1. Skip deleting the restore files from storage if the backup/BSL is not found
2. Allow deleting the restore files from storage even though the BSL is readonly

Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>


          Fix velero-io#6752: add namespace exclude check.

2d51dd9

Add PSA audit and warn labels.

Signed-off-by: Xun Jiang <jxun@vmware.com>


          add csi snapshot data movement doc

b87a2a7

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          Modify changelogs for v1.12

17311ab

Signed-off-by: allenxu404 <qix2@vmware.com>


          issue 6786:always delete VSC regardless of the deletion policy

60b47f6

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          issue: move plugin depdending podvolume functions to util pkg

e4fbb84

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          issue 6880: set ParallelUploadAboveSize as MaxInt64

d84ed70

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          changelog

87875c4

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>


          Add support for block volumes (velero-io#6680) (velero-io#6897)

e850729

(cherry picked from commit 8e01d1b)

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>


          Replace the base image with paketobuildpacks image

bb94bfe

Replace the base image with paketobuildpacks image

Fixes velero-io#6851

Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>


          issue 6734: spread backup pod evenly

fc75e3b

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          Add doc links for new features to release note

a387e2a

Signed-off-by: allenxu404 <qix2@vmware.com>


          fix issue 6647

578f715

Signed-off-by: Lyndon-Li <lyonghui@vmware.com>


          Perf improvements for existing resource restore

473bd75

Use informer cache with dynamic client for Get calls on restore
When enabled, also make the Get call before create.

Add server and install parameter to allow disabling this feature,
but enable by default

Signed-off-by: Scott Seago <sseago@redhat.com>


          issue velero-io#6807: Retry failed create when using generateName

64b134f

When creating resources with generateName, apimachinery
does not guarantee uniqueness when it appends the random
suffix to the generateName stub, so if it fails with
already exists error, we need to retry.

Signed-off-by: Scott Seago <sseago@redhat.com>


          Import auth provider plugins

42ebd46

Signed-off-by: Sebastian Glab <sglab@catalogicsoftware.com>


          Add v1.12.1 changelog

ddda08c

Signed-off-by: allenxu404 <qix2@vmware.com>

sseago and others added 4 commits

August 1, 2024 11:45


          OADP-4225: add tzdata to Dockerfile.ubi

66bbdf6


          release-string for 1.14.0 (openshift#318)

7a6b3c5


          fix: CI

216602c

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>


          oadp-1.4: OADP-3227: Reconcile to fail on restore stuck in-progress (o…

3dca118

…penshift#330)

* oadp-1.4: OADP-3227: Mark InProgress backup/restore as failed upon requeuing (openshift#315)

* Mark InProgress backup/restore as failed upon requeuing

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

remove uuid, return err to requeue instead of requeue: true

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* cleanup to minimize diff from upstream

Signed-off-by: Scott Seago <sseago@redhat.com>

* error message update

Signed-off-by: Scott Seago <sseago@redhat.com>

* requeue on finalize status update.

Unlike the InProgress transition, there's no need to fail here,
since the Finalize steps can be repeated.

* Only run patch once for all backup finalizer return scenarios

Signed-off-by: Scott Seago <sseago@redhat.com>

---------

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Scott Seago <sseago@redhat.com>

* oadp-1.4: OADP-3227: Reconcile To Fail: Add backup/restore trackers (openshift#324)

* OADP-4265: Reconcile To Fail: Add backup/restore trackers

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* Apply suggestions from code review: backupTracker

* Address restoreTracker feedback

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* s/delete from/add to/ in the comment

* unit test fix

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* backup_controller unit test

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* restore_controller unit test

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* `make update`

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* mock patch to fail failure due to connection refused

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

---------

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

* regenerate mocks

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

---------

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Scott Seago <sseago@redhat.com>
Co-authored-by: Scott Seago <sseago@redhat.com>

sseago force-pushed the oadp-1.4 branch from c035379 to 3dca118 Compare

August 1, 2024 16:04

sseago reviewed

pkg/restore/restore.go

-              			ctx.log.Infof("the managed fields for %s is patched", kube.NamespaceAndName(obj))
+              			withoutManagedFields = createdObj.DeepCopy()
+              		}

sseago Aug 1, 2024

Add withoutManagedFields = createdObj.DeepCopy() here, right before calling setManagedFields.

Author

mpryc Aug 19, 2024

Yeah, that logic is much cleaner. Thanks.

sseago reviewed

pkg/restore/restore.go Outdated


          fix: ARM images (openshift#332) (openshift#335)

ffa154a

* fix: ARM images

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>

* fixup! fix: ARM images

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>

---------

Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
(cherry picked from commit 41ad312)

openshift-merge-robot added the needs-rebase label

mpryc and others added 6 commits

August 13, 2024 15:22


          OADP-4640: Downstream only to allow override kopia default algorithms (…

48b7ec0

…openshift#334)

Introduction of downstream only option to override Kopia default:
 - hashing algorithm
 - splitting algorithm
 - encryption algorithm

With introduction of 3 environment variables it is possible to override
Kopia algorithms used by Velero:

KOPIA_HASHING_ALGORITHM
KOPIA_SPLITTER_ALGORITHM
KOPIA_ENCRYPTION_ALGORITHM

If the env algorithms are not set or they are not within
Kopia SupportedAlgorithms, the default algorithm will be used.
This behavior is consistent with current behavior without this
change.

Signed-off-by: Michal Pryc <mpryc@redhat.com>


          add missing unit test for kopia hashing algo (openshift#337)

6888e15


          Draft fix for the OADP 4289 - to be tested before proposed upstream

1406dff

Signed-off-by: Michal Pryc <mpryc@redhat.com>


          Modifications after Scott suggestions.

718cd7d

Signed-off-by: Michal Pryc <mpryc@redhat.com>


          Revisited function for patching managed fields.

fe76f94

Signed-off-by: Michal Pryc <mpryc@redhat.com>


          Further updates to the logic for properly restoring managed fields

74a4f2f

Additional changes to the logic for restoring managed fields.

Signed-off-by: Michal Pryc <mpryc@redhat.com>

mpryc force-pushed the fix_for_oadp_4289 branch from 25a069f to 74a4f2f Compare

August 19, 2024 11:29

openshift-merge-robot removed the needs-rebase label

mpryc requested a review from sseago

August 19, 2024 20:33

sseago force-pushed the oadp-1.4 branch from 6888e15 to 9ac863a Compare

August 22, 2024 16:59

openshift-ci Bot commented Sep 10, 2024

@mpryc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/lint	`74a4f2f`	link	true	`/test lint`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-merge-robot added the needs-rebase label

openshift-merge-robot commented Sep 10, 2024

PR needs rebase.

openshift-bot commented Dec 10, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci Bot added the lifecycle/stale label

openshift-bot commented Jan 9, 2025

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci Bot added lifecycle/rotten and removed lifecycle/stale labels

openshift-bot commented Feb 9, 2025

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci Bot closed this

openshift-ci Bot commented Feb 9, 2025

@openshift-bot: Closed this PR.

Labels

do-not-merge/work-in-progress lifecycle/rotten needs-rebase