-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement space deletion job endpoint #2647
Conversation
c308995
to
2340c78
Compare
I'm concerned about the new dynamic client here. If you're concerned about caching, I believe this is just another non-caching client, as the privileged and user clients here are already not using a cache. |
We are pretty sure that both clients are actually uncached. We tried to put Maybe the condition you caught is not due to caching but just general etcd eventual consistency? In that case maybe we should do retries? I would avoid introducing a new client unless we can prove that it makes a difference. |
Thank you both for taking a look at this. 🙏
That was our understanding as well.
The thing that led us to believe that the client used by the API was using a cache was that @matt-royal consistently encountered an issue when testing on a KinD cluster on his M1 Mac where the first GET call to $ cf delete-org o -v; kubectl get -n cf cforg/cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76 -o yaml
...
Deleting org o as cf-admin...
REQUEST: [2023-06-30T13:58:24-07:00]
GET /v3/organizations?names=o HTTP/1.1
Host: localhost
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/json
User-Agent: cf/8.7.1+9c81242.2023-06-15 (go1.20.5; amd64 darwin)
[application/json Content Hidden]
RESPONSE: [2023-06-30T13:58:24-07:00]
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 30 Jun 2023 20:58:24 GMT
Server: envoy
Vary: Accept-Encoding
X-Correlation-Id: 81578357-659d-44ff-a2db-aedf9a12948f
X-Envoy-Upstream-Service-Time: 9
{
"pagination": {
"first": {
"href": "https://localhost/v3/organizations?names=o"
},
"last": {
"href": "https://localhost/v3/organizations?names=o"
},
"next": null,
"previous": null,
"total_pages": 1,
"total_results": 1
},
"resources": [
{
"created_at": "2023-06-30T20:57:36Z",
"guid": "cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76",
"links": {
"self": {
"href": "https://localhost/v3/organizations/cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76"
}
},
"metadata": {
"annotations": {
"korifi.cloudfoundry.org/creation-version": "v0.7.2-271-ga6fff152"
},
"labels": {}
},
"name": "o",
"relationships": {},
"suspended": false,
"updated_at": "2023-06-30T20:57:36Z"
}
]
}
REQUEST: [2023-06-30T13:58:24-07:00]
DELETE /v3/organizations/cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76 HTTP/1.1
Host: localhost
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/json
User-Agent: cf/8.7.1+9c81242.2023-06-15 (go1.20.5; amd64 darwin)
[application/json Content Hidden]
RESPONSE: [2023-06-30T13:58:24-07:00]
HTTP/1.1 202 Accepted
Content-Length: 0
Date: Fri, 30 Jun 2023 20:58:24 GMT
Location: https://localhost/v3/jobs/org.delete~cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76
Server: envoy
X-Correlation-Id: 0b708888-d59e-402e-a072-07511a855f52
X-Envoy-Upstream-Service-Time: 11
REQUEST: [2023-06-30T13:58:24-07:00]
GET /v3/jobs/org.delete~cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76 HTTP/1.1
Host: localhost
Accept: application/json
Authorization: [PRIVATE DATA HIDDEN]
Content-Type: application/json
User-Agent: cf/8.7.1+9c81242.2023-06-15 (go1.20.5; amd64 darwin)
[application/json Content Hidden]
RESPONSE: [2023-06-30T13:58:24-07:00]
HTTP/1.1 404 Not Found
Content-Type: application/json
Date: Fri, 30 Jun 2023 20:58:24 GMT
Server: envoy
Vary: Accept-Encoding
X-Correlation-Id: 6ae99bec-1e6a-47c1-975f-8c86eb0cd0b9
X-Envoy-Upstream-Service-Time: 9
{
"errors": [
{
"code": 10010,
"detail": "Job not found. Ensure it exists and you have access to it.",
"title": "CF-ResourceNotFound"
}
]
}
Job not found. Ensure it exists and you have access to it.
FAILED
apiVersion: korifi.cloudfoundry.org/v1alpha1
kind: CFOrg
metadata:
annotations:
korifi.cloudfoundry.org/creation-version: v0.7.2-271-ga6fff152
creationTimestamp: "2023-06-30T20:57:36Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2023-06-30T20:58:24Z"
finalizers:
- cfOrg.korifi.cloudfoundry.org
generation: 2
name: cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76
namespace: cf
resourceVersion: "2297"
uid: ff51f747-dc2c-4760-bbfb-902e700c4c88
spec:
displayName: o
status:
conditions:
- lastTransitionTime: "2023-06-30T20:57:36Z"
message: ""
observedGeneration: 1
reason: Ready
status: "True"
type: Ready
guid: cf-org-f00abc7c-e09b-4ee6-a9e5-891859f79f76
observedGeneration: 2 I tried again with a new org with a space and an app pushed to check the API logs. When I deleted the org, I received the same output as above, and saw this in the logs:
This could be an etcd eventual consistency problem, but it seemed to be happening too consistently for that. The other issue I ran into while testing the org deletion job was that the API would return a "COMPLETED" status in the response to (normally) the third call, which would make the CLI stop polling and print
I agree that adding the new client feels unnecessary. The strange thing is that the new dynamic client seems to eliminate both of the behaviours I described above when testing on the AKS cluster. I just tried an alternative approach, which is to update the org's |
I wouldn't worry about resources still visible in k9s: there's no cache issue that can make things invisible even if they still exist, only make things visible even if they don't exist anymore, so it's probably due to k9s's cache. If we know for a fact that both clients are uncached, then we shouldn't be calling only one of the two "uncached". We should understand why the dynamic client succeeds where the regular client fails, but it might just be because it's slower. If this is the case, then I'd just add some retries and get rid of the dynamic client altogether. |
Julian and I figured out what was causing the /v3/jobs endpoint to return "COMPLETED" early; it corresponded to the deletion of the role binding in the namespace. We ended up creating a new We also decided to remove the new user client and go with the retry loop to wait for the deletion timestamp to be set in the org/space. |
[#2605] Co-authored-by: Dave Walter <[email protected]>
- Refactor org deletion helper to handle all types. [#2604] Co-authored-by: Dave Walter <[email protected]>
I just confirmed that both |
- Add a retry loop to wait for the deletion timestamp to propagate. - Added a GetOrgUnfiltered function to the org repository to avoid falsely returning a not found error when the role binding is deleted from an org during deletion. - Consolidate CFSpaceRepository and SpaceRepository interfaces. [#2604] [#2605] Co-authored-by: Dave Walter <[email protected]> Co-authored-by: Julian Hjortshoj <[email protected]>
Is there a related GitHub Issue?
#2604
What is this change about?
Return the status of space deletion based on the underlying CFSpace resource state.
Does this PR introduce a breaking change?
No.
Acceptance Steps
See issue.
Tag your pair, your PM, and/or team
Paired with @acosta11