The node was low on resource: ephemeral-storage #233

Open
cachaldora opened this issue Jan 19, 2024 · 3 comments
Labels
bug (Something isn't working), needs:triage

Comments


cachaldora commented Jan 19, 2024

What happened?

Earlier this week I was migrating crossplane resources from an old cluster to a new one. During this process we needed to reconcile about 400 terraform workspaces (half of them with remote state).

After adjusting the TF provider pod resources (requests.memory: 2G, limits.memory: 2G, requests.cpu: 1000m, limits.cpu: 5000m), the pod was evicted every 20 minutes with the following error:

Warning  Evicted              117s  kubelet            The node was low on resource: ephemeral-storage. Threshold quantity: 7859887835, available: 7344104Ki. Container package-runtime was using 51370076Ki, request is 0, has larger consumption of ephemeral-storage.
Normal   Killing              117s  kubelet            Stopping container package-runtime
Warning  ExceededGracePeriod  107s  kubelet            Container runtime did not kill the pod within specified grace period.
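The quantities in the eviction message mix plain bytes and Ki units, which makes them hard to compare at a glance. The following is a minimal Python sketch that converts them to GiB. The helper names (`quantity_to_bytes`, `to_gib`) are hypothetical and only the binary suffixes are handled; the three input values are copied from the event above.

```python
# Binary suffixes used in Kubernetes resource quantities.
# Assumption: only these suffixes (or plain integers) appear in the message.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a plain-integer or binary-suffixed quantity string to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already in bytes


def to_gib(num_bytes: int) -> float:
    """Express a byte count in GiB."""
    return num_bytes / 1024**3


# Values taken from the kubelet eviction event.
threshold = quantity_to_bytes("7859887835")        # eviction threshold
available = quantity_to_bytes("7344104Ki")         # free ephemeral storage on the node
container_usage = quantity_to_bytes("51370076Ki")  # usage of container package-runtime

print(f"threshold:       {to_gib(threshold):.2f} GiB")        # 7.32 GiB
print(f"available:       {to_gib(available):.2f} GiB")        # 7.00 GiB
print(f"container usage: {to_gib(container_usage):.2f} GiB")  # 48.99 GiB
print("below threshold:", available < threshold)              # True
```

Read this way, the node had about 7.0 GiB free against a threshold of about 7.3 GiB, while the package-runtime container alone was using roughly 49 GiB with an ephemeral-storage request of 0.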

The workaround was to set the pod's resources.requests.ephemeral-storage to 60Gi, which increased the time to eviction.

The TF provider (v0.11.0) was configured with the plugin cache disabled because it ran with --max-reconcile-rate=10.

What environment did it happen in?

  • Crossplane Version: 1.14.3-up.1
  • Provider Version: 0.11.0
  • Kubernetes Version: 1.26.10
  • Kubernetes Distribution: AKS

cachaldora added the bug and needs:triage labels Jan 19, 2024

Member

ytsarev commented Jan 19, 2024

@cachaldora I think you can try enabling the plugin cache again now that #215 has been merged. It should help with the overall performance situation.

Member

ytsarev commented Jan 19, 2024

@cachaldora the above-mentioned change was released in v0.12.0 (https://github.com/upbound/provider-terraform/releases/tag/v0.12.0). I recommend upgrading to the latest release, v0.13.0 (https://github.com/upbound/provider-terraform/releases/tag/v0.13.0).

Author

cachaldora commented Jan 19, 2024

Regarding the performance, I've upgraded to 0.13.0 and opened a related issue: #234
