
consider reducing TerminationGracePeriodSeconds for spin-apps deployment/pod spec #118

Closed
rajatjindal opened this issue Mar 2, 2024 · 4 comments

Comments

@rajatjindal
Member

I was trying to understand why the scaling down of spin apps (after manually editing the number of replicas) is taking so long. It is likely due to the default 30s value of TerminationGracePeriodSeconds when creating the spin-app pods.

I reduced TerminationGracePeriodSeconds to 2s in my local setup (via a custom build of spin-operator), after which scale-down is quite fast. I believe this change will also help with HPA- or KEDA-based scale-down.

We should consider setting a sensible default and possibly making it configurable on the SpinApp CRD.
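
For illustration, a minimal sketch of how the operator could apply a configurable grace period when it builds the pod spec, assuming client-go's corev1 types; the helper name below is hypothetical and not part of spin-operator:

```go
package main

import corev1 "k8s.io/api/core/v1"

// withGracePeriod is a hypothetical helper showing how an operator could set
// TerminationGracePeriodSeconds on the pod spec it generates for a SpinApp.
// Kubernetes defaults this to 30s; a lower value makes scale-down faster but
// leaves less time for in-flight requests to drain after SIGTERM.
func withGracePeriod(spec corev1.PodSpec, seconds int64) corev1.PodSpec {
	spec.TerminationGracePeriodSeconds = &seconds
	return spec
}
```

On the CRD side this could surface as a field on the SpinApp spec (name to be decided) that is passed through to the deployment's pod template.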

@endocrimes
Contributor

It needs to be >= the length of the longest request you expect to receive, to allow in-flight requests to drain safely.

If it's not shutting down after draining in-flight requests and instead waits for a SIGKILL, that sounds like a bug in Spin or the shim.

@rajatjindal
Member Author

rajatjindal commented Mar 3, 2024

You are right. I looked into the containerd logs:

time="2024-03-03T02:54:23.254922505Z" level=info msg="StopContainer for \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" with timeout 30 (s)"
time="2024-03-03T02:54:23.255270546Z" level=info msg="Stop container \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" with signal terminated"
time="2024-03-03T02:54:23.255412963Z" level=info msg="sending signal 15 to instance: a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec"

### 30 Seconds later

time="2024-03-03T02:54:53.267455088Z" level=info msg="Kill container \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\""
time="2024-03-03T02:54:53.267601463Z" level=info msg="sending signal 9 to instance: a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec"
time="2024-03-03T02:54:53.280759796Z" level=info msg="deleting instance: a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec"
time="2024-03-03T02:54:53.281252463Z" level=info msg="shim disconnected" id=a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec namespace=k8s.io
time="2024-03-03T02:54:53.281263755Z" level=warning msg="cleaning up after shim disconnected" id=a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec namespace=k8s.io
time="2024-03-03T02:54:53.285352921Z" level=info msg="StopContainer for \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" returns successfully"
time="2024-03-03T02:54:53.285577796Z" level=info msg="Container to stop \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2024-03-03T02:54:53.866617713Z" level=info msg="RemoveContainer for \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\""
time="2024-03-03T02:54:53.868465005Z" level=info msg="RemoveContainer for \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" returns successfully"
time="2024-03-03T02:54:53.868765338Z" level=error msg="ContainerStatus for \"a3db1d9768c405058596d2c6298d7b53d89cc81c79eb54d4801cbee0cdc434ec\" failed" error="rpc error: code = NotFound desc = an error occurred wh

@rajatjindal
Member Author

Oh, I think this is the same as deislabs/containerd-wasm-shims#207.

@rajatjindal
Member Author

This turns out to be due to OS signal handling in the containerd shim.
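
As an illustration of the expected behaviour (a minimal Go sketch, not the shim's actual code): a process that handles SIGTERM can drain in-flight requests and exit well within the grace period, whereas a process that ignores SIGTERM sits idle until the kubelet sends SIGKILL at the 30s mark, which matches the logs above.

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	go srv.ListenAndServe()

	// Wait for the SIGTERM that the container runtime sends on stop.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	<-stop

	// Drain in-flight requests, then exit before the grace period expires.
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()
	srv.Shutdown(ctx)
}
```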
