Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pkg/repository/maintenance): handle when there's no container status #8271

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelogs/unreleased/8271-mcluseau
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
fix(pkg/repository/maintenance): don't panic when there's no container statuses
15 changes: 14 additions & 1 deletion pkg/repository/maintenance.go
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,20 @@
}

// we only have one maintenance pod for the job
return podList.Items[0].Status.ContainerStatuses[0].State.Terminated.Message, nil
pod := podList.Items[0]

statuses := pod.Status.ContainerStatuses
mcluseau marked this conversation as resolved.
Show resolved Hide resolved
if len(statuses) == 0 {
return "", fmt.Errorf("no container statuses found for job %s", job.Name)

Check warning on line 124 in pkg/repository/maintenance.go

View check run for this annotation

Codecov / codecov/patch

pkg/repository/maintenance.go#L124

Added line #L124 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The termination-log is not always written, it is only written when maintenance fails.
So I suspect it is a non-error case when len(statuses) == 0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me, "no statuses" is not the expected state even if the container terminated successfully:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-state-terminated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I may have pointed the comment to wrong code. The most suspecting line is below:

	terminated := statuses[0].State.Terminated
	if terminated == nil {
		return "", fmt.Errorf("container for job %s is not terminated", job.Name)

As you can see from the maintenance code, if the maintenance succeeds, no termination message is written, so statuses[0].State.Terminated may indeed be nil. But this is a normal case.

Furthermore, when nothing is set to pod.Status.ContainerStatuses , will it be nil? If so, it is still a normal case.

Please double confirm this.

Copy link
Author

@mcluseau mcluseau Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered to the State.Terminated case too :) there's not just the termination log, there's also the exit code, time, etc.

About container statuses, they are expected to be filled (from waiting to running to terminated) but they may not be present when the pod was just created (I guess a kind of race condition leads the initial but mentioned here). So, my interpretation is that when it's empty, it's not a normal state and can't be after the container's termination: ie, the caller made the call too soon for some reason, very in line with semantics of asking before the pod was created by the job, which is a case that returns an error.

}

// we only have one maintenance container
terminated := statuses[0].State.Terminated
if terminated == nil {
return "", fmt.Errorf("container for job %s is not terminated", job.Name)

Check warning on line 130 in pkg/repository/maintenance.go

View check run for this annotation

Codecov / codecov/patch

pkg/repository/maintenance.go#L130

Added line #L130 was not covered by tests
}

return terminated.Message, nil
}

func GetLatestMaintenanceJob(cli client.Client, ns string) (*batchv1.Job, error) {
Expand Down
Loading