job-monitor: handle invalid pod configuration
mdonadoni committed Nov 14, 2023
1 parent 6796cc1 commit 4a2fb9d
Showing 2 changed files with 11 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGES.rst
@@ -7,6 +7,7 @@ Version 0.9.2 (UNRELEASED)
 - Changes CVMFS support to allow users to automatically mount any available repository.
 - Fixes container image building on the arm64 architecture.
 - Fixes the creation of Kubernetes jobs by retrying in case of error and by correctly handling the error after reaching the retry limit.
+- Fixes job monitoring in cases when job creation fails, for example when it is not possible to successfully mount volumes.

 Version 0.9.1 (2023-09-27)
 --------------------------
11 changes: 10 additions & 1 deletion reana_job_controller/job_monitor.py
@@ -203,10 +203,13 @@ def get_job_status(self, job_pod) -> Optional[str]:

         elif job_pod.status.phase == "Pending":
             for container in container_statuses:
+                reason = None
+                message = None
                 try:
                     reason = container.state.waiting.reason
+                    message = container.state.waiting.message
                 except AttributeError:
-                    reason = None
+                    pass

                 if not reason:
                     continue
@@ -223,6 +226,12 @@ def get_job_status(self, job_pod) -> Optional[str]:
                         "failed due to invalid image name."
                     )
                     status = JobStatus.failed.name
+                elif "CreateContainerConfigError" in reason:
+                    logging.info(
+                        f"Container {container.name} in Kubernetes job {backend_job_id} "
+                        f"failed due to container configuration error: {message}"
+                    )
+                    status = JobStatus.failed.name

         return status

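The new `CreateContainerConfigError` branch can be exercised without a cluster. The following is a minimal sketch, not the project's actual test code: `pending_container_status`, `fake_container`, and `FAILED` are hypothetical names introduced only for illustration, and `SimpleNamespace` objects stand in for the Kubernetes client's container status objects. It mirrors the logic in the diff: `reason` and `message` default to `None`, an `AttributeError` from a missing `waiting` state is ignored, and a configuration error marks the job as failed.

```python
import logging
from types import SimpleNamespace
from typing import Optional

# Hypothetical stand-in for JobStatus.failed.name used in the diff.
FAILED = "failed"


def pending_container_status(container, backend_job_id: str) -> Optional[str]:
    """Return FAILED if a waiting container cannot be created, else None."""
    reason = None
    message = None
    try:
        reason = container.state.waiting.reason
        message = container.state.waiting.message
    except AttributeError:
        # state.waiting is None when the container is not in the waiting state.
        pass

    if not reason:
        return None

    if "CreateContainerConfigError" in reason:
        logging.info(
            f"Container {container.name} in Kubernetes job {backend_job_id} "
            f"failed due to container configuration error: {message}"
        )
        return FAILED

    return None


def fake_container(name, reason=None, message=None):
    """Build a minimal object shaped like a container status (assumption)."""
    waiting = SimpleNamespace(reason=reason, message=message) if reason else None
    return SimpleNamespace(name=name, state=SimpleNamespace(waiting=waiting))


broken = fake_container("job", "CreateContainerConfigError", "volume mount failed")
print(pending_container_status(broken, "job-1"))
```

Initialising both variables before the `try` block means a partial failure (for instance, `reason` assigned but `message` missing) still leaves every name defined when the log message is formatted.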
