
[BUG] job_pipeline task does not set job's final status on exception; jobs get stuck in RUNNING forever #3653

@Abhishek9639

Description

What happened

While reading through the job_pipeline task in intel_owl/tasks.py (lines 247-266), I noticed that when job.execute() throws an exception, the except block marks individual plugin reports as FAILED but it never actually updates the Job object itself.

Here's the relevant code:

@shared_task(base=FailureLoggedTask, name="job_pipeline", soft_time_limit=100)
def job_pipeline(job_id: int):
    from api_app.models import Job

    job = Job.objects.get(pk=job_id)
    try:
        job.execute()
    except Exception as e:
        logger.exception(e)
        for report in (
            list(job.analyzerreports.all())
            + list(job.connectorreports.all())
            + list(job.pivotreports.all())
            + list(job.visualizerreports.all())
        ):
            report.status = report.STATUSES.FAILED.value
            report.save()

The issue is that inside Job.execute() (line 714 in models.py), the very first thing the method does is set self.status = self.STATUSES.RUNNING and save it. So by the time an exception is raised (say, during _get_pipeline() or _get_signatures(), or if the broker is temporarily down), the job is already marked as RUNNING in the DB.

But the except block above doesn't set it back to FAILED. It also doesn't:

  • Set job.finished_analysis_time
  • Send a WebSocket notification via JobConsumer.serialize_and_send_job(job)
  • Update the parent Investigation status

So the job just stays stuck in RUNNING forever. The job_set_final_status task (line 224) is supposed to handle this cleanup, but it's chained as the last step of the Celery pipeline; if the pipeline never starts, that task never runs.
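The chaining problem can be illustrated without Celery at all: a final-status callback attached to the end of a chain only runs if the chain actually starts. The names below are illustrative stand-ins, not the real Celery API.

```python
def run_chain(steps, final_cb):
    """Toy stand-in for a Celery chain: final_cb runs only after every step."""
    for step in steps:
        step()
    final_cb()

events = []

def build_pipeline():
    # Analogous to job.execute() raising before the chain is dispatched
    raise RuntimeError("broker down")

try:
    run_chain([build_pipeline], lambda: events.append("final_status"))
except RuntimeError:
    pass

print(events)  # [] -- the final-status step never ran
```

This is exactly the failure mode in job_pipeline: the cleanup step exists, but it is downstream of the step that failed.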

I noticed there's already a check_stuck_analysis periodic task (line 124) that catches these after ~25 minutes and marks them as failed, but that's more of a safety net. It also doesn't send any WebSocket notifications, so the user still sees a stuck spinner until they manually refresh.

Worth noting: the run_plugin task (line 268) does call JobConsumer.serialize_and_send_job(job) after exception handling, so there's already a pattern for this in the codebase; job_pipeline just doesn't follow it.

Environment

  1. OS: Ubuntu (Docker)
  2. IntelOwl version: develop (v6.6.0)

What did you expect to happen

If job.execute() fails, the except block should also clean up the job state properly. Something like:

  1. Set job.status to FAILED
  2. Append the error to job.errors
  3. Set job.finished_analysis_time = now()
  4. Call JobConsumer.serialize_and_send_job(job) so the frontend knows right away
  5. Update the Investigation status if the job belongs to one

Basically the same stuff job_set_final_status would have done if the pipeline had completed normally.
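The steps above can be sketched as a small cleanup helper. StubJob and the notify callable are stand-ins for the real Django Job model and JobConsumer.serialize_and_send_job; the actual fix would live in the except block of job_pipeline and call the existing IntelOwl helpers instead.

```python
from datetime import datetime, timezone

class StubJob:
    """Stand-in for api_app.models.Job, just enough to show the cleanup."""
    def __init__(self):
        self.status = "running"          # execute() set this before failing
        self.errors = []
        self.finished_analysis_time = None

def fail_job(job, exc, notify):
    """Mirror of the cleanup job_set_final_status would have done."""
    job.status = "failed"                        # reach a terminal state
    job.errors.append(str(exc))                  # surface the error to the user
    job.finished_analysis_time = datetime.now(timezone.utc)
    notify(job)                                  # e.g. JobConsumer.serialize_and_send_job

notified = []
job = StubJob()
fail_job(job, RuntimeError("broker unreachable"), notified.append)
print(job.status, job.finished_analysis_time is not None, len(notified))
# failed True 1
```

The Investigation-status update (step 5) is omitted here since it depends on the parent relation; in the real fix it would follow the same call job_set_final_status already makes.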

How to reproduce your issue

  1. Cause an exception during job.execute(): for example, a DB error in _get_signatures(), a broken PythonConfig import, or even just the broker being momentarily unreachable when runner() is called
  2. The job will be stuck as RUNNING in the database indefinitely
  3. Frontend shows an infinite spinner with no error feedback
  4. Eventually check_stuck_analysis cleans it up after ~25 min, but without notifying the frontend

You can also just read the code to confirm:

  • intel_owl/tasks.py line 247-266 → the incomplete except block
  • intel_owl/tasks.py line 224-232 → job_set_final_status that does the proper cleanup
  • api_app/models.py line 714-715 → where execute() sets status to RUNNING before building the pipeline

Error messages and logs

No error on the frontend; that's the whole problem. The user just sees a job stuck in "Running" forever. Server-side, the exception gets logged via logger.exception(e), but from the user's perspective nothing happens.

DB state of a stuck job:

  • job.status = running (never reaches a terminal state)
  • job.finished_analysis_time = NULL
  • Because finished_analysis_time is NULL, remove_old_jobs won't clean these up either; they just accumulate
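The accumulation can be spotted with a filter along these lines. The in-memory rows below are a stand-in for the Job table; in the real app this would be a queryset filter such as Job.objects.filter(status="running", finished_analysis_time__isnull=True).

```python
# Stand-in rows mimicking the Job table state described above
jobs = [
    {"id": 1, "status": "failed", "finished_analysis_time": "2024-01-01T10:00Z"},
    {"id": 2, "status": "running", "finished_analysis_time": None},   # stuck
    {"id": 3, "status": "running", "finished_analysis_time": None},   # stuck
]

# Stuck = non-terminal status with no finish timestamp
stuck = [j["id"] for j in jobs
         if j["status"] == "running" and j["finished_analysis_time"] is None]
print(stuck)  # [2, 3]
```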

Metadata

Labels

bug (Something isn't working)