bug(ansible): provision-fleet-roles.yml has no pre-flight git pull — stale code_source causes old Ansible roles to run

## Summary

`provision-fleet-roles.yml` runs Ansible roles directly from `code_source` at `/opt/autobot/code_source` without first pulling the latest code from GitHub. If `code_source` is stale (e.g., after a PR merge that hasn't been picked up yet), the old role code executes and can cause hard-to-diagnose failures.

## Observed in Production

Provisioning run on 2026-04-06 failed at Phase 5a (AI Stack) on `00-SLM-Manager`:

```
TASK [ai-stack : Install AI Python packages from requirements-ai.txt]
fatal: [00-SLM-Manager]: FAILED!
ERROR: Could not find a version that satisfies the requirement numpy<3.0.0,>=2.4.3
ERROR: No matching distribution found for numpy<3.0.0,>=2.4.3
```

**Root cause chain:**
1. PR #3536 (merged) added stale-venv detection to the ai-stack role — detects Python 3.10 venv, deletes it, recreates with Python 3.12
2. `code_source` on the SLM manager was **not updated** after #3536 merged
3. The stale ai-stack role (no stale-venv detection) ran against a host that had a pre-existing Python 3.10 venv
4. numpy>=2.4.3 requires Python >=3.11 → install fails

**Evidence that old code was deployed** — task name in Ansible output:
```
TASK [ai-stack : Create Python virtual environment]        ← old (no #3534 suffix)
```
Current code in repo:
```yaml
- name: Create Python virtual environment (#3534)          ← new
```
The stale-venv detection tasks and pip cache dir task were entirely absent from the verbose output.

## Existing Pattern

`update-all-nodes.yml` already runs a pre-flight sync before doing any work (lines 61–68):

```yaml
- name: "[PRE-FLIGHT] Sync latest code from GitHub (if reachable)"
  git:
    repo: "{{ github_repo }}"
    dest: "{{ git_repo_root }}"
    version: "{{ deploy_ref }}"
    update: yes
  ignore_errors: yes
```

## Proposed Fix

Add the same pre-flight git sync to `provision-fleet-roles.yml` (or to `setup_wizard.py`, before it invokes the playbook), so that a provision run always uses the latest merged role code.

The `ignore_errors: yes` pattern from `update-all-nodes.yml` is appropriate here: if the SLM manager cannot reach GitHub, the run falls back to whatever is already in `code_source`. That matches today's behavior, but the fallback should be made explicit with a warning instead of happening silently.
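
A minimal sketch of what the pre-flight play could look like, placed as the first play in `provision-fleet-roles.yml`. It is not the merged fix. It assumes that `github_repo`, `git_repo_root`, and `deploy_ref` are defined the same way as in `update-all-nodes.yml`, and that the playbook runs on the SLM manager itself (hence `hosts: localhost`). The `preflight_sync` variable and the warning task are hypothetical additions for illustration.

```yaml
# Sketch only: hypothetical first play for provision-fleet-roles.yml.
# Assumes github_repo, git_repo_root and deploy_ref are defined as in
# update-all-nodes.yml, and that code_source lives on the control host.
- name: "[PRE-FLIGHT] Refresh code_source before provisioning"
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: "[PRE-FLIGHT] Sync latest code from GitHub (if reachable)"
      ansible.builtin.git:
        repo: "{{ github_repo }}"
        dest: "{{ git_repo_root }}"
        version: "{{ deploy_ref }}"
        update: true
      register: preflight_sync   # hypothetical variable name
      ignore_errors: true        # offline hosts keep the existing checkout

    - name: "[PRE-FLIGHT] Warn when falling back to the existing checkout"
      ansible.builtin.debug:
        msg: >-
          Git sync failed; provisioning with the existing checkout in
          {{ git_repo_root }}, which may be stale.
      when: preflight_sync is failed

    - name: "[PRE-FLIGHT] Report the commit that will be provisioned"
      ansible.builtin.debug:
        msg: "code_source is at {{ preflight_sync.after }}"
      when: preflight_sync is succeeded
```

One point worth verifying before choosing this placement: Ansible may parse roles listed under `roles:` when the playbook is loaded, before the first play executes. If that applies here, a sync inside the same playbook would only take effect on the next run, and doing the sync from `setup_wizard.py` before invoking `ansible-playbook` would be the safer of the two options.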

## Impact

- Any provision run after a PR merge that touches Ansible roles may use stale role code
- Failures are non-obvious: the old task names in the verbose output are the only clue
- Affects all roles, not just ai-stack

## Labels

- bug, ansible, infra, medium
