Newly created sessions can be drained for config-drift shortly after wake #127

@corymhall

Summary

A freshly created session can still get drained for config-drift a short time after wake, even when there was no intentional config edit after the session was created.

This looks like a startup/reconcile race between the session bead’s stored startup hashes and the reconciler’s next drift check.

What I observed

A newly created session woke successfully, and then a few minutes later the supervisor logged:

  • Draining session '...': config-drift

From the user's perspective, this looked wrong because the session had just been created and there had been no intentional config edit after creating it.

Likely cause

The most likely explanation is a race between:

  1. the initial session bead metadata written during sync
  2. the later started_config_hash write after sp.Start succeeds
  3. the next reconciler drift check using a newer desired-state snapshot

Relevant code paths:

  • Session bead creation stores config_hash / live_hash during sync:
    cmd/gc/session_beads.go

  • Drift detection prefers started_config_hash when present, otherwise falls back to config_hash:
    cmd/gc/session_reconciler.go

  • started_config_hash is only written after the wake/start path completes:
    cmd/gc/session_lifecycle_parallel.go

This seems to create a window in which a newly started session is compared against the current template hash, which may differ from the hash stored at sync time, before its startup metadata has fully settled.
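
To make the suspected window concrete, here is a minimal sketch of one interleaving that would produce the symptom. It is not the code in cmd/gc; the type `beadMetadata` and the functions `baselineHash` and `hasDrifted` are hypothetical names, and the hash values "A" and "B" are placeholders. The only behavior taken from the code paths above is the preference for started_config_hash with a fallback to config_hash.

```go
package main

import "fmt"

// beadMetadata is a hypothetical stand-in for the hashes stored on a session bead.
type beadMetadata struct {
	ConfigHash        string // written during sync
	StartedConfigHash string // written only after start succeeds; empty until then
}

// baselineHash mirrors the described preference: use started_config_hash
// when present, otherwise fall back to config_hash.
func baselineHash(m beadMetadata) string {
	if m.StartedConfigHash != "" {
		return m.StartedConfigHash
	}
	return m.ConfigHash
}

// hasDrifted compares the baseline against the reconciler's desired hash.
func hasDrifted(m beadMetadata, desiredHash string) bool {
	return baselineHash(m) != desiredHash
}

func main() {
	// Assumed timeline: sync stored hash "A", then config changed to "B"
	// shortly before wake, so the session actually starts under "B".
	m := beadMetadata{ConfigHash: "A"}

	// Drift check runs before the post-start write: the fallback to
	// config_hash ("A") disagrees with the desired snapshot ("B").
	fmt.Println(hasDrifted(m, "B")) // true

	// Once started_config_hash is written, the same check passes.
	m.StartedConfigHash = "B"
	fmt.Println(hasDrifted(m, "B")) // false
}
```

If this interleaving is what happens, the drain is triggered by the stale sync-time config_hash rather than by any real change after the session started.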

Why this matters

This makes newly created sessions feel unstable and can interrupt active work almost immediately, especially when the config changed shortly before the wake and was not touched after the user created the session.

Expected behavior

A just-started session should not be considered drifted until its startup hash metadata is fully committed and stable.

Possible fixes:

  • suppress config-drift checks until started_config_hash is present
  • make the startup hash write happen before the session becomes eligible for the next drift check
  • re-read / revalidate bead metadata after start before applying drift logic
  • otherwise close the race between sync-time config_hash and post-start started_config_hash
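
As an illustration of the first option, the sketch below skips the drift decision until started_config_hash has been written. The names `sessionHashes` and `shouldDrainForDrift` are hypothetical and do not correspond to existing functions in the reconciler; this is one possible shape for the guard, not a proposed patch.

```go
package main

import "fmt"

// sessionHashes is a hypothetical view of the hash fields on a session bead.
type sessionHashes struct {
	ConfigHash        string // sync-time hash
	StartedConfigHash string // post-start hash; empty until the start path writes it
}

// shouldDrainForDrift reports drift only once started_config_hash exists.
// While it is empty, the session is treated as still starting and is never
// drained for config-drift, so the sync-time config_hash is not consulted.
func shouldDrainForDrift(h sessionHashes, desiredHash string) bool {
	if h.StartedConfigHash == "" {
		return false // startup metadata not committed yet; defer the check
	}
	return h.StartedConfigHash != desiredHash
}

func main() {
	starting := sessionHashes{ConfigHash: "A"}
	fmt.Println(shouldDrainForDrift(starting, "B")) // false: check deferred

	settled := sessionHashes{ConfigHash: "A", StartedConfigHash: "B"}
	fmt.Println(shouldDrainForDrift(settled, "B")) // false: hashes agree
	fmt.Println(shouldDrainForDrift(settled, "C")) // true: genuine drift
}
```

One trade-off of this guard is that a session whose start never completes would never be drift-checked, so that case would likely need separate handling.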

Extra note

This does not appear to require a same-tick config edit after session creation. A config change shortly before wake seems sufficient to trigger the race.

Labels: kind/bug (Broken behavior), priority/p2 (Medium: real problem, workaround exists)
