Summary
A freshly created session can still get drained for config-drift a short time after wake, even when there was no intentional config edit after the session was created.
This looks like a startup/reconcile race between the session bead’s stored startup hashes and the reconciler’s next drift check.
What I observed
A newly created session woke successfully, and then a few minutes later the supervisor logged:
Draining session '...': config-drift
From the user perspective, this looked wrong because the session had just been created and there had been no intentional config edit after creating it.
Likely cause
The most likely explanation is a race between:
- the initial session bead metadata written during sync
- the later started_config_hash write after sp.Start succeeds
- the next reconciler drift check using a newer desired-state snapshot
Relevant code paths:
- Session bead creation stores config_hash / live_hash during sync: cmd/gc/session_beads.go
- Drift detection prefers started_config_hash when present, otherwise falls back to config_hash: cmd/gc/session_reconciler.go
- started_config_hash is only written after the wake/start path completes: cmd/gc/session_lifecycle_parallel.go
This seems to create a window where a newly started session can still be compared against a different/current template hash before its startup metadata is fully settled.
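To make the suspected window concrete, here is a minimal sketch of the fallback behavior described above. It is not the actual reconciler code: the SessionBead struct, the effectiveHash and isDrifted helpers, and the hash values are all hypothetical, and it assumes the start path ends up writing the hash of the newer config.

```go
package main

import "fmt"

// SessionBead is a hypothetical stand-in for the hash metadata on a session bead.
type SessionBead struct {
	ConfigHash        string // written at sync time
	StartedConfigHash string // written only after start succeeds; empty until then
}

// effectiveHash mirrors the described preference: started_config_hash when
// present, otherwise config_hash.
func effectiveHash(b SessionBead) string {
	if b.StartedConfigHash != "" {
		return b.StartedConfigHash
	}
	return b.ConfigHash
}

// isDrifted reports whether the bead's effective hash differs from the
// desired-state hash the reconciler is currently holding.
func isDrifted(b SessionBead, desiredHash string) bool {
	return effectiveHash(b) != desiredHash
}

func main() {
	// Assumption: sync recorded a hash from a snapshot taken before a config change.
	bead := SessionBead{ConfigHash: "hash-old"}

	// Assumption: the next reconciler tick compares against a newer snapshot.
	desired := "hash-new"
	fmt.Println("drift before started_config_hash is written:", isDrifted(bead, desired))

	// Assumption: the start path later records the hash it actually started with.
	bead.StartedConfigHash = "hash-new"
	fmt.Println("drift after started_config_hash is written:", isDrifted(bead, desired))
}
```

If a drift check lands between the two writes, the session is compared using the sync-time hash and looks drifted even though nothing changed after it was created.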
Why this matters
This makes newly created sessions feel unstable and can cause immediate interruption of active work, especially when config changed shortly before the wake but not after the user created the session.
Expected behavior
A just-started session should not be considered drifted until its startup hash metadata is fully committed and stable.
Possible fixes:
- suppress config-drift checks until started_config_hash is present
- make the startup hash write happen before the session becomes eligible for the next drift check
- re-read / revalidate bead metadata after start before applying drift logic
- otherwise close the race between sync-time config_hash and post-start started_config_hash
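As a rough illustration of the first option, the drift check could skip sessions whose startup hash has not been committed yet. This is only a sketch under assumptions: SessionBead and shouldDrainForDrift are hypothetical names, not the existing reconciler API.

```go
package main

import "fmt"

// SessionBead is a hypothetical stand-in for the hash metadata on a session bead.
type SessionBead struct {
	ConfigHash        string // written at sync time
	StartedConfigHash string // written only after start succeeds; empty until then
}

// shouldDrainForDrift is a hypothetical guarded drift check: it defers the
// decision until started_config_hash is present instead of falling back to
// the sync-time config_hash.
func shouldDrainForDrift(b SessionBead, desiredHash string) bool {
	if b.StartedConfigHash == "" {
		// Startup metadata is not committed yet; re-evaluate on a later tick.
		return false
	}
	return b.StartedConfigHash != desiredHash
}

func main() {
	starting := SessionBead{ConfigHash: "hash-old"}
	fmt.Println("still starting:", shouldDrainForDrift(starting, "hash-new"))

	settled := SessionBead{ConfigHash: "hash-old", StartedConfigHash: "hash-new"}
	fmt.Println("settled, matches desired:", shouldDrainForDrift(settled, "hash-new"))

	stale := SessionBead{ConfigHash: "hash-old", StartedConfigHash: "hash-old"}
	fmt.Println("settled, config changed since start:", shouldDrainForDrift(stale, "hash-new"))
}
```

One trade-off with this approach: a session whose start never writes started_config_hash would never be drained for drift, so a real fix would likely need a timeout or another signal for that case.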
Extra note
This does not appear to require a same-tick config edit after session creation. A config change shortly before wake seems sufficient to trigger the race.