fix(backend/tty): add delayed rescan for connectors missing EDID on hotplug #3409

Open
coleleavitt wants to merge 2 commits into niri-wm:main from coleleavitt:fix/monitor-rescan-edid-race

@coleleavitt coleleavitt commented Feb 7, 2026

Problem

USB-C docks with DP MST / alt-mode (e.g. Lenovo ThinkPad USB-C Dock Gen 2) can report connectors as Connected to the kernel DRM subsystem before EDID data has been read. When this happens:

  1. connector.modes() returns an empty list
  2. pick_mode() returns None
  3. connector_connected() logs "no mode" and skips activation

Smithay's ConnectorScanner treats (Connected, Connected) as a no-op — it does not re-emit events for already-connected connectors. This means the output gets stuck in a permanent "connected but never activated" dead state.
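The dead state described above can be reproduced in a small model. This is only a sketch: `State`, `ScanEvent`, `diff`, and this `pick_mode` signature are hypothetical stand-ins, not smithay's or niri's actual code. It shows that a scanner comparing only connection state has nothing to report once EDID arrives, because the mode list is never part of the comparison.

```rust
// Simplified, hypothetical model of a connector scanner; not smithay's real code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Connected,
    Disconnected,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScanEvent {
    Connected,
    Disconnected,
}

/// Compares the previous and current connection state of one connector.
/// The mode list is never consulted, so late EDID is invisible here.
pub fn diff(old: State, new: State) -> Option<ScanEvent> {
    match (old, new) {
        (State::Disconnected, State::Connected) => Some(ScanEvent::Connected),
        (State::Connected, State::Disconnected) => Some(ScanEvent::Disconnected),
        // (Connected, Connected) and (Disconnected, Disconnected): nothing to report.
        _ => None,
    }
}

/// Stand-in for mode selection: an empty mode list yields no mode,
/// which is the point where activation gets skipped.
pub fn pick_mode(modes: &[(u16, u16)]) -> Option<(u16, u16)> {
    modes.first().copied()
}
```

In this model the first scan yields `Some(ScanEvent::Connected)` while `pick_mode(&[])` is `None`, and every later scan yields `None` even after modes appear, so nothing retries activation.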

Whether activation succeeds depends entirely on timing: if a second UdevEvent::Changed fires after EDID completes, the connector recovers. This makes the bug intermittent — monitors sometimes come up and sometimes don't on the same hardware.

Root Cause

The EDID race is in the kernel/dock firmware timing, but niri had no retry path for connectors that were connected but could not be activated due to missing modes.

Fix

Two complementary mechanisms:

1. Handle DrmScanEvent::Changed (new in smithay PR #1923)

When a connector's mode list changes while it stays connected (e.g. EDID arrives after the initial probe returned empty/fallback modes), smithay now emits a DrmScanEvent::Changed event. We handle this by registering the crtc in known_crtcs (if no surface exists yet) so on_output_config_changed() can connect it. If a surface already exists, on_output_config_changed() will re-evaluate mode selection automatically.

Note: This requires bumping smithay to at least rev 9219bf8a9 (includes the Changed variant). The rev pin in Cargo.toml can be removed once niri updates smithay past this point.
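A rough model of mechanism 1, under assumptions: `Snapshot`, `Device`, `on_changed`, and the use of plain `u32` CRTC ids are hypothetical simplifications, and `diff` only imitates what a mode-aware scanner might do. It sketches the two halves described above: a Changed event when the mode list differs while the connector stays connected, and a handler that registers the CRTC only if no surface exists yet.

```rust
use std::collections::HashSet;

/// Hypothetical mode representation: (width, height, refresh in mHz).
pub type Mode = (u16, u16, u32);

#[derive(Debug, PartialEq, Eq)]
pub enum ScanEvent {
    Connected,
    Disconnected,
    Changed,
}

/// What one scan observed for a single connector.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Snapshot {
    pub connected: bool,
    pub modes: Vec<Mode>,
}

/// Mode-aware comparison: a connector that stays connected but whose
/// mode list differs between scans produces `Changed`.
pub fn diff(old: &Snapshot, new: &Snapshot) -> Option<ScanEvent> {
    match (old.connected, new.connected) {
        (false, true) => Some(ScanEvent::Connected),
        (true, false) => Some(ScanEvent::Disconnected),
        (true, true) if old.modes != new.modes => Some(ScanEvent::Changed),
        _ => None,
    }
}

/// Hypothetical slice of per-device state, keyed by CRTC id.
#[derive(Default)]
pub struct Device {
    pub known_crtcs: HashSet<u32>,
    pub surfaces: HashSet<u32>,
}

impl Device {
    /// Handles `Changed` for one CRTC. Returns true if the CRTC was newly
    /// registered; returns false if a surface already exists (the existing
    /// surface is left for the config-changed path to re-evaluate) or if
    /// the CRTC was already known.
    pub fn on_changed(&mut self, crtc: u32) -> bool {
        if self.surfaces.contains(&crtc) {
            return false;
        }
        self.known_crtcs.insert(crtc)
    }
}
```

The handler deliberately does not activate anything itself; it only makes the CRTC visible to whatever later walks `known_crtcs`, which mirrors the role the description gives to on_output_config_changed().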

2. Bounded rescan timer (defense-in-depth)

The Changed event only fires when scan_connectors() is called. It doesn't trigger rescans on its own — the kernel must fire a udev Changed event (or the timer must trigger a rescan). Since the kernel doesn't always fire a second udev event after EDID completes, we keep a bounded retry timer:

  • After device_changed() processes connectors, schedule_rescan_if_needed() checks for connected connectors that have no matching surface (not yet activated) and are not non-desktop connectors
  • If found, schedules a calloop::Timer (2 s delay) that re-invokes device_changed(), giving the kernel time to complete EDID reads
  • Retries are capped at MAX_RESCAN_RETRIES (3) to prevent infinite rescheduling
  • The timer self-clears when all connectors are successfully activated
  • Existing timers are cancelled before scheduling new ones (cancel-and-reschedule)
  • Timers are cleaned up on device removal
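The retry bookkeeping in the list above can be sketched without calloop. Everything here is an illustrative assumption: `RescanState`, `RescanDecision`, `Connector`, and this signature for `schedule_rescan_if_needed` are hypothetical, and a `bool` stands in for the timer's registration token. The `Schedule` result marks the point where the real code would arm the 2 s timer.

```rust
pub const MAX_RESCAN_RETRIES: u8 = 3;

#[derive(Debug, PartialEq, Eq)]
pub enum RescanDecision {
    /// A timer should be armed; the caller re-runs the device scan when it fires.
    Schedule { attempt: u8 },
    /// Nothing is waiting for activation; the retry counter was reset.
    Idle,
    /// A connector is still waiting, but the retry budget is spent.
    GiveUp,
}

/// Hypothetical view of one connector after a scan.
pub struct Connector {
    pub connected: bool,
    pub has_surface: bool,
    pub non_desktop: bool,
}

#[derive(Default)]
pub struct RescanState {
    /// Stand-in for an optional timer token: true while a timer is pending.
    pub timer_pending: bool,
    pub retry_count: u8,
}

impl RescanState {
    pub fn schedule_rescan_if_needed(&mut self, connectors: &[Connector]) -> RescanDecision {
        // Cancel-and-reschedule: drop any pending timer before deciding again.
        self.timer_pending = false;

        // A connector needs a retry if it is connected, not yet activated,
        // and not a non-desktop connector.
        let waiting = connectors
            .iter()
            .any(|c| c.connected && !c.has_surface && !c.non_desktop);

        if !waiting {
            self.retry_count = 0;
            return RescanDecision::Idle;
        }
        if self.retry_count >= MAX_RESCAN_RETRIES {
            return RescanDecision::GiveUp;
        }
        self.retry_count += 1;
        self.timer_pending = true;
        RescanDecision::Schedule { attempt: self.retry_count }
    }
}
```

Calling this repeatedly with one stuck connector yields attempts 1, 2, and 3, then `GiveUp`; a later call where every connected connector has a surface returns `Idle` and resets the counter, so a subsequent hotplug gets a fresh budget.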

New fields on OutputDevice

  • rescan_timer_token: Option<RegistrationToken> — handle to cancel pending timers
  • rescan_retry_count: u8 — bounded counter, reset on successful activation

Testing

  • Hardware: ThinkPad P16 Gen 3, NVIDIA RTX PRO 4000 (nvidia-drm), Intel iGPU (i915), Lenovo USB-C Dock Gen 2, two LEN S27q-10 monitors (HDMI + DP MST)
  • Builds cleanly (cargo check, cargo clippy, cargo build --release)
  • Follows existing calloop timer patterns used elsewhere in tty.rs (e.g. VRR timers, redraw timers)

Alternatives Considered

  • Polling loop: Rejected in favor of bounded async timer to avoid blocking the event loop
  • Infinite retries: Rejected — capped at 3 to avoid pathological cases
  • Timer only (no Changed event): Works but less responsive. The Changed event gives immediate feedback when modes actually arrive, while the timer serves as a fallback.

@coleleavitt
Author

Upstream Root Cause & Fix

After deeper analysis, the underlying issue is in smithay's ConnectorScanner in smithay-drm-extras. The (Connected, Connected) arm is a no-op, which means mode-list changes on already-connected connectors are silently ignored. When a USB-C dock connector reports as Connected before EDID is ready (empty mode list), and a later rescan finds modes populated, no Connected event is re-emitted.

I've filed an issue and PR upstream:

Why this PR is still valuable even with the smithay fix

The smithay fix ensures that when a rescan happens and modes have appeared, the compositor gets notified. However, this niri-level rescan timer is still needed because:

  1. No guaranteed second udev event: The kernel may only fire one Changed event before EDID is ready — without a timer-based rescan, there's nothing to trigger a second ConnectorScanner::scan()
  2. Defense in depth: The timer provides a bounded retry window (3 × 2 s = 6 s) regardless of udev event timing
  3. Complementary: The smithay fix makes each rescan more effective (mode changes are detected), while this PR ensures rescans actually happen
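To make the 6 s window concrete, here is a small timeline sketch. It assumes that no further udev event arrives, that each retry fires exactly 2 s after the previous scan, and that a rescan succeeds as soon as EDID is ready; `first_successful_rescan` and `RESCAN_DELAY_MS` are hypothetical names used only for this illustration.

```rust
pub const RESCAN_DELAY_MS: u64 = 2_000;
pub const MAX_RESCAN_RETRIES: u64 = 3;

/// Returns the time (in ms after the initial hotplug scan) of the first
/// timer-driven rescan that would observe modes, or `None` if EDID only
/// becomes ready after the last retry has already run.
pub fn first_successful_rescan(edid_ready_at_ms: u64) -> Option<u64> {
    (1..=MAX_RESCAN_RETRIES)
        .map(|attempt| attempt * RESCAN_DELAY_MS)
        .find(|&t| t >= edid_ready_at_ms)
}
```

Under these assumptions, EDID that becomes ready 3 s after hotplug is picked up by the second retry at 4 s, while EDID that takes longer than 6 s is missed unless the kernel sends another udev event.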

@coleleavitt coleleavitt force-pushed the fix/monitor-rescan-edid-race branch from 92ea363 to 67b12ec on February 7, 2026, 07:41
@YaLTeR
Member

YaLTeR commented Feb 15, 2026

I suppose this needs updating to use the new Changed event instead?

@coleleavitt
Author

I suppose this needs updating to use the new Changed event instead?

yes I'll update it today if I get time; thanks

…otplug

USB-C docks with DP MST/alt-mode may report connectors as Connected
before EDID data is available, causing pick_mode() to return None and
connector_connected() to skip activation. Smithay's ConnectorScanner
does not re-emit events for already-connected connectors, leaving the
output in a permanent dead state.

Add a bounded retry mechanism: after device_changed() processes
connectors, schedule_rescan_if_needed() checks for connected connectors
that have no matching surface (not yet activated). If found, it schedules
a calloop timer (2 s delay) that re-runs device_changed(), giving the
kernel time to complete EDID reads. The retry is capped at 3 attempts
and self-clears when all connectors are activated or on device removal.

Update smithay to rev 9219bf8a9 which includes the new
DrmScanEvent::Changed variant (smithay PR #1923). When a connector's
mode list changes while it stays connected (e.g. EDID arrives after the
initial probe returned empty modes), register the crtc in known_crtcs
so on_output_config_changed() can connect it.

This complements the existing rescan timer by providing immediate
detection when scan_connectors() runs and the connector's modes have
actually changed, rather than relying solely on retries.
@coleleavitt coleleavitt force-pushed the fix/monitor-rescan-edid-race branch from 1691d9f to 4a3e4ee on February 22, 2026, 17:55