Skip to content

fix(backend/tty): EDID race recovery with DrmScanEvent::Changed + rescan timer#3

Open
coleleavitt wants to merge 2 commits intomainfrom
fix/monitor-rescan-edid-race
Open

fix(backend/tty): EDID race recovery with DrmScanEvent::Changed + rescan timer#3
coleleavitt wants to merge 2 commits intomainfrom
fix/monitor-rescan-edid-race

Conversation

@coleleavitt
Copy link
Owner

@coleleavitt coleleavitt commented Feb 22, 2026

Summary

Upstream PR: niri-wm#3409


Summary by cubic

Fixes outputs that stay inactive after hotplug when EDID arrives late. Handles DrmScanEvent::Changed and adds a bounded rescan timer so connected connectors activate as soon as modes appear.

  • Bug Fixes

    • Handle DrmScanEvent::Changed to register CRTCs when a connector’s mode list updates, letting on_output_config_changed activate it.
    • Add a rescan timer (2s delay, up to 3 attempts) for connected connectors without modes/surfaces; retries re-run device_changed, reset on activation or when all are active, and cancel on device removal.
  • Dependencies

    • Pin smithay and smithay-drm-extras to rev 9219bf8a9 to use DrmScanEvent::Changed.

Written for commit 4a3e4ee. Summary will update on new commits.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of connected displays that become active after initial connection, including retrying detection and re-evaluating outputs after configuration changes.
    • Better detection of display mode/list changes while connected to ensure updated display states are recognized.
  • Chores

    • Pinned upstream dependency revisions to a specific commit.

…otplug

USB-C docks with DP MST/alt-mode may report connectors as Connected
before EDID data is available, causing pick_mode() to return None and
connector_connected() to skip activation. Smithay's ConnectorScanner
does not re-emit events for already-connected connectors, leaving the
output in a permanent dead state.

Add a bounded retry mechanism: after device_changed() processes
connectors, schedule_rescan_if_needed() checks for connected connectors
that have no matching surface (not yet activated). If found, it schedules
a calloop timer (2 s delay) that re-runs device_changed(), giving the
kernel time to complete EDID reads. The retry is capped at 3 attempts
and self-clears when all connectors are activated or on device removal.
@coderabbitai
Copy link

coderabbitai bot commented Feb 22, 2026

📝 Walkthrough

Walkthrough

Pinned workspace Git dependencies for smithay and smithay-drm-extras to revision 9219bf8a9. Added a delayed rescan mechanism in the TTY backend to retry activation of connected connectors that lack active surfaces, with configurable retry limits and delay.

Changes

Cohort / File(s) Summary
Dependency Management
Cargo.toml
Pinned smithay and smithay-drm-extras Git dependencies to revision 9219bf8a9 in workspace dependencies.
Connector Rescan Logic
src/backend/tty.rs
Added delayed rescan mechanism: MAX_RESCAN_RETRIES and RESCAN_DELAY. Extended OutputDevice with rescan_timer_token and rescan_retry_count. Schedule/cancel rescan timers across device_added/device_changed/device_removed and on_output_config_changed flows; dedupe changed connectors and reset retry state when new outputs appear.

Sequence Diagram(s)

sequenceDiagram
    participant Device as Device Manager
    participant Connector as Connector State
    participant Timer as Timer System
    participant Handler as device_changed()

    Device->>Connector: create/update output device
    Connector->>Connector: detect connected but without active surfaces
    alt Rescan Needed
        Connector->>Timer: schedule timer (RESCAN_DELAY) / increment retry_count
    end
    Timer-->>Handler: timer fired
    Handler->>Connector: re-evaluate connector state (device_changed)
    Connector->>Connector: attempt activation / register changed connector
    alt Still unactivated && retries remain
        Connector->>Timer: reschedule timer (increment retry_count)
    else Max retries reached or activated
        Connector->>Timer: cancel timer / clear rescan state
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Poem

🐇 I sniffed the display, quiet and keen,

Timers set to wake what’s unseen,
Retry a hop, then one more try,
Surfaces wake beneath my eye,
Hooray — the screen blinks back at me!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding EDID race recovery via DrmScanEvent::Changed handling and a rescan timer mechanism in the TTY backend.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/monitor-rescan-edid-race

Comment @coderabbitai help to get the list of available commands and usage tips.

@coleleavitt coleleavitt force-pushed the fix/monitor-rescan-edid-race branch from 679a92b to 1691d9f Compare February 22, 2026 17:50
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/backend/tty.rs">

<violation number="1" location="src/backend/tty.rs:1078">
P1: Missing duplicate make/model/serial check for `Changed` connectors. The `added` handler has a dedup guard because "our Layout does not support such duplicates and will panic," but this new `changed` registration path skips it entirely. If a `Changed` connector has the same make/model/serial as an existing one, this could trigger a panic in Layout. Apply the same dedup logic used in the `added` loop.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Update smithay to rev 9219bf8a9 which includes the new
DrmScanEvent::Changed variant (smithay PR niri-wm#1923). When a connector's
mode list changes while it stays connected (e.g. EDID arrives after the
initial probe returned empty modes), register the crtc in known_crtcs
so on_output_config_changed() can connect it.

This complements the existing rescan timer by providing immediate
detection when scan_connectors() runs and the connector's modes have
actually changed, rather than relying solely on retries.
@coleleavitt coleleavitt force-pushed the fix/monitor-rescan-edid-race branch from 1691d9f to 4a3e4ee Compare February 22, 2026 17:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/tty.rs`:
- Around line 1126-1156: The MAX_RESCAN_RETRIES guard in
schedule_rescan_if_needed currently runs before checking has_unactivated, which
prevents resetting device.rescan_retry_count when connectors become deactivated;
change the control flow in schedule_rescan_if_needed (the function in tty.rs) so
you evaluate has_unactivated first, and if false clear device.rescan_retry_count
and remove any device.rescan_timer_token via niri.event_loop.remove(token), then
only apply the MAX_RESCAN_RETRIES check when has_unactivated is true (or move
the max-retry block after the has_unactivated branch) to avoid permanently
disabling future rescans.

Comment on lines +1126 to +1156
self.schedule_rescan_if_needed(node, niri);
}

fn schedule_rescan_if_needed(&mut self, node: DrmNode, niri: &mut Niri) {
let Some(device) = self.devices.get_mut(&node) else {
return;
};

if device.rescan_retry_count >= MAX_RESCAN_RETRIES {
debug!(
"reached max rescan retries ({MAX_RESCAN_RETRIES}) for {node}, \
not scheduling another"
);
return;
}

let has_unactivated = device.drm_scanner.crtcs().any(|(conn, crtc)| {
conn.state() == connector::State::Connected
&& !device.surfaces.contains_key(&crtc)
&& !device
.non_desktop_connectors
.contains(&(conn.handle(), crtc))
});

if !has_unactivated {
device.rescan_retry_count = 0;
if let Some(token) = device.rescan_timer_token.take() {
niri.event_loop.remove(token);
}
return;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reset retry state even after hitting MAX to avoid permanently disabling rescans.

Because the max-retry guard runs before the has_unactivated check, the retry counter never resets when all unactivated connectors disappear. That can block future hotplugs on the same device from ever scheduling rescans.

🛠️ Suggested fix (reorder checks to allow reset)
-        if device.rescan_retry_count >= MAX_RESCAN_RETRIES {
-            debug!(
-                "reached max rescan retries ({MAX_RESCAN_RETRIES}) for {node}, \
-                 not scheduling another"
-            );
-            return;
-        }
-
-        let has_unactivated = device.drm_scanner.crtcs().any(|(conn, crtc)| {
+        let has_unactivated = device.drm_scanner.crtcs().any(|(conn, crtc)| {
             conn.state() == connector::State::Connected
                 && !device.surfaces.contains_key(&crtc)
                 && !device
                     .non_desktop_connectors
                     .contains(&(conn.handle(), crtc))
         });
 
         if !has_unactivated {
             device.rescan_retry_count = 0;
             if let Some(token) = device.rescan_timer_token.take() {
                 niri.event_loop.remove(token);
             }
             return;
         }
+
+        if device.rescan_retry_count >= MAX_RESCAN_RETRIES {
+            debug!(
+                "reached max rescan retries ({MAX_RESCAN_RETRIES}) for {node}, \
+                 not scheduling another"
+            );
+            return;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/tty.rs` around lines 1126 - 1156, The MAX_RESCAN_RETRIES guard in
schedule_rescan_if_needed currently runs before checking has_unactivated, which
prevents resetting device.rescan_retry_count when connectors become deactivated;
change the control flow in schedule_rescan_if_needed (the function in tty.rs) so
you evaluate has_unactivated first, and if false clear device.rescan_retry_count
and remove any device.rescan_timer_token via niri.event_loop.remove(token), then
only apply the MAX_RESCAN_RETRIES check when has_unactivated is true (or move
the max-retry block after the has_unactivated branch) to avoid permanently
disabling future rescans.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Cargo.toml (1)

34-34: Consider using the full 40-character commit hash for reproducibility.

The partial hash 9219bf8a9 resolves to 9219bf8a9b79f528a834c1083630e590a9b1a04e and currently works without ambiguity. However, using the full hash guards against potential collisions as the repository grows and makes the lockfile more explicit.

Also applies to: 41-41

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` at line 34, Replace the shortened git revision string used in the
dependency pinning (the rev = "9219bf8a9" entries) with the full 40-character
commit hash to ensure reproducibility; update both occurrences of the rev value
(look for rev = "9219bf8a9" in Cargo.toml) to rev =
"9219bf8a9b79f528a834c1083630e590a9b1a04e".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/backend/tty.rs`:
- Around line 1156-1178: Move the unactivated-connector check before the
MAX_RESCAN_RETRIES guard: first compute has_unactivated via
device.drm_scanner.crtcs() (the existing closure) and if !has_unactivated reset
device.rescan_retry_count = 0 and clear/remove any device.rescan_timer_token
with niri.event_loop.remove(token) (as currently done), then return; only after
that check enforce the if device.rescan_retry_count >= MAX_RESCAN_RETRIES early
return so reaching MAX no longer blocks the reset when connectors become
activated by other code paths.

---

Nitpick comments:
In `@Cargo.toml`:
- Line 34: Replace the shortened git revision string used in the dependency
pinning (the rev = "9219bf8a9" entries) with the full 40-character commit hash
to ensure reproducibility; update both occurrences of the rev value (look for
rev = "9219bf8a9" in Cargo.toml) to rev =
"9219bf8a9b79f528a834c1083630e590a9b1a04e".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant