fix(consensus): WAL returns early without reply when already at height, causing silent consensus failure #11

@kirdatatjana

Description

The WAL's StartedHeight message handler returns early without sending a reply when the requested height equals the current height. The caller in the consensus engine then fails with the RecvErr error.

Root Cause

In code/crates/engine/src/wal.rs:

Msg::StartedHeight(height, reply_to) => {
    if state.height == height {
        debug!(%height, "WAL already at height, ignoring");
        return Ok(());  // ❌ No reply sent to caller
    }
    state.height = height;

    self.started_height(state, height, reply_to).await?;
}

When the WAL is already at the requested height, it logs "WAL already at height, ignoring" and exits with Ok(()) without sending a reply on reply_to, so the caller never gets an answer.
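One possible shape for a fix is to reply on every path, including the early return. The sketch below models this with a plain std::sync::mpsc channel rather than ractor's reply port. The `State`, `Entries`, and `handle_started_height` names are hypothetical stand-ins, and replying with an empty list when already at the height is an assumption of this sketch, not a confirmed design for the WAL.

```rust
use std::sync::mpsc::Sender;

// Hypothetical stand-ins for the real WAL types; only the control flow matters here.
type Height = u64;
type Entries = Vec<String>;

struct State {
    height: Height,
    entries: Entries,
}

/// Handles a StartedHeight-style request and replies on every path.
fn handle_started_height(state: &mut State, height: Height, reply_to: Sender<Entries>) {
    if state.height == height {
        // Already at this height: still answer the caller instead of returning silently.
        // What to reply with is a design choice; an empty list is assumed here.
        let _ = reply_to.send(Vec::new());
        return;
    }
    state.height = height;

    // Stand-in for `started_height`, which is assumed to send the reply itself.
    let _ = reply_to.send(state.entries.clone());
}
```

Whether the already-at-height case should return an empty result, re-read the existing entries, or report an error is for the maintainers to decide. The point is only that reply_to must not be dropped without a send.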

  1. Consensus receives a StartHeight or RestartHeight message.
  2. It calls wal_fetch(height), which executes:
ractor::call!(self.wal, WalMsg::StartedHeight, height)?
  3. The WAL handler does not send a reply if state.height == height, which leads to a timeout and the RecvErr error.
  4. Consensus gets stuck waiting for a response (code ref).
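The failure mode in these steps can be reproduced in miniature with a standard-library channel. This is only a model: `buggy_handler` and `call` are hypothetical names, and std::sync::mpsc stands in for ractor's reply port, whose exact error mapping is not shown here. It illustrates that dropping the reply sender without sending leaves the caller with a receive error instead of a value.

```rust
use std::sync::mpsc::{channel, RecvError, Sender};

/// Mimics the buggy early return: the reply sender is dropped without a send.
fn buggy_handler(current: u64, requested: u64, reply_to: Sender<u64>) {
    if current == requested {
        return; // `reply_to` is dropped here, which closes the channel
    }
    let _ = reply_to.send(requested);
}

/// Mimics the caller side of a request/reply call.
fn call(current: u64, requested: u64) -> Result<u64, RecvError> {
    let (tx, rx) = channel();
    buggy_handler(current, requested, tx);
    // Fails with `RecvError` if the handler dropped the sender without replying.
    rx.recv()
}
```

In this model, `call(5, 5)` returns an error because no reply is ever sent, while `call(5, 6)` returns the reply normally.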

Consequences

  • The ractor::call! in consensus.rs never receives a reply.
  • The channel eventually closes or times out, causing a Messaging(RecvErr) error.
  • Consensus remains stuck and cannot make progress.
