Skip to content

Towards Support for Asynchrony in TestRIG

Nathaniel Wesley Filardo edited this page Nov 27, 2023 · 1 revision

Some notes after a discussion with @kliuMsft, @PeterRugg, @jonwoodruff, @davidchisnall, @rmn30, @gameboo, and @nwf-msr.

Motivation

We'd like to extend TestRIG (and DII and RVFI, as necessary) to support verification beyond the Sail-esque view of the world where every instruction in the random trace retires in order and without external or microarchitectural perturbation. In particular, we'd like to use TestRIG's instruction streams to also get coverage of the microarchitectural handling of external interrupts, branch predictors, ifetch delays, and so on.

Sketched Approach

We are pondering using TestRIG and its DUTs with different information flow. The current setup, which is focused on parallel comparison of retired instructions between the various DUTs, has DII streams from TestRIG to each DUT and RVFI streams from each DUT back to TestRIG for comparison. We imagine constructing "post-hoc" DII streams based on a DUT's RVFI output (and its DII input), allowing DUTs to delay external interrupts or remove killed instructions, for example. These traces could then be fed into more synchronous models, like Sail, for verification.

Ifetch Timing

@PeterRugg noted that there is already a time field in DII that CTSRD implementations are ignoring; the DII frontends could use this to insert bubbles into the pipeline. @kliuMsft noted that the CHERIoT-Ibex implementation of DII also ignores the DII time field but has a memory model that can insert delays (and takes its cues from the existing ifetch stage's timing?).

We expect Ifetch timing concerns to mostly "boil off" once run through the DUT: there should be no architecturally visible reflection of these delays, except in cycle-based counters, which are already generally excluded from TestRIG's consideration.

Speculation

Because TestRIG and DII currently expect all issued instructions to retire, the DII frontends have the ability to rewind and replay parts of the trace in response to the pipeline killing instructions (e.g., in response to branch misprediction). We think it might be interesting to offer a mode where that didn't happen, that is, where killed instructions were consumed from the trace and not re-issued. This could let TestRIG probe at the pipeline's flush logic: any accidentally latched architectural state would be visible to later instructions, and might be exposed by the non-idempotency of the instruction stream.

We expect the post-hoc DII stream, here, to contain only retired instructions. Divergences between the DUT's RVFI stream and that of the model consuming the post-hoc DII stream could point to speculation bugs (or side channels!).

Interrupts

Interrupts are tricky. The request pin having been asserted doesn't mean a whole lot to microarchitecture, which will get around to servicing the interrupt at its own pace, after some number of cycles and their effects on pipeline state. In particular, the number of instructions that retire or are killed when an interrupt fires is not architecturally defined.

We imagine adding two pseduo-instructions to DII:

  1. An interrupt request, which does what it says on the tin, latching an interrupt request.
  2. An interrupt barrier, which delays ifetch until the frontend would be fetching as part of an interrupt.

The purpose of the barrier is to allow TestRIG templates to generate streams with "interrupt service routine"-like instructions (access mscratch, read cause, &c) and be sure that they run after the interrupt has actually, microarchitecturally, fired, even if there were insufficient instructions to fill the pipeline between request and "ISR" entry.

Here, we expect the post-hoc DII stream to be a modulated view of the original, with any interrupt requests shifted later across subsequently retired instructions and with any killed instructions removed. For good form, the post-hoc stream should probably always pair an interrupt request DII pseudo-instruction with an interrupt barrier, but as we expect to be feeding this stream into a synchronous model, it may not matter very much.

Clone this wiki locally