diff --git a/specification/src/clock-jump-differences-and-inner-sorting.md b/specification/src/clock-jump-differences-and-inner-sorting.md
index ddfb4dec0..652ad87f9 100644
--- a/specification/src/clock-jump-differences-and-inner-sorting.md
+++ b/specification/src/clock-jump-differences-and-inner-sorting.md
@@ -1,6 +1,6 @@
 # Clock Jump Differences and Inner Sorting
 
-The previous sections show how it is proven that in the JumpStack, OpStack, and RAM Tables, the regions of constant memory pointer are contiguous. The next step is to prove that within each contiguous region of constant memory pointer, the rows are sorted for clock cycle. That is the topic of this section.
+The previous sections show how it is proven that in the Jump Stack, Op Stack, and RAM Tables, the regions of constant memory pointer are contiguous. The next step is to prove that within each contiguous region of constant memory pointer, the rows are sorted for clock cycle. That is the topic of this section.
 
 The problem arises from *clock jumps*, which describes the phenomenon when the clock cycle increases even though the memory pointer does not change.
 If arbitrary jumps were allowed, nothing would prevent the cheating prover from using a table where higher rows correspond to later states, giving rise to an exploitable attack.
diff --git a/specification/src/contiguity-of-memory-pointer-regions.md b/specification/src/contiguity-of-memory-pointer-regions.md
index d4a08a5ee..fd280b062 100644
--- a/specification/src/contiguity-of-memory-pointer-regions.md
+++ b/specification/src/contiguity-of-memory-pointer-regions.md
@@ -1,15 +1,17 @@
 # Contiguity of Memory-Pointer Regions
 
-## Contiguity for OpStack Table
+## Contiguity for Op Stack Table
 
-In each cycle, the memory pointer for the OpStack Table, `osp`, can only ever increase by one, remain the same, or decrease by one. As a result, it is easy to enforce that the entire table is sorted for memory pointer using one initial constraint and one transition constraint.
+In each cycle, the memory pointer for the Op Stack Table, `op_stack_pointer`, can only ever increase by one, remain the same, or decrease by one.
+As a result, it is easy to enforce that the entire table is sorted for memory pointer using one initial constraint and one transition constraint.
 
- - Initial constraint: `osp` starts with zero, so in terms of polynomials the constraint is `osp`.
- - Transition constraint: the new `osp` is either the same as the previous or one larger. The polynomial representation for this constraint is `(osp' - osp - 1) * (osp' - osp)`.
+ - Initial constraint: `op_stack_pointer` starts with 16[^op_stack], so in terms of polynomials the constraint is `op_stack_pointer - 16`.
+ - Transition constraint: the new `op_stack_pointer` is either the same as the previous or one larger.
+    The polynomial representation for this constraint is `(op_stack_pointer' - op_stack_pointer - 1) · (op_stack_pointer' - op_stack_pointer)`.
 
-## Contiguity for JumpStack Table
+## Contiguity for Jump Stack Table
 
-Analogously to the OpStack Table, the JumpStack's memory pointer `jsp` can only ever decrease by one, remain the same, or increase by one, within each cycle. As a result, similar constraints establish that the entire table is sorted for memory pointer.
+Analogously to the Op Stack Table, the Jump Stack's memory pointer `jsp` can only ever decrease by one, remain the same, or increase by one, within each cycle. As a result, similar constraints establish that the entire table is sorted for memory pointer.
 
  - Initial constraint: `jsp` starts with zero, so in terms of polynomials the constraint is `jsp`.
  - Transition constraint: the new `jsp` is either the same as the previous or one larger. The polynomial representation for this constraint is `(jsp' - jsp - 1) * (jsp' - jsp)`.
@@ -147,3 +149,7 @@ None.
 ### Terminal
 
  - `bc0 ⋅ rpp + bc1 ⋅ fd - 1`
+
+---
+
+[^op_stack]: See [data structures](data-structures.md#operational-stack) and [registers](registers.md#stack) for explanations of the specific value 16.
diff --git a/specification/src/data-structures.md b/specification/src/data-structures.md
index 82a5f6c01..7f78774af 100644
--- a/specification/src/data-structures.md
+++ b/specification/src/data-structures.md
@@ -1,18 +1,25 @@
 # Data Structures
 
 ## Memory
+
 The term *memory* refers to a data structure that gives read access (and possibly write access, too) to elements indicated by an *address*.
 Regardless of the data structure, the address lives in the B-field.
 There are four separate notions of memory:
+
+1. *Program Memory*, from which the VM reads instructions.
+1. *Op Stack Memory*, which stores the operational stack.
 1. *RAM*, to which the VM can read and write field elements.
-2. *Program Memory*, from which the VM reads instructions.
-3. *OpStack Memory*, which stores the operational stack.
-4. *Jump Stack Memory*, which stores the entire jump stack.
+1. *Jump Stack Memory*, which stores the entire jump stack.
+
+## Program Memory
+
+Program memory holds the instructions of the program currently executed by Triton VM.
+It is immutable.
 
 ## Operational Stack
 
 The stack is a last-in;first-out data structure that allows the program to store intermediate variables, pass arguments, and keep pointers to objects held in RAM.
-In this document, the operational stack is either referred to as just “stack” or, if more clarity is desired, “OpStack.”
+In this document, the operational stack is either referred to as just “stack” or, if more clarity is desired, “op stack.”
 
 From the Virtual Machine's point of view, the stack is a single, continuous object.
 The first 16 elements of the stack can be accessed very conveniently.
@@ -20,9 +27,9 @@ Elements deeper in the stack require removing some of the higher elements, possi
 
 For reasons of arithmetization, the stack is actually split into two distinct parts:
 1. the _operational stack registers_ `st0` through `st15`, and
-1. the _OpStack Underflow Memory_.
+1. the _Op Stack Underflow Memory_.
 
-The motivation and the interplay between the two parts is described and exemplified in [arithmetization of the OpStack table](operational-stack-table.md).
+The motivation and the interplay between the two parts is described and exemplified in [arithmetization of the Op Stack table](operational-stack-table.md).
 
 ## Random Access Memory
 
@@ -36,8 +43,10 @@ This initialization is one form of secret input, and one of two mechanisms that
 The other mechanism is [dedicated instructions](instructions.md#opstack-manipulation).
 
 ## Jump Stack
+
 Another last-in;first-out data structure that keeps track of return and destination addresses.
 This stack changes only when control follows a `call` or `return` instruction.
+Furthermore, executing instruction `recurse` requires a non-empty jump stack.
 
 ---
 
diff --git a/specification/src/hash-table.md b/specification/src/hash-table.md
index 61c6ebb21..de0f5f6ae 100644
--- a/specification/src/hash-table.md
+++ b/specification/src/hash-table.md
@@ -1,6 +1,6 @@
 # Hash Table
 
-The instruction `hash` hashes the OpStack's 10 top-most elements in one cycle.
+The instruction `hash` hashes the Op Stack's 10 top-most elements in one cycle.
 Similarly, the Sponge instructions `sponge_init`, `sponge_absorb`, and `sponge_squeeze` also all complete in one cycle.
 The main processor achieves this by using a hash coprocessor.
 The Hash Table is part of the arithmetization of that coprocessor, the other two parts being the [Cascade Table](cascade-table.md) and the [Lookup Table](lookup-table.md).
diff --git a/specification/src/img/aet-relations.ipe b/specification/src/img/aet-relations.ipe
index 572890f46..9d60e4d16 100644
--- a/specification/src/img/aet-relations.ipe
+++ b/specification/src/img/aet-relations.ipe
@@ -1,7 +1,7 @@
 <?xml version="1.0"?>
 <!DOCTYPE ipe SYSTEM "ipe.dtd">
 <ipe version="70218" creator="Ipe 7.2.24">
-<info created="D:20200729150742" modified="D:20230627105938"/>
+<info created="D:20200729150742" modified="D:20231024132138"/>
 <preamble>\usepackage{lmodern}
 \renewcommand*\familydefault{\sfdefault}
 \usepackage[T1]{fontenc}</preamble>
@@ -354,7 +354,7 @@ h
 236 128 c
 216 128 l
 </path>
-<path layer="perm_args" stroke="darkblue" pen="heavier" cap="1" join="1" arrow="pointed/small" rarrow="pointed/small">
+<path layer="perm_args" stroke="darkblue" pen="heavier" cap="1" join="1" arrow="pointed/small" rarrow="farc/small">
 180 124 m
 180 108 l
 180 104
diff --git a/specification/src/img/aet-relations.png b/specification/src/img/aet-relations.png
index d810adae7..1f415274a 100644
Binary files a/specification/src/img/aet-relations.png and b/specification/src/img/aet-relations.png differ
diff --git a/specification/src/img/program-attestation.png b/specification/src/img/program-attestation.png
index a872f48a3..5baf0554a 100644
Binary files a/specification/src/img/program-attestation.png and b/specification/src/img/program-attestation.png differ
diff --git a/specification/src/instruction-groups.md b/specification/src/instruction-groups.md
index 46e2d1121..72429fa6b 100644
--- a/specification/src/instruction-groups.md
+++ b/specification/src/instruction-groups.md
@@ -197,8 +197,8 @@ Contains all constraints from instruction group `keep_jump_stack`, and additiona
 1. The stack element in `st12` is moved into `st13`.
 1. The stack element in `st13` is moved into `st14`.
 1. The stack element in `st14` is moved into `st15`.
-1. The stack element in `st15` is moved to the top of OpStack underflow, i.e., `osv`.
-1. The OpStack pointer is incremented by 1.
+1. The op stack pointer is incremented by 1.
+1. The running product for the Op Stack Table absorbs the current row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
 
 ### Polynomials
 
@@ -216,8 +216,8 @@ Contains all constraints from instruction group `keep_jump_stack`, and additiona
 1. `st13' - st12`
 1. `st14' - st13`
 1. `st15' - st14`
-1. `osv' - st15`
-1. `osp' - (osp + 1)`
+1. `op_stack_pointer' - (op_stack_pointer + 1)`
+1. `RunningProductOpStackTable' - RunningProductOpStackTable·(🪤 - 🍋·clk - 🍊·ib1 - 🍉·op_stack_pointer - 🫒·st15)`
 
 ## Group `grow_stack`
 
@@ -240,8 +240,8 @@ Contains all constraints from instruction group `stack_grows_and_top_2_unconstra
 1. The stack element in `st13` does not change.
 1. The stack element in `st14` does not change.
 1. The stack element in `st15` does not change.
-1. The top of the OpStack underflow, i.e., `osv`, does not change.
-1. The OpStack pointer does not change.
+1. The op stack pointer does not change.
+1. The running product for the Op Stack Table remains unchanged.
 
 ### Polynomials
 
@@ -250,8 +250,8 @@ Contains all constraints from instruction group `stack_grows_and_top_2_unconstra
 1. `st13' - st13`
 1. `st14' - st14`
 1. `st15' - st15`
-1. `osv' - osv`
-1. `osp' - osp`
+1. `op_stack_pointer' - op_stack_pointer`
+1. `RunningProductOpStackTable' - RunningProductOpStackTable`
 
 ## Group `stack_remains_and_top_10_unconstrained`
 
@@ -317,8 +317,8 @@ Contains all constraints from instruction group `unary_operation`, and additiona
 
 ## Group `stack_shrinks_and_top_3_unconstrained`
 
-This instruction group requires helper variable `hv0` to hold the multiplicative inverse of `(osp - 16)`.
-In effect, this means that the OpStack pointer can only be decremented if it is not 16, i.e., if OpStack Underflow Memory is not empty.
+This instruction group requires helper variable `hv0` to hold the multiplicative inverse of `(op_stack_pointer - 16)`.
+In effect, this means that the op stack pointer can only be decremented if it is not 16, i.e., if op stack underflow memory is not empty.
 Since the stack can only change by one element at a time, this prevents stack underflow.
 
 ### Description
@@ -335,9 +335,9 @@ Since the stack can only change by one element at a time, this prevents stack un
 1. The stack element in `st13` is moved into `st12`.
 1. The stack element in `st14` is moved into `st13`.
 1. The stack element in `st15` is moved into `st14`.
-1. The stack element at the top of OpStack underflow, i.e., `osv`, is moved into `st15`.
-1. The OpStack pointer is decremented by 1.
-1. The helper variable register `hv0` holds the inverse of `(osp - 16)`.
+1. The op stack pointer is decremented by 1.
+1. The helper variable register `hv0` holds the inverse of `(op_stack_pointer - 16)`.
+1. The running product for the Op Stack Table absorbs clock cycle and instruction bit 1 from the current row as well as op stack pointer and stack element 15 from the next row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
 
 ### Polynomials
 
@@ -353,9 +353,9 @@ Since the stack can only change by one element at a time, this prevents stack un
 1. `st12' - st13`
 1. `st13' - st14`
 1. `st14' - st15`
-1. `st15' - osv`
-1. `osp' - (osp - 1)`
-1. `(osp - 16)·hv0 - 1`
+1. `op_stack_pointer' - (op_stack_pointer - 1)`
+1. `(op_stack_pointer - 16)·hv0 - 1`
+1. `RunningProductOpStackTable' - RunningProductOpStackTable·(🪤 - 🍋·clk - 🍊·ib1 - 🍉·op_stack_pointer' - 🫒·st15')`
 
 ## Group `binary_operation`
 
diff --git a/specification/src/instruction-specific-transition-constraints.md b/specification/src/instruction-specific-transition-constraints.md
index 8c04cc081..d7a8fb708 100644
--- a/specification/src/instruction-specific-transition-constraints.md
+++ b/specification/src/instruction-specific-transition-constraints.md
@@ -134,8 +134,8 @@ Additionally, it defines the following transition constraints.
 1. If `i` is not 13, then `st13` does not change.
 1. If `i` is not 14, then `st14` does not change.
 1. If `i` is not 15, then `st15` does not change.
-1. The top of the OpStack underflow, i.e., `osv`, does not change.
-1. The OpStack pointer does not change.
+1. The op stack pointer does not change.
+1. The running product for the Op Stack Table remains unchanged.
 
 ### Polynomials
 
@@ -185,8 +185,8 @@ Additionally, it defines the following transition constraints.
 1. `(1 - ind_13(hv3, hv2, hv1, hv0))·(st13' - st13)`
 1. `(1 - ind_14(hv3, hv2, hv1, hv0))·(st14' - st14)`
 1. `(1 - ind_15(hv3, hv2, hv1, hv0))·(st15' - st15)`
-1. `osv' - osv`
-1. `osp' - osp`
+1. `op_stack_pointer' - op_stack_pointer`
+1. `RunningProductOpStackTable' - RunningProductOpStackTable`
 
 ### Helper variable definitions for `swap` + `i`
 
diff --git a/specification/src/instructions.md b/specification/src/instructions.md
index 185661981..d6427dee1 100644
--- a/specification/src/instructions.md
+++ b/specification/src/instructions.md
@@ -19,9 +19,9 @@ The second property helps guarantee that operational stack underflow cannot happ
 It is used by several instructions through instruction group [`stack_shrinks_and_top_3_unconstrained`](instruction-groups.md#group-stack_shrinks_and_top_3_unconstrained).
 The third property allows efficient arithmetization of the running product for the Permutation Argument between [Processor Table](processor-table.md) and [U32 Table](u32-table.md).
 
-## OpStack Manipulation
+## Op Stack Manipulation
 
-| Instruction  | Opcode | old OpStack         | new OpStack           | Description                                                                      |
+| Instruction  | Opcode | old op stack        | new op stack          | Description                                                                      |
 |:-------------|-------:|:--------------------|:----------------------|:---------------------------------------------------------------------------------|
 | `pop`        |      2 | `_ a`               | `_`                   | Pops top element from stack.                                                     |
 | `push` + `a` |      1 | `_`                 | `_ a`                 | Pushes `a` onto the stack.                                                       |
@@ -42,26 +42,26 @@ the value `a` was supplied as a secret input.
 
 ## Control Flow
 
-| Instruction  | Opcode | old OpStack | new OpStack | old `ip` | new `ip` | old JumpStack | new JumpStack | Description                                                                                                              |
-|:-------------|-------:|:------------|:------------|:---------|:---------|:--------------|:--------------|:-------------------------------------------------------------------------------------------------------------------------|
-| `nop`        |     16 | `_`         | `_`         | `_`      | `_ + 1`  | `_`           | `_`           | Do nothing                                                                                                               |
-| `skiz`       |     10 | `_ a`       | `_`         | `_`      | `_ + s`  | `_`           | `_`           | Skip next instruction if `a` is zero. `s` ∈ {1, 2, 3} depends on `a` and whether the next instruction takes an argument. |
-| `call` + `d` |     25 | `_`         | `_`         | `o`      | `d`      | `_`           | `_ (o+2, d)`  | Push `(o+2,d)` to the jump stack, and jump to absolute address `d`                                                       |
-| `return`     |     24 | `_`         | `_`         | `_`      | `o`      | `_ (o, d)`    | `_`           | Pop one pair off the jump stack and jump to that pair's return address (which is the first element).                     |
-| `recurse`    |     32 | `_`         | `_`         | `_`      | `d`      | `_ (o, d)`    | `_ (o, d)`    | Peek at the top pair of the jump stack and jump to that pair's destination address (which is the second element).        |
-| `assert`     |     18 | `_ a`       | `_`         | `_`      | `_ + 1`  | `_`           | `_`           | Pops `a` if `a == 1`, else crashes the virtual machine.                                                                  |
-| `halt`       |      0 | `_`         | `_`         | `_`      | `_ + 1`  | `_`           | `_`           | Solves the halting problem (if the instruction is reached). Indicates graceful shutdown of the VM.                       |
+| Instruction  | Opcode | old op stack | new op stack | old `ip` | new `ip` | old jump stack | new jump stack | Description                                                                                                              |
+|:-------------|-------:|:-------------|:-------------|:---------|:---------|:---------------|:---------------|:-------------------------------------------------------------------------------------------------------------------------|
+| `nop`        |     16 | `_`          | `_`          | `_`      | `_ + 1`  | `_`            | `_`            | Do nothing                                                                                                               |
+| `skiz`       |     10 | `_ a`        | `_`          | `_`      | `_ + s`  | `_`            | `_`            | Skip next instruction if `a` is zero. `s` ∈ {1, 2, 3} depends on `a` and whether the next instruction takes an argument. |
+| `call` + `d` |     25 | `_`          | `_`          | `o`      | `d`      | `_`            | `_ (o+2, d)`   | Push `(o+2,d)` to the jump stack, and jump to absolute address `d`                                                       |
+| `return`     |     24 | `_`          | `_`          | `_`      | `o`      | `_ (o, d)`     | `_`            | Pop one pair off the jump stack and jump to that pair's return address (which is the first element).                     |
+| `recurse`    |     32 | `_`          | `_`          | `_`      | `d`      | `_ (o, d)`     | `_ (o, d)`     | Peek at the top pair of the jump stack and jump to that pair's destination address (which is the second element).        |
+| `assert`     |     18 | `_ a`        | `_`          | `_`      | `_ + 1`  | `_`            | `_`            | Pops `a` if `a == 1`, else crashes the virtual machine.                                                                  |
+| `halt`       |      0 | `_`          | `_`          | `_`      | `_ + 1`  | `_`            | `_`            | Solves the halting problem (if the instruction is reached). Indicates graceful shutdown of the VM.                       |
 
 ## Memory Access
 
-| Instruction | Opcode | old OpStack | new OpStack | old `ramv` | new `ramv` | Description                                                                 |
-|:------------|-------:|:------------|:------------|:-----------|:-----------|:----------------------------------------------------------------------------|
-| `read_mem`  |     40 | `_ p`       | `_ p v`     | `v`        | `v`        | Reads value `v` from RAM at address `p` and pushes `v` onto the OpStack.    |
-| `write_mem` |     26 | `_ p v`     | `_ p`       | `_`        | `v`        | Writes OpStack's top-most value `v` to RAM at the address `p`, popping `v`. |
+| Instruction | Opcode | old op stack | new op stack | old `ramv` | new `ramv` | Description                                                                  |
+|:------------|-------:|:-------------|:-------------|:-----------|:-----------|:-----------------------------------------------------------------------------|
+| `read_mem`  |     40 | `_ p`        | `_ p v`      | `v`        | `v`        | Reads value `v` from RAM at address `p` and pushes `v` onto the op stack.    |
+| `write_mem` |     26 | `_ p v`      | `_ p`        | `_`        | `v`        | Writes op stack's top-most value `v` to RAM at the address `p`, popping `v`. |
 
 ## Hashing
 
-| Instruction      | Opcode | old OpStack     | new OpStack                   | Description                                                                                             |
+| Instruction      | Opcode | old op stack    | new op stack                  | Description                                                                                             |
 |:-----------------|-------:|:----------------|:------------------------------|:--------------------------------------------------------------------------------------------------------|
 | `hash`           |     48 | `_jihgfedcba`   | `_yxwvu00000`                 | Overwrites the stack's 10 top-most elements with their hash digest (length 5) and 5 zeros.              |
 | `divine_sibling` |     56 | `_ iedcba*****` | e.g., `_ (i div 2)edcbazyxwv` | Helps traversing a Merkle tree during authentication path verification. See extended description below. |
@@ -98,29 +98,29 @@ Triton VM cannot know the number of elements that will be absorbed.
 
 ## Base Field Arithmetic on Stack
 
-| Instruction | Opcode | old OpStack | new OpStack  | Description                                                                                                                |
-|:------------|-------:|:------------|:-------------|:---------------------------------------------------------------------------------------------------------------------------|
-| `add`       |     34 | `_ b a`     | `_ c`        | Computes the sum (`c`) of the top two elements of the stack (`b` and `a`) over the field.                                  |
-| `mul`       |     42 | `_ b a`     | `_ c`        | Computes the product (`c`) of the top two elements of the stack (`b` and `a`) over the field.                              |
-| `invert`    |     96 | `_ a`       | `_ b`        | Computes the multiplicative inverse (over the field) of the top of the stack. Crashes the VM if the top of the stack is 0. |
-| `eq`        |     50 | `_ b a`     | `_ (a == b)` | Tests the top two stack elements for equality.                                                                             |
+| Instruction | Opcode | old op stack | new op stack | Description                                                                                                                |
+|:------------|-------:|:-------------|:-------------|:---------------------------------------------------------------------------------------------------------------------------|
+| `add`       |     34 | `_ b a`      | `_ c`        | Computes the sum (`c`) of the top two elements of the stack (`b` and `a`) over the field.                                  |
+| `mul`       |     42 | `_ b a`      | `_ c`        | Computes the product (`c`) of the top two elements of the stack (`b` and `a`) over the field.                              |
+| `invert`    |     96 | `_ a`        | `_ b`        | Computes the multiplicative inverse (over the field) of the top of the stack. Crashes the VM if the top of the stack is 0. |
+| `eq`        |     50 | `_ b a`      | `_ (a == b)` | Tests the top two stack elements for equality.                                                                             |
 
 ## Bitwise Arithmetic on Stack
 
-| Instruction   | Opcode | old OpStack | new OpStack   | Description                                                                                                                                                                |
-|:--------------|-------:|:------------|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `split`       |      4 | `_ a`       | `_ hi lo`     | Decomposes the top of the stack into the lower 32 bits and the upper 32 bits.                                                                                              |
-| `lt`          |      6 | `_ b a`     | `_ a<b`       | “Less than” of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                                 |
-| `and`         |     14 | `_ b a`     | `_ a&b`       | Bitwise and of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                                 |
-| `xor`         |     22 | `_ b a`     | `_ a^b`       | Bitwise exclusive or of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                        |
-| `log_2_floor` |     12 | `_ a`       | `_ ⌊log₂(a)⌋` | The number of bits in `a` minus 1, _i.e._, $\lfloor\log_2\texttt{a}\rfloor$. Crashes the VM if `a` is 0 or not u32.                                                        |
-| `pow`         |     30 | `_ e b`     | `_ b**e`      | The top of the stack to the power of the stack's runner up. Crashes the VM if exponent `e` is not u32.                                                                     |
-| `div_mod`     |     20 | `_ d n`     | `_ q r`       | Division with remainder of numerator `n` by denominator `d`. Guarantees the properties `n == q·d + r` and `r < d`. Crashes the VM if `n` or `d` is not u32 or if `d` is 0. |
-| `pop_count`   |     28 | `_ a`       | `_ w`         | Computes the [hamming weight](https://en.wikipedia.org/wiki/Hamming_weight) or “population count” of `a`. Crashes the VM if `a` is not u32.                                |
+| Instruction   | Opcode | old op stack | new op stack  | Description                                                                                                                                                                |
+|:--------------|-------:|:-------------|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `split`       |      4 | `_ a`        | `_ hi lo`     | Decomposes the top of the stack into the lower 32 bits and the upper 32 bits.                                                                                              |
+| `lt`          |      6 | `_ b a`      | `_ a<b`       | “Less than” of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                                 |
+| `and`         |     14 | `_ b a`      | `_ a&b`       | Bitwise and of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                                 |
+| `xor`         |     22 | `_ b a`      | `_ a^b`       | Bitwise exclusive or of the stack's two top-most elements. Crashes the VM if `a` or `b` is not u32.                                                                        |
+| `log_2_floor` |     12 | `_ a`        | `_ ⌊log₂(a)⌋` | The number of bits in `a` minus 1, _i.e._, $\lfloor\log_2\texttt{a}\rfloor$. Crashes the VM if `a` is 0 or not u32.                                                        |
+| `pow`         |     30 | `_ e b`      | `_ b**e`      | The top of the stack to the power of the stack's runner up. Crashes the VM if exponent `e` is not u32.                                                                     |
+| `div_mod`     |     20 | `_ d n`      | `_ q r`       | Division with remainder of numerator `n` by denominator `d`. Guarantees the properties `n == q·d + r` and `r < d`. Crashes the VM if `n` or `d` is not u32 or if `d` is 0. |
+| `pop_count`   |     28 | `_ a`        | `_ w`         | Computes the [hamming weight](https://en.wikipedia.org/wiki/Hamming_weight) or “population count” of `a`. Crashes the VM if `a` is not u32.                                |
 
 ## Extension Field Arithmetic on Stack
 
-| Instruction | Opcode | old OpStack     | new OpStack     | Description                                                                                                                                                  |
+| Instruction | Opcode | old op stack    | new op stack    | Description                                                                                                                                                  |
 |:------------|-------:|:----------------|:----------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `xxadd`     |    104 | `_ z y x b c a` | `_ z y x w v u` | Adds the two extension field elements encoded by field elements `z y x` and `b c a`, overwriting the top-most extension field element with the result.       |
 | `xxmul`     |    112 | `_ z y x b c a` | `_ z y x w v u` | Multiplies the two extension field elements encoded by field elements `z y x` and `b c a`, overwriting the top-most extension field element with the result. |
@@ -129,7 +129,7 @@ Triton VM cannot know the number of elements that will be absorbed.
 
 ## Input/Output
 
-| Instruction | Opcode | old OpStack | new OpStack | Description                                                             |
-|:------------|-------:|:------------|:------------|:------------------------------------------------------------------------|
-| `read_io`   |    128 | `_`         | `_ a`       | Reads a B-Field element from standard input and pushes it to the stack. |
-| `write_io`  |     66 | `_ a`       | `_`         | Pops `a` from the stack and writes it to standard output.               |
+| Instruction | Opcode | old op stack | new op stack | Description                                                             |
+|:------------|-------:|:-------------|:-------------|:------------------------------------------------------------------------|
+| `read_io`   |    128 | `_`          | `_ a`        | Reads a B-Field element from standard input and pushes it to the stack. |
+| `write_io`  |     66 | `_ a`        | `_`          | Pops `a` from the stack and writes it to standard output.               |
diff --git a/specification/src/jump-stack-table.md b/specification/src/jump-stack-table.md
index 2e293527b..87abff285 100644
--- a/specification/src/jump-stack-table.md
+++ b/specification/src/jump-stack-table.md
@@ -125,7 +125,7 @@ In total, above steps ensure that the Permutation Argument between the Jump Stac
 Memory-consistency follows from two more primitive properties:
 
 1. Contiguity of regions of constant memory pointer.
-  Since the memory pointer for the JumpStack table, `jsp` can change by at most one per cycle, it is possible to enforce a full sorting using AIR constraints.
+  Since the memory pointer for the Jump Stack table, `jsp` can change by at most one per cycle, it is possible to enforce a full sorting using AIR constraints.
 2. Correct inner-sorting within contiguous regions.
   Specifically, the rows within each contiguous region of constant memory pointer should be sorted for clock cycle.
   This property is established by the clock jump difference [Lookup Argument](lookup-argument.md).
diff --git a/specification/src/memory-consistency.md b/specification/src/memory-consistency.md
index 2acd2f486..954331835 100644
--- a/specification/src/memory-consistency.md
+++ b/specification/src/memory-consistency.md
@@ -1,11 +1,11 @@
 # Memory Consistency
 
-Triton has three memory-like units: the RAM, the JumpStack, and the OpStack. Each such unit has a corresponding table. *Memory-consistency* is the property that whenever the processor reads a data element from these units, the value that it receives corresponds to the value sent the last time when that cell was written. Memory consistency follows from two intermediate properties.
+Triton has three memory-like units: the RAM, the jump stack, and the op stack. Each such unit has a corresponding table. *Memory-consistency* is the property that whenever the processor reads a data element from these units, the value that it receives corresponds to the value sent the last time when that cell was written. Memory consistency follows from two intermediate properties.
 
  1. Contiguity of regions of constant memory pointer. After selecting from each table all the rows with any given memory pointer, the resulting sublist is contiguous, which is to say, there are no gaps and two such sublists can never interleave.
  2. Inner-sorting within contiguous regions. Within each such contiguous region, the rows are sorted in ascending order by clock cycle.
 
-The contiguity of regions of constant memory pointer is established differently for the RAM Table than for the OpStack or JumpStack Tables. The OpStack and JumpStack Tables enjoy a particular property whereby the memory pointer can only ever increase or decrease by one (or stay the same). As a result, simple AIR constraints can enforce the correct sorting by memory pointer. In contrast, the memory pointer for the RAM Table can jump arbitrarily.
+The contiguity of regions of constant memory pointer is established differently for the RAM Table than for the Op Stack or Jump Stack Tables. The Op Stack and Jump Stack Tables enjoy a particular property whereby the memory pointer can only ever increase or decrease by one (or stay the same). As a result, simple AIR constraints can enforce the correct sorting by memory pointer. In contrast, the memory pointer for the RAM Table can jump arbitrarily.
 As explained in the [next section](contiguity-of-memory-pointer-regions.md), a Contiguity Argument establishes contiguity.
 
 The correct inner sorting is established the same way for all three memory-like tables.
diff --git a/specification/src/operational-stack-table.md b/specification/src/operational-stack-table.md
index 48445b31f..0b4b2f5b9 100644
--- a/specification/src/operational-stack-table.md
+++ b/specification/src/operational-stack-table.md
@@ -1,97 +1,104 @@
 # Operational Stack Table
 
-The operational stack is where the program stores simple elementary operations, function arguments, and pointers to important objects.
+The [operational stack](data-structures.md#operational-stack)[^abbrev] is where the program stores simple elementary operations, function arguments, and pointers to important objects.
 There are 16 registers (`st0` through `st15`) that the program can access directly.
 These registers correspond to the top of the stack.
 They are recorded in the [Processor Table](processor-table.md).
 
-The rest of the operational stack is stored in a dedicated memory object called “operational stack underflow memory”.
+The rest of the op stack is stored in a dedicated memory object called “operational stack underflow memory”.
 It is initially empty.
-The evolution of the underflow memory is recorded in the Operational Stack Table.
+The evolution of the underflow memory is recorded in the Op Stack Table.
+The sole task of the Op Stack Table is to keep underflow memory immutable.
+To achieve this, any read or write accesses to the underflow memory are recorded in the Op Stack Table.
+Read and write accesses to op stack underflow memory are a side effect of shrinking or growing the op stack.
 
 ## Base Columns
 
-The Operational Stack Table consists of 4 columns:
+The Op Stack Table consists of 4 columns:
 1. the cycle counter `clk`
-1. the shrink stack indicator `shrink_stack`
-1. the operational stack value `osv`, and
-1. the operational stack pointer `osp`.
+1. the shrink stack indicator `shrink_stack`, corresponding to the processor's instruction bit 1 `ib1`,
+1. the op stack pointer `stack_pointer`, and
+1. the first underflow element `first_underflow_element`.
 
-| Clock | Shrink Stack Indicator | Op Stack Pointer | Op Stack Value |
-|:------|:-----------------------|:-----------------|:---------------|
-| -     | -                      | -                | -              |
+| Clock | Shrink Stack Indicator | Stack Pointer | First Underflow Element |
+|:------|:-----------------------|:--------------|:------------------------|
+| -     | -                      | -             | -                       |
 
-Columns `clk`, `shrink_stack`, `osp`, and `osv` correspond to columns `clk`, `ib1`, `osp`, and `osv` in the [Processor Table](processor-table.md), respectively.
-A [Permutation Argument](permutation-argument.md) with the Processor Table establishes that, selecting the columns with these labels, the two tables' sets of rows are identical.
+Column `clk` records the processor's execution cycle during which the read or write access happens.
+The `shrink_stack` indicator signals whether the underflow memory access is a read or a write:
+a read corresponds to a shrinking stack is indicated by a 1, a write corresponds to a growing stack and is indicated by a 0.
+The same column doubles up as a [padding indicator](#padding), in which case `shrink_stack` is set to 2.
+The `stack_pointer` is the address at which the to-be-read-or-written element resides.
+Finally, the `first_underflow_element` is the stack element being transferred from stack register `st15` into underflow memory on a write, or the other way around on a read.
 
-In order to guarantee [memory consistency](memory-consistency.md), the rows of the operational stack table are sorted by operational stack pointer `osp` first, cycle count `clk` second.
+A [subset Permutation Argument](permutation-argument.md) with the [Processor Table](processor-table.md) establishes that the rows recorded in the Op Stack Table are consistent with the processor's part of the op stack.
+
+In order to guarantee [memory consistency](memory-consistency.md), the rows of the operational stack table are sorted by `stack_pointer` first, cycle count `clk` second.
 The mechanics are best illustrated by an example.
 Observe how the malicious manipulation of the Op Stack Underflow Memory, the change of “42” into “99” in cycle 8, is detected:
-The transition constraints of the Op Stack Table stipulate that if `osp` does not change, then `osv` can only change if the shrink stack indicator `shrink_stack` is set.
-Consequently, row `[5, push, 9, 42]` being followed by row `[_, _, 9, 99]` violates the constraints.
+The transition constraints of the Op Stack Table stipulate that if `stack_pointer` does not change, then the `first_underflow_element` can only change if the next instruction grows the stack.
+Consequently, row `[4, 0, 8, 42]` being followed by row `[10, 1, 8, 99]` violates the constraints.
 The shrink stack indicator being correct is guaranteed by the Permutation Argument between Op Stack Table and the [Processor Table](processor-table.md).
 
 For illustrative purposes only, we use four stack registers `st0` through `st3` in the example.
-TritonVM has 16 stack registers, `st0` through `st15`.
-
-Execution trace:
-
-| `clk` | `ci`   | `nia`  | `st0` | `st1` | `st2` | `st3` | Op Stack Underflow Memory | `osp` | `osv` |
-|------:|:-------|:-------|------:|------:|------:|------:|:--------------------------|------:|------:|
-|     0 | `push` | 42     |     0 |     0 |     0 |     0 | [ ]                       |     4 |     0 |
-|     1 | `push` | 43     |    42 |     0 |     0 |     0 | [0]                       |     5 |     0 |
-|     2 | `push` | 44     |    43 |    42 |     0 |     0 | [0,0]                     |     6 |     0 |
-|     3 | `push` | 45     |    44 |    43 |    42 |     0 | [0,0,0]                   |     7 |     0 |
-|     4 | `push` | 46     |    45 |    44 |    43 |    42 | [0,0,0,0]                 |     8 |     0 |
-|     5 | `push` | 47     |    46 |    45 |    44 |    43 | [42,0,0,0,0]              |     9 |    42 |
-|     6 | `push` | 48     |    47 |    46 |    45 |    44 | [43,42,0,0,0,0]           |    10 |    43 |
-|     7 | `nop`  | `pop`  |    48 |    47 |    46 |    45 | [44,43,42,0,0,0,0]        |    11 |    44 |
-|     8 | `pop`  | `pop`  |    48 |    47 |    46 |    45 | [44,43,99,0,0,0,0]        |    11 |    44 |
-|     9 | `pop`  | `pop`  |    47 |    46 |    45 |    44 | [43,99,0,0,0,0]           |    10 |    43 |
-|    10 | `pop`  | `pop`  |    46 |    45 |    44 |    43 | [99,0,0,0,0]              |     9 |    99 |
-|    11 | `pop`  | `push` |    45 |    44 |    43 |    99 | [0,0,0,0]                 |     8 |     0 |
-|    12 | `push` | 77     |    44 |    43 |    99 |     0 | [0,0,0]                   |     7 |     0 |
-|    13 | `swap` | 4      |    77 |    44 |    43 |    99 | [0,0,0,0]                 |     8 |     0 |
-|    14 | `push` | 78     |    99 |    44 |    43 |    77 | [0,0,0,0]                 |     8 |     0 |
-|    15 | `swap` | 4      |    78 |    99 |    44 |    43 | [77,0,0,0,0]              |     9 |    77 |
-|    16 | `push` | 79     |    43 |    99 |    44 |    78 | [77,0,0,0,0]              |     9 |    77 |
-|    17 | `pop`  | `pop`  |    79 |    43 |    99 |    44 | [78,77,0,0,0,0]           |    10 |    78 |
-|    18 | `pop`  | `pop`  |    43 |    99 |    44 |    78 | [77,0,0,0,0]              |     9 |    77 |
-|    19 | `pop`  | `pop`  |    99 |    44 |    78 |    77 | [0,0,0,0]                 |     8 |     0 |
-|    20 | `pop`  | `pop`  |    44 |    78 |    77 |     0 | [0,0,0]                   |     7 |     0 |
-|    21 | `pop`  | `pop`  |    78 |    77 |     0 |     0 | [0,0]                     |     6 |     0 |
-|    22 | `pop`  | `pop`  |    77 |     0 |     0 |     0 | [0]                       |     5 |     0 |
-|    23 | `pop`  | `pop`  |     0 |     0 |     0 |     0 | [ ]                       |     4 |     0 |
-|    24 | `pop`  | `nop`  |     0 |     0 |     0 |     0 | 💥                         |     3 |     0 |
+Triton VM has 16 stack registers, `st0` through `st15`.
+Furthermore, implied next instructions usually recorded in register “next instruction or argument” `nia` are omitted for reasons of readability.
+
+Processor's execution trace:
+
+| `clk` | `ci` | `nia` | `st0` | `st1` | `st2` | `st3` | Op Stack Underflow Memory | `op_stack_pointer` |
+|------:|:-----|------:|------:|------:|------:|------:|:--------------------------|-------------------:|
+|     0 | push |    42 |     0 |     0 |     0 |     0 | []                        |                  4 |
+|     1 | push |    43 |    42 |     0 |     0 |     0 | [ 0]                      |                  5 |
+|     2 | push |    44 |    43 |    42 |     0 |     0 | [ 0,  0]                  |                  6 |
+|     3 | push |    45 |    44 |    43 |    42 |     0 | [ 0,  0,  0]              |                  7 |
+|     4 | push |    46 |    45 |    44 |    43 |    42 | [ 0,  0,  0,  0]          |                  8 |
+|     5 | push |    47 |    46 |    45 |    44 |    43 | [42,  0,  0,  0,  0]      |                  9 |
+|     6 | push |    48 |    47 |    46 |    45 |    44 | [43, 42,  0,  0,  0,  0]  |                 10 |
+|     7 | nop  |       |    48 |    47 |    46 |    45 | [44, 43, 42,  0,  0,  0]  |                 11 |
+|     8 | pop  |       |    48 |    47 |    46 |    45 | [44, 43, 99,  0,  0,  0]  |                 11 |
+|     9 | pop  |       |    47 |    46 |    45 |    44 | [43, 99,  0,  0,  0,  0]  |                 10 |
+|    10 | pop  |       |    46 |    45 |    44 |    43 | [99,  0,  0,  0,  0]      |                  9 |
+|    11 | pop  |       |    45 |    44 |    43 |    99 | [ 0,  0,  0,  0]          |                  8 |
+|    12 | push |    77 |    44 |    43 |    99 |     0 | [ 0,  0,  0]              |                  7 |
+|    13 | swap |     3 |    77 |    44 |    43 |    99 | [ 0,  0,  0,  0]          |                  8 |
+|    14 | push |    78 |    99 |    44 |    43 |    77 | [ 0,  0,  0,  0]          |                  8 |
+|    15 | swap |     3 |    78 |    99 |    44 |    43 | [77,  0,  0,  0,  0]      |                  9 |
+|    16 | push |    79 |    43 |    99 |    44 |    78 | [77,  0,  0,  0,  0]      |                  9 |
+|    17 | pop  |       |    79 |    43 |    99 |    44 | [78, 77,  0,  0,  0,  0]  |                 10 |
+|    18 | pop  |       |    43 |    99 |    44 |    78 | [77,  0,  0,  0,  0]      |                  9 |
+|    19 | pop  |       |    99 |    44 |    78 |    77 | [ 0,  0,  0,  0]          |                  8 |
+|    20 | pop  |       |    44 |    78 |    77 |     0 | [ 0,  0,  0]              |                  7 |
+|    21 | pop  |       |    78 |    77 |     0 |     0 | [ 0,  0]                  |                  6 |
+|    22 | pop  |       |    77 |     0 |     0 |     0 | [ 0]                      |                  5 |
+|    23 | halt |       |     0 |     0 |     0 |     0 | []                        |                  4 |
+
 
 Operational Stack Table:
 
-| `clk` | `shrink_stack` | (comment) | `osp` | `osv` |
-|------:|---------------:|:----------|------:|------:|
-|     0 |              0 | (`push`)  |     4 |     0 |
-|    23 |              1 | (`pop`)   |     4 |     0 |
-|     1 |              0 | (`push`)  |     5 |     0 |
-|    22 |              1 | (`pop`)   |     5 |     0 |
-|     2 |              0 | (`push`)  |     6 |     0 |
-|    21 |              1 | (`pop`)   |     6 |     0 |
-|     3 |              0 | (`push`)  |     7 |     0 |
-|    12 |              0 | (`push`)  |     7 |     0 |
-|    20 |              1 | (`pop`)   |     7 |     0 |
-|     4 |              0 | (`push`)  |     8 |     0 |
-|    11 |              1 | (`pop`)   |     8 |     0 |
-|    13 |              0 | (`swap`)  |     8 |     0 |
-|    14 |              0 | (`push`)  |     8 |     0 |
-|    19 |              1 | (`pop`)   |     8 |     0 |
-|     5 |              0 | (`push`)  |     9 |    42 |
-|    10 |              1 | (`pop`)   |     9 |    99 |
-|    15 |              0 | (`swap`)  |     9 |    77 |
-|    16 |              0 | (`push`)  |     9 |    77 |
-|    18 |              1 | (`pop`)   |     9 |    77 |
-|     6 |              0 | (`push`)  |    10 |    43 |
-|     9 |              1 | (`pop`)   |    10 |    43 |
-|    17 |              1 | (`pop`)   |    10 |    78 |
-|     7 |              0 | (`nop`)   |    11 |    44 |
-|     8 |              1 | (`pop`)   |    11 |    44 |
+| `clk` | `shrink_stack` | `stack_pointer` | `first_underflow_element` |
+|------:|---------------:|----------------:|--------------------------:|
+|     0 |              0 |               4 |                         0 |
+|    22 |              1 |               4 |                         0 |
+|     1 |              0 |               5 |                         0 |
+|    21 |              1 |               5 |                         0 |
+|     2 |              0 |               6 |                         0 |
+|    20 |              1 |               6 |                         0 |
+|     3 |              0 |               7 |                         0 |
+|    11 |              1 |               7 |                         0 |
+|    12 |              0 |               7 |                         0 |
+|    19 |              1 |               7 |                         0 |
+|     4 |              0 |               8 |                        42 |
+|    10 |              1 |               8 |                        99 |
+|    14 |              0 |               8 |                        77 |
+|    18 |              1 |               8 |                        77 |
+|     5 |              0 |               9 |                        43 |
+|     9 |              1 |               9 |                        43 |
+|    16 |              0 |               9 |                        78 |
+|    17 |              1 |               9 |                        78 |
+|     6 |              0 |              10 |                        44 |
+|     8 |              1 |              10 |                        44 |
+
 
 ## Extension Columns
 
@@ -104,18 +111,17 @@ The Op Stack Table has 2 extension columns, `rppa` and `ClockJumpDifferenceLooku
 
 ## Padding
 
-A padding row is a direct copy of the Op Stack Table's row with the highest value for column `clk`, called template row, with the exception of the cycle count column `clk`.
-In a padding row, the value of column `clk` is 1 greater than the value of column `clk` in the template row.
-The padding row is inserted right below the template row.
-These steps are repeated until the desired padded height is reached.
-In total, above steps ensure that the Permutation Argument between the Op Stack Table and the [Processor Table](processor-table.md) holds up.
+The last row in the Op Stack Table is taken as the padding template row.
+Should the Op Stack Table be empty, the row (`clk`, `shrink_stack`, `stack_pointer`, `first_underflow_element`) = (0, 0, 16, 0) is used instead.
+In the template row, the `shrink_stack` indicator is set to 2, signifying padding.
+The template row is inserted below the last row until the desired padded height is reached.
 
 ## Memory-Consistency
 
 Memory-consistency follows from two more primitive properties:
 
 1. Contiguity of regions of constant memory pointer.
-  Since the memory pointer for the Op Stack table, `osp` can change by at most one per cycle, it is possible to enforce a full sorting using AIR constraints.
+  Since in Op Stack Table, the `stack_pointer` can change by at most one per cycle, it is possible to enforce a full sorting using AIR constraints.
 2. Correct inner-sorting within contiguous regions.
   Specifically, the rows within each contiguous region of constant memory pointer should be sorted for clock cycle.
   This property is established by the clock jump difference [Lookup Argument](lookup-argument.md).
@@ -129,18 +135,16 @@ Both types of challenges are X-field elements, _i.e._, elements of $\mathbb{F}_{
 
 ## Initial Constraints
 
-1. `clk` is 0
-1. `osv` is 0.
-1. `osp` is the number of available stack registers, _i.e._, 16.
-1. The running product for the permutation argument with the Processor Table `rppa` starts off having accumulated the first row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
+1. The `stack_pointer` is the number of available stack registers, _i.e._, 16.
+1. If the row is not a padding row, the running product for the permutation argument with the Processor Table `rppa` starts off having accumulated the first row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
+  Otherwise, it is the Permutation Argument's default initial, _i.e._, 1.
 1. The logarithmic derivative for the clock jump difference lookup `ClockJumpDifferenceLookupClientLogDerivative` is 0.
 
 ### Initial Constraints as Polynomials
 
-1. `clk`
-1. `osv`
-1. `osp - 16`
-1. `rppa - (🪤 - 🍋·clk - 🍊·ib1 - 🍉·osp - 🫒osv)`
+1. `stack_pointer - 16`
+1. `(shrink_stack - 2)·(rppa - (🪤 - 🍋·clk - 🍊·shrink_stack - 🍉·stack_pointer - 🫒·first_underflow_element))`<br />
+    `+ (shrink_stack - 0)·(shrink_stack - 1)·(rppa - 1)`
 1. `ClockJumpDifferenceLookupClientLogDerivative`
 
 ## Consistency Constraints
@@ -149,31 +153,41 @@ None.
 
 ## Transition Constraints
 
-1.
-  - the `osp` increases by 1, *or*
-  - the `osp` does not change AND the `osv` does not change, *or*
-  - the `osp` does not change AND the shrink stack indicator `shrink_stack` is 1.
-1. The running product for the permutation argument with the Processor Table `rppa` absorbs the next row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
-1. If the op stack pointer `osp` does not change, then the logarithmic derivative for the clock jump difference lookup `ClockJumpDifferenceLookupClientLogDerivative` accumulates a factor `(clk' - clk)` relative to indeterminate 🪞.
-  Otherwise, it remains the same.
+1.  - the `stack_pointer` increases by 1, *or*
+    - the `stack_pointer` does not change AND the `first_underflow_element` does not change, *or*
+    - the `stack_pointer` does not change AND the shrink stack indicator `shrink_stack` in the next row is 0.
+1. If thex next row is not a padding row, the running product for the permutation argument with the Processor Table `rppa` absorbs the next row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
+  Otherwise, the running product remains unchanged.
+1. If the current row is a padding row, then the next row is a padding row.
+1. If the next row is not a padding row and the op stack pointer `stack_pointer` does not change, then the logarithmic derivative for the clock jump difference lookup `ClockJumpDifferenceLookupClientLogDerivative` accumulates a factor `(clk' - clk)` relative to indeterminate 🪞.
+  Otherwise, it remains the unchanged.
 
 Written as Disjunctive Normal Form, the same constraints can be expressed as:
 
-1.
-  - the `osp` increases by 1 or the `osp` does not change
-  - the `osp` increases by 1 or the `osv` does not change or the shrink stack indicator `shrink_stack` is 1
-1. `rppa' = rppa·(🪤 - 🍋·clk' - 🍊·ib1' - 🍉·osp' - 🫒osv')`
-1. - the `osp` changes or the logarithmic derivative accumulates a summand, and
-   - the `osp` does not change or the logarithmic derivative does not change.
+1. The `stack_pointer` increases by 1 or the `stack_pointer` does not change.
+1. The `stack_pointer` increases by 1 or the `first_underflow_element` does not change or the shrink stack indicator `shrink_stack` in the next row is 0.
+1.  - The next row is a padding row or `rppa` has accumulated the next row, and
+    - the next row is not a padding row or `rppa` remains unchanged.
+1. The current row is not a padding row or the next row is a padding row.
+1.  - the `stack_pointer` changes or the next row is a padding row or the logarithmic derivative accumulates a summand,
+    - the `stack_pointer` remains unchanged or the logarithmic derivative remains unchanged, and
+    - the next row is not a padding row or the logarithmic derivative remains unchanged.
 
 ### Transition Constraints as Polynomials
 
-1. `(osp' - (osp + 1))·(osp' - osp)`
-1. `(osp' - (osp + 1))·(osv' - osv)·(1 - shrink_stack)`
-1. `rppa' - rppa·(🪤 - 🍋·clk' - 🍊·ib1' - 🍉·osp' - 🫒osv')`
-1. `(osp' - (osp + 1))·((ClockJumpDifferenceLookupClientLogDerivative' - ClockJumpDifferenceLookupClientLogDerivative) · (🪞 - clk' + clk) - 1)`<br />
-   `+ (osp' - osp)·(ClockJumpDifferenceLookupClientLogDerivative' - ClockJumpDifferenceLookupClientLogDerivative)`
+1. `(stack_pointer' - stack_pointer - 1)·(stack_pointer' - stack_pointer)`
+1. `(stack_pointer' - stack_pointer - 1)·(first_underflow_element' - first_underflow_element)·(shrink_stack' - 0)`
+1. `(shrink_stack' - 2)·(rppa' - rppa·(🪤 - 🍋·clk' - 🍊·shrink_stack' - 🍉·stack_pointer' - 🫒first_underflow_element'))`<br />
+    `+ (shrink_stack' - 0)·(shrink_stack' - 1)·(rppa' - rppa)`
+1. `(shrink_stack - 0)·(shrink_stack - 1)·(shrink_stack' - 2)`
+1. `(stack_pointer' - stack_pointer - 1)·(shrink_stack' - 2)·((ClockJumpDifferenceLookupClientLogDerivative' - ClockJumpDifferenceLookupClientLogDerivative) · (🪞 - clk' + clk) - 1)`<br />
+   `+ (stack_pointer' - stack_pointer)·(ClockJumpDifferenceLookupClientLogDerivative' - ClockJumpDifferenceLookupClientLogDerivative)`<br />
+   `+ (shrink_stack' - 0)·(shrink_stack' - 1)·(ClockJumpDifferenceLookupClientLogDerivative' - ClockJumpDifferenceLookupClientLogDerivative)`
 
 ## Terminal Constraints
 
 None.
+
+---
+
+[^abbrev]: frequently abbreviated as “Op Stack”
diff --git a/specification/src/processor-table.md b/specification/src/processor-table.md
index bd31b12c3..2cf7136c7 100644
--- a/specification/src/processor-table.md
+++ b/specification/src/processor-table.md
@@ -25,14 +25,27 @@ The Processor Table has the following extension columns, corresponding to [Evalu
 1. `RunningEvaluationStandardInput` for the Evaluation Argument with the input symbols.
 1. `RunningEvaluationStandardOutput` for the Evaluation Argument with the output symbols.
 1. `InstructionLookupClientLogDerivative` for the Lookup Argument with the [Program Table](program-table.md)
-1. `RunningProductOpStackTable` for the Permutation Argument with the [OpStack Table](operational-stack-table.md).
+1. `RunningProductOpStackTable` for the Permutation Argument with the [Op Stack Table](operational-stack-table.md).
 1. `RunningProductRamTable` for the Permutation Argument with the [RAM Table](random-access-memory-table.md).
 1. `RunningProductJumpStackTable` for the Permutation Argument with the [Jump Stack Table](jump-stack-table.md).
 1. `RunningEvaluationHashInput` for the Evaluation Argument with the [Hash Table](hash-table.md) for copying the input to the hash function from the processor to the hash coprocessor.
 1. `RunningEvaluationHashDigest` for the Evaluation Argument with the [Hash Table](hash-table.md) for copying the hash digest from the hash coprocessor to the processor.
 1. `RunningEvaluationSponge` for the Evaluation Argument with the [Hash Table](hash-table.md) for copying the 10 next to-be-absorbed elements from the processor to the hash coprocessor or the 10 next squeezed elements from the hash coprocessor to the processor, depending on the instruction.
 1. `U32LookupClientLogDerivative` for the Lookup Argument with the [U32 Table](u32-table.md).
-1. `ClockJumpDifferenceLookupServerLogDerivative` for the Lookup Argument of clock jump differences with the [OpStack Table](operational-stack-table.md), the [RAM Table](random-access-memory-table.md), and the [JumpStack Table](jump-stack-table.md).
+1. `ClockJumpDifferenceLookupServerLogDerivative` for the Lookup Argument of clock jump differences with the [Op Stack Table](operational-stack-table.md), the [RAM Table](random-access-memory-table.md), and the [Jump Stack Table](jump-stack-table.md).
+
+### Permutation Argument with the Op Stack Table
+
+The [Permutation Arguments](permutation-argument.md) with the [Op Stack Table](operational-stack-table.md) `RunningProductOpStackTable` establishes consistency of the op stack underflow memory.
+The number of factors incorporated into the running product at any given cycle depends on the executed instruction in this cycle:
+for every element pushed to or popped from the stack, there is one factor.
+Namely, if the op stack grows, every element spilling from `st15` into op stack underflow memory will be incorporated as one factor;
+and if the op stack shrinks, every element from op stack underflow memory being transferred into `st15` will be one factor.
+
+One key insight for this Permutation Argument is that the processor will always have access to the elements that are to be read from or written to underflow memory:
+if the instruction grows the op stack, then the elements in question currently reside in the directly accessible, top part of the stack;
+if the instruction shrinks the op stack, then the elements in question will be in the top part of the stack in the next cycle.
+In either case, the [Transition Constraint](arithmetization.md#arithmetic-intermediate-representation) for the Permutation Argument can incorporate the explicitly listed elements as well as the corresponding trivial-to-compute `op_stack_pointer`.
 
 ## Padding
 
@@ -86,13 +99,12 @@ However, in order to verify the correctness of `RunningEvaluationHashDigest`, th
 1. The operational stack element `st10` is 0.
 1. The [Evaluation Argument](evaluation-argument.md) of operational stack elements `st11` through `st15` with respect to indeterminate 🥬 equals the public part of program digest challenge, 🫑.
 See [program attestation](program-attestation.md) for more details.
-1. The operational stack pointer `osp` is 16.
-1. The operational stack value `osv` is 0.
+1. The `op_stack_pointer` is 16.
 1. The RAM pointer `ramp` is 0.
 1. `RunningEvaluationStandardInput` is 1.
 1. `RunningEvaluationStandardOutput` is 1.
 1. `InstructionLookupClientLogDerivative` has absorbed the first row with respect to challenges 🥝, 🥥, and 🫐 and indeterminate 🪥.
-1. `RunningProductOpStackTable` has absorbed the first row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
+1. `RunningProductOpStackTable` is 1.
 1. `RunningProductRamTable` has absorbed the first row with respect to challenges 🍍, 🍈, 🍎, and 🌽 and indeterminate 🛋.
 1. `RunningProductJumpStackTable` has absorbed the first row with respect to challenges 🍇, 🍅, 🍌, 🍏, and 🍐 and indeterminate 🧴.
 1. `RunningEvaluationHashInput` has absorbed the first row with respect to challenges 🧄₀ through 🧄₉ and indeterminate 🚪 if the current instruction is `hash`. Otherwise, it is 1.
@@ -121,13 +133,12 @@ See [program attestation](program-attestation.md) for more details.
 1. `st9`
 1. `st10`
 1. `🥬^5 + st11·🥬^4 + st12·🥬^3 + st13·🥬^2 + st14·🥬 + st15 - 🫑`
-1. `osp`
-1. `osv`
+1. `op_stack_pointer - 16`
 1. `ramp`
 1. `RunningEvaluationStandardInput - 1`
 1. `RunningEvaluationStandardOutput - 1`
 1. `InstructionLookupClientLogDerivative · (🪥 - 🥝·ip - 🥥·ci - 🫐·nia) - 1`
-1. `RunningProductOpStackTable - (🪤 - 🍋·clk - 🍊·ib1 - 🍉·osp - 🫒·osv)`
+1. `RunningProductOpStackTable - 1`
 1. `RunningProductRamTable - (🛋 - 🍍·clk - 🍈·ramp - 🍎·ramv - 🌽·previous_instruction)`
 1. `RunningProductJumpStackTable - (🧴 - 🍇·clk - 🍅·ci - 🍌·jsp - 🍏·jso - 🍐·jsd)`
 1. `(ci - opcode(hash))·(RunningEvaluationHashInput - 1)`<br />
@@ -176,9 +187,8 @@ The following additional constraints also apply to every pair of rows.
 1. The running evaluation for standard input absorbs `st0` of the next row with respect to 🛏 if the current instruction is `read_io`, and remains unchanged otherwise.
 1. The running evaluation for standard output absorbs `st0` of the next row with respect to 🧯 if the current instruction in the next row is `write_io`, and remains unchanged otherwise.
 1. If the next row is not a padding row, the logarithmic derivative for the Program Table absorbs the next row with respect to challenges 🥝, 🥥, and 🫐 and indeterminate 🪥. Otherwise, it remains unchanged.
-1. The running product for the OpStack Table absorbs the next row with respect to challenges 🍋, 🍊, 🍉, and 🫒 and indeterminate 🪤.
 1. The running product for the RAM Table absorbs the next row with respect to challenges 🍍, 🍈, 🍎, and 🌽 and indeterminate 🛋.
-1. The running product for the JumpStack Table absorbs the next row with respect to challenges 🍇, 🍅, 🍌, 🍏, and 🍐 and indeterminate 🧴.
+1. The running product for the Jump Stack Table absorbs the next row with respect to challenges 🍇, 🍅, 🍌, 🍏, and 🍐 and indeterminate 🧴.
 1. If the current instruction in the next row is `hash`, the running evaluation “Hash Input” absorbs the next row with respect to challenges 🧄₀ through 🧄₉ and indeterminate 🚪. Otherwise, it remains unchanged.
 1. If the current instruction is `hash`, the running evaluation “Hash Digest” absorbs the next row with respect to challenges 🧄₀ through 🧄₄ and indeterminate 🪟. Otherwise, it remains unchanged.
 1. If the current instruction is `sponge_init`, then the running evaluation “Sponge” absorbs the current instruction and the Sponge's default initial state with respect to challenges 🧅 and 🧄₀ through 🧄₉ and indeterminate 🧽.
@@ -205,7 +215,6 @@ Otherwise, the running evaluation remains unchanged.
     `+ write_io_deselector'·(RunningEvaluationStandardOutput' - 🧯·RunningEvaluationStandardOutput - st0')`
 1. `(1 - IsPadding') · ((InstructionLookupClientLogDerivative' - InstructionLookupClientLogDerivative) · (🛁 - 🥝·ip' - 🥥·ci' - 🫐·nia') - 1)`<br />
     `+ IsPadding'·(RunningProductInstructionTable' - RunningProductInstructionTable)`
-1. `RunningProductOpStackTable' - RunningProductOpStackTable·(🪤 - 🍋·clk' - 🍊·ib1' - 🍉·osp' - 🫒·osv')`
 1. `RunningProductRamTable' - RunningProductRamTable·(🛋 - 🍍·clk' - 🍈·ramp' - 🍎·ramv' - 🌽·previous_instruction')`
 1. `RunningProductJumpStackTable' - RunningProductJumpStackTable·(🧴 - 🍇·clk' - 🍅·ci' - 🍌·jsp' - 🍏·jso' - 🍐·jsd')`
 1. `(ci' - opcode(hash))·(RunningEvaluationHashInput' - RunningEvaluationHashInput)`<br />
diff --git a/specification/src/registers.md b/specification/src/registers.md
index 3608cb58b..00e5fd6e3 100644
--- a/specification/src/registers.md
+++ b/specification/src/registers.md
@@ -17,8 +17,7 @@ the remaining registers exist only to enable an efficient arithmetization and ar
 | `jso`                  | jump stack origin                         | contains the value of the instruction pointer of the last `call`                                                                                                                                                          |
 | `jsd`                  | jump stack destination                    | contains the argument of the last `call`                                                                                                                                                                                  |
 | `st0` through `st15`   | operational stack registers               | contain explicit operational stack values                                                                                                                                                                                 |
-| *`osp`                 | operational stack pointer                 | contains the OpStack address of the top of the operational stack                                                                                                                                                          |
-| *`osv`                 | operational stack value                   | contains the (stack) memory value at the given address                                                                                                                                                                    |
+| *`op_stack_pointer`    | operational stack pointer                 | the current size of the operational stack                                                                                                                                                                                 |
 | *`hv0` through `hv6`   | helper variable registers                 | helper variables for some arithmetic operations                                                                                                                                                                           |
 | *`ramp`                | RAM pointer                               | contains an address pointing into the RAM                                                                                                                                                                                 |
 | *`ramv`                | RAM value                                 | contains the value of the RAM element at the address currently held in `ramp`                                                                                                                                             |
@@ -33,18 +32,21 @@ For reasons of arithmetization, `ci` is decomposed, giving rise to the *instruct
 
 ## Stack
 
-The stack is represented by 16 registers called *stack registers* (`st0` – `st15`) plus the OpStack Underflow Memory.
-The top 16 elements of the OpStack are directly accessible, the remainder of the OpStack, i.e, the part held in OpStack Underflow Memory, is not.
-In order to access elements of the OpStack held in OpStack Underflow Memory, the stack has to shrink by discarding elements from the top – potentially after writing them to RAM – thus moving lower elements into the stack registers.
+The stack is represented by 16 registers called *stack registers* (`st0` – `st15`) plus the op stack underflow memory.
+The top 16 elements of the op stack are directly accessible, the remainder of the op stack, i.e, the part held in op stack underflow memory, is not.
+In order to access elements of the op stack held in op stack underflow memory, the stack has to shrink by discarding elements from the top – potentially after writing them to RAM – thus moving lower elements into the stack registers.
 
 The stack grows upwards, in line with the metaphor that justifies the name "stack".
 
 For reasons of arithmetization, the stack always contains a minimum of 16 elements.
-All these elements are initially 0.
 Trying to run an instruction which would result in a stack of smaller total length than 16 crashes the VM.
 
-The registers `osp` and `osv` are not directly accessible by the program running in TritonVM.
-They exist only to allow efficient arithmetization.
+Stack elements `st0` through `st10` are initially 0.
+Stack elements `st11` through `st15`, _i.e._, the very bottom of the stack, are initialized with the hash digest of the program that is being executed.
+See [the mechanics of program attestation](program-attestation.md#mechanics) for further explanations on stack initialization.
+
+The register `op_stack_pointer` is not directly accessible by the program running in TritonVM.
+It exists only to allow efficient arithmetization.
 
 ## RAM
 
diff --git a/triton-vm/src/aet.rs b/triton-vm/src/aet.rs
index c7453abf0..1682d36f6 100644
--- a/triton-vm/src/aet.rs
+++ b/triton-vm/src/aet.rs
@@ -4,6 +4,7 @@ use std::collections::HashMap;
 use std::ops::AddAssign;
 
 use anyhow::anyhow;
+use anyhow::bail;
 use anyhow::Result;
 use itertools::Itertools;
 use ndarray::s;
@@ -18,16 +19,17 @@ use twenty_first::shared_math::tip5;
 use twenty_first::shared_math::tip5::Tip5;
 use twenty_first::util_types::algebraic_hasher::SpongeHasher;
 
+use crate::error::InstructionError::InstructionPointerOverflow;
 use crate::instruction::Instruction;
 use crate::program::Program;
 use crate::stark::StarkHasher;
-use crate::table::hash_table;
 use crate::table::hash_table::HashTable;
 use crate::table::hash_table::PermutationTrace;
-use crate::table::processor_table;
+use crate::table::op_stack_table::OpStackTableEntry;
 use crate::table::table_column::HashBaseTableColumn::CI;
 use crate::table::table_column::MasterBaseTableColumn;
 use crate::table::u32_table::U32TableEntry;
+use crate::table::*;
 use crate::vm::CoProcessorCall;
 use crate::vm::CoProcessorCall::*;
 use crate::vm::VMState;
@@ -51,6 +53,8 @@ pub struct AlgebraicExecutionTrace {
     /// Records the state of the processor after each instruction.
     pub processor_trace: Array2<BFieldElement>,
 
+    pub op_stack_underflow_trace: Array2<BFieldElement>,
+
     /// The trace of hashing the program whose execution generated this `AlgebraicExecutionTrace`.
     /// The resulting digest
     /// 1. ties a [`Proof`](crate::proof::Proof) to the program it was produced from, and
@@ -85,6 +89,7 @@ impl AlgebraicExecutionTrace {
             program,
             instruction_multiplicities: vec![0_u32; program_len],
             processor_trace: Array2::default([0, processor_table::BASE_WIDTH]),
+            op_stack_underflow_trace: Array2::default([0, op_stack_table::BASE_WIDTH]),
             program_hash_trace: Array2::default([0, hash_table::BASE_WIDTH]),
             hash_trace: Array2::default([0, hash_table::BASE_WIDTH]),
             sponge_trace: Array2::default([0, hash_table::BASE_WIDTH]),
@@ -175,6 +180,10 @@ impl AlgebraicExecutionTrace {
         self.processor_trace.nrows()
     }
 
+    pub fn op_stack_table_length(&self) -> usize {
+        self.op_stack_underflow_trace.nrows()
+    }
+
     pub fn hash_table_length(&self) -> usize {
         self.sponge_trace.nrows() + self.hash_trace.nrows() + self.program_hash_trace.nrows()
     }
@@ -197,6 +206,19 @@ impl AlgebraicExecutionTrace {
     }
 
     pub fn record_state(&mut self, state: &VMState) -> Result<()> {
+        self.record_instruction_lookup(state.instruction_pointer)?;
+        self.append_state_to_processor_trace(state)
+    }
+
+    fn record_instruction_lookup(&mut self, instruction_pointer: usize) -> Result<()> {
+        if instruction_pointer >= self.instruction_multiplicities.len() {
+            bail!(InstructionPointerOverflow(instruction_pointer));
+        }
+        self.instruction_multiplicities[instruction_pointer] += 1;
+        Ok(())
+    }
+
+    fn append_state_to_processor_trace(&mut self, state: &VMState) -> Result<()> {
         self.processor_trace
             .push_row(state.to_processor_row().view())
             .map_err(|e| anyhow!(e))
@@ -204,18 +226,11 @@ impl AlgebraicExecutionTrace {
 
     pub fn record_co_processor_call(&mut self, co_processor_call: CoProcessorCall) {
         match co_processor_call {
-            Tip5Trace(Instruction::Hash, tip5_trace) => self.append_hash_trace(*tip5_trace),
+            Tip5Trace(Instruction::Hash, trace) => self.append_hash_trace(*trace),
             SpongeStateReset => self.append_initial_sponge_state(),
-            Tip5Trace(instruction, tip5_trace) => {
-                self.append_sponge_trace(instruction, *tip5_trace)
-            }
-            SingleU32TableEntry(u32_entry) => {
-                self.record_u32_table_entry(u32_entry);
-            }
-            DoubleU32TableEntry([u32_entry_0, u32_entry_1]) => {
-                self.record_u32_table_entry(u32_entry_0);
-                self.record_u32_table_entry(u32_entry_1);
-            }
+            Tip5Trace(instruction, trace) => self.append_sponge_trace(instruction, *trace),
+            U32Call(u32_entry) => self.record_u32_table_entry(u32_entry),
+            OpStackCall(op_stack_entry) => self.record_op_stack_entry(op_stack_entry),
         }
     }
 
@@ -299,6 +314,13 @@ impl AlgebraicExecutionTrace {
     fn record_u32_table_entry(&mut self, u32_entry: U32TableEntry) {
         self.u32_entries.entry(u32_entry).or_insert(0).add_assign(1)
     }
+
+    fn record_op_stack_entry(&mut self, op_stack_entry: OpStackTableEntry) {
+        let op_stack_table_row = op_stack_entry.to_base_table_row();
+        self.op_stack_underflow_trace
+            .push_row(op_stack_table_row.view())
+            .unwrap();
+    }
 }
 
 #[cfg(test)]
diff --git a/triton-vm/src/instruction.rs b/triton-vm/src/instruction.rs
index dac20ef6f..30f1af48f 100644
--- a/triton-vm/src/instruction.rs
+++ b/triton-vm/src/instruction.rs
@@ -197,7 +197,7 @@ impl<Dest: PartialEq + Default> AnInstruction<Dest> {
         }
     }
 
-    const fn name(&self) -> &'static str {
+    pub(crate) const fn name(&self) -> &'static str {
         match self {
             Pop => "pop",
             Push(_) => "push",
diff --git a/triton-vm/src/op_stack.rs b/triton-vm/src/op_stack.rs
index 8ef04c31c..b8f66e9da 100644
--- a/triton-vm/src/op_stack.rs
+++ b/triton-vm/src/op_stack.rs
@@ -1,11 +1,13 @@
 use std::fmt::Display;
 use std::fmt::Formatter;
 use std::fmt::Result as FmtResult;
-use std::result;
 
 use anyhow::anyhow;
+use anyhow::bail;
 use anyhow::Result;
+use arbitrary::Arbitrary;
 use get_size::GetSize;
+use itertools::Itertools;
 use num_traits::Zero;
 use serde_derive::Deserialize;
 use serde_derive::Serialize;
@@ -14,6 +16,7 @@ use twenty_first::shared_math::b_field_element::BFieldElement;
 use twenty_first::shared_math::digest::Digest;
 use twenty_first::shared_math::tip5::DIGEST_LENGTH;
 use twenty_first::shared_math::x_field_element::XFieldElement;
+use twenty_first::shared_math::x_field_element::EXTENSION_DEGREE;
 
 use crate::error::InstructionError::*;
 use crate::op_stack::OpStackElement::*;
@@ -46,61 +49,71 @@ impl OpStack {
         Self { stack }
     }
 
-    /// Push an element onto the op-stack.
-    pub(crate) fn push(&mut self, element: BFieldElement) {
+    pub(crate) fn push(&mut self, element: BFieldElement) -> UnderflowIO {
         self.stack.push(element);
+        UnderflowIO::Write(self.first_underflow_element())
     }
 
-    /// Push an extension field element onto the op-stack.
-    pub(crate) fn push_extension_field_element(&mut self, element: XFieldElement) {
+    pub(crate) fn push_extension_field_element(
+        &mut self,
+        element: XFieldElement,
+    ) -> [UnderflowIO; EXTENSION_DEGREE] {
+        let mut underflow_io = vec![];
         for coefficient in element.coefficients.into_iter().rev() {
-            self.push(coefficient);
+            let underflow_write = self.push(coefficient);
+            underflow_io.push(underflow_write);
         }
+        underflow_io.try_into().unwrap()
     }
 
-    /// Pop an element from the op-stack.
-    pub(crate) fn pop(&mut self) -> Result<BFieldElement> {
-        self.stack.pop().ok_or_else(|| anyhow!(OpStackTooShallow))
+    pub(crate) fn pop(&mut self) -> Result<(BFieldElement, UnderflowIO)> {
+        let underflow_io = UnderflowIO::Read(self.first_underflow_element());
+        let element = self.stack.pop().ok_or_else(|| anyhow!(OpStackTooShallow))?;
+        Ok((element, underflow_io))
     }
 
-    /// Pop an extension field element from the op-stack.
-    pub(crate) fn pop_extension_field_element(&mut self) -> Result<XFieldElement> {
-        let coefficients = self.pop_multiple()?;
+    pub(crate) fn pop_extension_field_element(
+        &mut self,
+    ) -> Result<(XFieldElement, [UnderflowIO; EXTENSION_DEGREE])> {
+        let (coefficients, underflow_io) = self.pop_multiple()?;
         let element = XFieldElement::new(coefficients);
-        Ok(element)
+        Ok((element, underflow_io))
     }
 
-    /// Pop a u32 from the op-stack.
-    pub(crate) fn pop_u32(&mut self) -> Result<u32> {
-        let element = self.pop()?;
-        element
+    pub(crate) fn pop_u32(&mut self) -> Result<(u32, UnderflowIO)> {
+        let (element, underflow_io) = self.pop()?;
+        let element = element
             .try_into()
-            .map_err(|_| anyhow!(FailedU32Conversion(element)))
+            .map_err(|_| anyhow!(FailedU32Conversion(element)))?;
+        Ok((element, underflow_io))
     }
 
-    /// Pop multiple elements from the op-stack.
-    pub(crate) fn pop_multiple<const N: usize>(&mut self) -> Result<[BFieldElement; N]> {
-        let mut popped_elements = [BFieldElement::zero(); N];
-        for element in popped_elements.iter_mut() {
-            *element = self.pop()?;
+    pub(crate) fn pop_multiple<const N: usize>(
+        &mut self,
+    ) -> Result<([BFieldElement; N], [UnderflowIO; N])> {
+        let mut elements = vec![];
+        let mut underflow_io = vec![];
+        for _ in 0..N {
+            let (element, underflow_read) = self.pop()?;
+            elements.push(element);
+            underflow_io.push(underflow_read);
         }
-        Ok(popped_elements)
+        let elements = elements.try_into().unwrap();
+        let underflow_io = underflow_io.try_into().unwrap();
+        Ok((elements, underflow_io))
     }
 
-    /// Fetches the indicated stack element without modifying the stack.
     pub(crate) fn peek_at(&self, stack_element: OpStackElement) -> BFieldElement {
         let stack_element_index = usize::from(stack_element);
         let top_of_stack_index = self.stack.len() - 1;
         self.stack[top_of_stack_index - stack_element_index]
     }
 
-    /// Fetches the top-most extension field element without modifying the stack.
     pub(crate) fn peek_at_top_extension_field_element(&self) -> XFieldElement {
         let coefficients = [self.peek_at(ST0), self.peek_at(ST1), self.peek_at(ST2)];
         XFieldElement::new(coefficients)
     }
 
-    /// Swaps the top of the stack with the indicated stack element.
     pub(crate) fn swap_top_with(&mut self, stack_element: OpStackElement) {
         let stack_element_index = usize::from(stack_element);
         let top_of_stack_index = self.stack.len() - 1;
@@ -108,27 +121,108 @@ impl OpStack {
             .swap(top_of_stack_index, top_of_stack_index - stack_element_index);
     }
 
-    /// `true` if and only if the op-stack contains fewer elements than the number of
-    /// op-stack registers, _i.e._, [`OpStackElement::COUNT`].
     pub(crate) fn is_too_shallow(&self) -> bool {
         self.stack.len() < OpStackElement::COUNT
     }
 
-    /// The address of the next free address of the op-stack.
-    /// Equivalent to the current length of the op-stack.
-    pub(crate) fn op_stack_pointer(&self) -> BFieldElement {
+    /// The address of the next free address of the op-stack. Equivalent to the current length of
+    /// the op-stack.
+    pub(crate) fn pointer(&self) -> BFieldElement {
         (self.stack.len() as u64).into()
     }
 
     /// The first element of the op-stack underflow memory, or 0 if the op-stack underflow memory
     /// is empty.
-    pub(crate) fn op_stack_value(&self) -> BFieldElement {
-        let top_of_stack_index = self.stack.len() - 1;
-        if top_of_stack_index < OpStackElement::COUNT {
-            return BFieldElement::zero();
+    pub(crate) fn first_underflow_element(&self) -> BFieldElement {
+        let default = BFieldElement::zero();
+        let Some(top_of_stack_index) = self.stack.len().checked_sub(1) else {
+            return default;
+        };
+        let Some(underflow_start) = top_of_stack_index.checked_sub(OpStackElement::COUNT) else {
+            return default;
+        };
+        self.stack.get(underflow_start).copied().unwrap_or(default)
+    }
+}
+
+/// Indicates changes to the op-stack underflow memory.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, GetSize, Serialize, Deserialize, Arbitrary)]
+#[must_use = "The change to underflow memory should be handled."]
+pub enum UnderflowIO {
+    Read(BFieldElement),
+    Write(BFieldElement),
+}
+
+impl UnderflowIO {
+    /// Remove spurious read/write sequences arising from temporary stack changes.
+    ///
+    /// For example, the sequence `[Read(5), Write(5), Read(7)]` can be replaced with `[Read(7)]`.
+    /// Similarly, the sequence `[Write(5), Write(3), Read(3), Read(5), Write(7)]` can be replaced
+    /// with `[Write(7)]`.
+    pub fn canonicalize_sequence(sequence: &mut Vec<Self>) {
+        while let Some(index) = Self::index_of_dual_pair(sequence) {
+            let _ = sequence.remove(index);
+            let _ = sequence.remove(index);
+        }
+    }
+
+    fn index_of_dual_pair(sequence: &[Self]) -> Option<usize> {
+        sequence
+            .iter()
+            .tuple_windows()
+            .find_position(|(&left, &right)| left.is_dual_to(right))
+            .map(|(index, _)| index)
+    }
+
+    fn is_dual_to(&self, other: Self) -> bool {
+        match (self, other) {
+            (&Self::Read(read), Self::Write(write)) => read == write,
+            (&Self::Write(write), Self::Read(read)) => read == write,
+            _ => false,
+        }
+    }
+
+    /// Whether the sequence of underflow IOs consists of either only reads or only writes.
+    pub fn is_uniform_sequence(sequence: &[Self]) -> bool {
+        sequence.iter().all(|io| io.is_same_type_as(sequence[0]))
+    }
+
+    fn is_same_type_as(&self, other: Self) -> bool {
+        matches!(
+            (self, other),
+            (&Self::Read(_), Self::Read(_)) | (&Self::Write(_), Self::Write(_))
+        )
+    }
+
+    /// Whether the sequence of underflow IOs consists of only writes.
+    pub fn is_writing_sequence(sequence: &[Self]) -> bool {
+        sequence.iter().all(|io| io.grows_stack())
+    }
+
+    /// Whether the sequence of underflow IOs consists of only reads.
+    pub fn is_reading_sequence(sequence: &[Self]) -> bool {
+        sequence.iter().all(|io| io.shrinks_stack())
+    }
+
+    pub fn shrinks_stack(&self) -> bool {
+        match self {
+            Self::Read(_) => true,
+            Self::Write(_) => false,
+        }
+    }
+
+    pub fn grows_stack(&self) -> bool {
+        match self {
+            Self::Read(_) => false,
+            Self::Write(_) => true,
+        }
+    }
+
+    pub fn payload(self) -> BFieldElement {
+        match self {
+            Self::Read(payload) => payload,
+            Self::Write(payload) => payload,
         }
-        let op_stack_value_index = top_of_stack_index - OpStackElement::COUNT;
-        self.stack[op_stack_value_index]
     }
 }
 
@@ -193,9 +287,9 @@ impl From<&OpStackElement> for u32 {
 }
 
 impl TryFrom<u32> for OpStackElement {
-    type Error = String;
+    type Error = anyhow::Error;
 
-    fn try_from(stack_index: u32) -> result::Result<Self, Self::Error> {
+    fn try_from(stack_index: u32) -> Result<Self> {
         match stack_index {
             0 => Ok(ST0),
             1 => Ok(ST1),
@@ -213,9 +307,7 @@ impl TryFrom<u32> for OpStackElement {
             13 => Ok(ST13),
             14 => Ok(ST14),
             15 => Ok(ST15),
-            _ => Err(format!(
-                "Index {stack_index} is out of range for `OpStackElement`."
-            )),
+            _ => bail!("Index {stack_index} is out of range for `OpStackElement`."),
         }
     }
 }
@@ -227,12 +319,10 @@ impl From<OpStackElement> for u64 {
 }
 
 impl TryFrom<u64> for OpStackElement {
-    type Error = String;
+    type Error = anyhow::Error;
 
-    fn try_from(stack_index: u64) -> result::Result<Self, Self::Error> {
-        let stack_index = u32::try_from(stack_index)
-            .map_err(|_| format!("Index {stack_index} is out of range for `OpStackElement`."))?;
-        stack_index.try_into()
+    fn try_from(stack_index: u64) -> Result<Self> {
+        u32::try_from(stack_index)?.try_into()
     }
 }
 
@@ -249,12 +339,10 @@ impl From<&OpStackElement> for usize {
 }
 
 impl TryFrom<usize> for OpStackElement {
-    type Error = String;
+    type Error = anyhow::Error;
 
-    fn try_from(stack_index: usize) -> result::Result<Self, Self::Error> {
-        let stack_index =
-            u32::try_from(stack_index).map_err(|_| "Cannot convert usize to u32.".to_string())?;
-        stack_index.try_into()
+    fn try_from(stack_index: usize) -> Result<Self> {
+        u32::try_from(stack_index)?.try_into()
     }
 }
 
@@ -272,10 +360,12 @@ impl From<&OpStackElement> for BFieldElement {
 
 #[cfg(test)]
 mod tests {
+    use proptest::collection::vec;
+    use proptest::prelude::*;
+    use proptest_arbitrary_interop::arb;
     use twenty_first::shared_math::b_field_element::BFieldElement;
 
-    use crate::op_stack::OpStack;
-    use crate::op_stack::OpStackElement;
+    use super::*;
 
     #[test]
     fn sanity() {
@@ -284,42 +374,36 @@ mod tests {
 
         // verify height
         assert_eq!(op_stack.stack.len(), 16);
-        assert_eq!(
-            op_stack.op_stack_pointer().value() as usize,
-            op_stack.stack.len()
-        );
+        assert_eq!(op_stack.pointer().value() as usize, op_stack.stack.len());
 
         // push elements 1 thru 17
         for i in 1..=17 {
-            op_stack.push(BFieldElement::new(i as u64));
+            let _ = op_stack.push(BFieldElement::new(i as u64));
         }
 
         // verify height
         assert_eq!(op_stack.stack.len(), 33);
-        assert_eq!(
-            op_stack.op_stack_pointer().value() as usize,
-            op_stack.stack.len()
-        );
+        assert_eq!(op_stack.pointer().value() as usize, op_stack.stack.len());
 
         // verify that all accessible items are different
         let mut container = vec![
-            op_stack.peek_at(OpStackElement::ST0),
-            op_stack.peek_at(OpStackElement::ST1),
-            op_stack.peek_at(OpStackElement::ST2),
-            op_stack.peek_at(OpStackElement::ST3),
-            op_stack.peek_at(OpStackElement::ST4),
-            op_stack.peek_at(OpStackElement::ST5),
-            op_stack.peek_at(OpStackElement::ST6),
-            op_stack.peek_at(OpStackElement::ST7),
-            op_stack.peek_at(OpStackElement::ST8),
-            op_stack.peek_at(OpStackElement::ST9),
-            op_stack.peek_at(OpStackElement::ST10),
-            op_stack.peek_at(OpStackElement::ST11),
-            op_stack.peek_at(OpStackElement::ST12),
-            op_stack.peek_at(OpStackElement::ST13),
-            op_stack.peek_at(OpStackElement::ST14),
-            op_stack.peek_at(OpStackElement::ST15),
-            op_stack.op_stack_value(),
+            op_stack.peek_at(ST0),
+            op_stack.peek_at(ST1),
+            op_stack.peek_at(ST2),
+            op_stack.peek_at(ST3),
+            op_stack.peek_at(ST4),
+            op_stack.peek_at(ST5),
+            op_stack.peek_at(ST6),
+            op_stack.peek_at(ST7),
+            op_stack.peek_at(ST8),
+            op_stack.peek_at(ST9),
+            op_stack.peek_at(ST10),
+            op_stack.peek_at(ST11),
+            op_stack.peek_at(ST12),
+            op_stack.peek_at(ST13),
+            op_stack.peek_at(ST14),
+            op_stack.peek_at(ST15),
+            op_stack.first_underflow_element(),
         ];
         let len_before = container.len();
         container.sort_by_key(|a| a.value());
@@ -329,29 +413,95 @@ mod tests {
 
         // pop 11 elements
         for _ in 0..11 {
-            op_stack.pop().expect("can't pop");
+            let _ = op_stack.pop().expect("can't pop");
         }
 
         // verify height
         assert_eq!(op_stack.stack.len(), 22);
-        assert_eq!(
-            op_stack.op_stack_pointer().value() as usize,
-            op_stack.stack.len()
-        );
+        assert_eq!(op_stack.pointer().value() as usize, op_stack.stack.len());
 
         // pop 2 XFieldElements
-        op_stack.pop_extension_field_element().expect("can't pop");
-        op_stack.pop_extension_field_element().expect("can't pop");
+        let _ = op_stack.pop_extension_field_element().expect("can't pop");
+        let _ = op_stack.pop_extension_field_element().expect("can't pop");
 
         // verify height
         assert_eq!(op_stack.stack.len(), 16);
-        assert_eq!(
-            op_stack.op_stack_pointer().value() as usize,
-            op_stack.stack.len()
-        );
+        assert_eq!(op_stack.pointer().value() as usize, op_stack.stack.len());
 
         // verify underflow
-        op_stack.pop().expect("can't pop");
+        let _ = op_stack.pop().expect("can't pop");
         assert!(op_stack.is_too_shallow());
     }
+
+    #[test]
+    fn trying_to_access_first_underflow_element_never_panics() {
+        let mut op_stack = OpStack::new(Default::default());
+        let way_too_long = 2 * op_stack.stack.len();
+        for _ in 0..way_too_long {
+            let _ = op_stack.pop();
+            let _ = op_stack.first_underflow_element();
+        }
+    }
+
+    #[test]
+    fn canonicalize_empty_underflow_io_sequence() {
+        let mut sequence = vec![];
+        UnderflowIO::canonicalize_sequence(&mut sequence);
+
+        let expected_sequence = Vec::<UnderflowIO>::new();
+        assert_eq!(expected_sequence, sequence);
+    }
+
+    #[test]
+    fn canonicalize_simple_underflow_io_sequence() {
+        let mut sequence = vec![
+            UnderflowIO::Read(5_u64.into()),
+            UnderflowIO::Write(5_u64.into()),
+            UnderflowIO::Read(7_u64.into()),
+        ];
+        UnderflowIO::canonicalize_sequence(&mut sequence);
+
+        let expected_sequence = vec![UnderflowIO::Read(7_u64.into())];
+        assert_eq!(expected_sequence, sequence);
+    }
+
+    #[test]
+    fn canonicalize_medium_complex_underflow_io_sequence() {
+        let mut sequence = vec![
+            UnderflowIO::Write(5_u64.into()),
+            UnderflowIO::Write(3_u64.into()),
+            UnderflowIO::Read(3_u64.into()),
+            UnderflowIO::Read(5_u64.into()),
+            UnderflowIO::Write(7_u64.into()),
+        ];
+        UnderflowIO::canonicalize_sequence(&mut sequence);
+
+        let expected_sequence = vec![UnderflowIO::Write(7_u64.into())];
+        assert_eq!(expected_sequence, sequence);
+    }
+
+    proptest! {
+        #[test]
+        fn underflow_io_either_shrinks_stack_or_grows_stack(underflow_io in arb::<UnderflowIO>()) {
+            let shrinks_stack = underflow_io.shrinks_stack();
+            let grows_stack = underflow_io.grows_stack();
+            assert!(shrinks_stack ^ grows_stack);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn non_empty_uniform_underflow_io_sequence_is_either_reading_or_writing(
+            sequence in vec(arb::<UnderflowIO>(), 1..OpStackElement::COUNT),
+        ) {
+            let is_reading_sequence = UnderflowIO::is_reading_sequence(&sequence);
+            let is_writing_sequence = UnderflowIO::is_writing_sequence(&sequence);
+            if UnderflowIO::is_uniform_sequence(&sequence) {
+                prop_assert!(is_reading_sequence ^ is_writing_sequence);
+            } else {
+                prop_assert!(!is_reading_sequence);
+                prop_assert!(!is_writing_sequence);
+            }
+        }
+    }
 }
diff --git a/triton-vm/src/program.rs b/triton-vm/src/program.rs
index 550d214d5..8461c9d49 100644
--- a/triton-vm/src/program.rs
+++ b/triton-vm/src/program.rs
@@ -21,7 +21,6 @@ use twenty_first::util_types::algebraic_hasher::AlgebraicHasher;
 
 use crate::aet::AlgebraicExecutionTrace;
 use crate::ensure_eq;
-use crate::error::InstructionError::InstructionPointerOverflow;
 use crate::instruction::Instruction;
 use crate::instruction::LabelledInstruction;
 use crate::parser::parse;
@@ -360,15 +359,9 @@ impl Program {
 
         while !state.halting {
             aet.record_state(&state)?;
-
-            match state.instruction_pointer < aet.instruction_multiplicities.len() {
-                true => aet.instruction_multiplicities[state.instruction_pointer] += 1,
-                false => bail!(InstructionPointerOverflow(state.instruction_pointer)),
-            }
-
-            let maybe_co_processor_call = state.step()?;
-            if let Some(co_processor_call) = maybe_co_processor_call {
-                aet.record_co_processor_call(co_processor_call);
+            let co_processor_calls = state.step()?;
+            for call in co_processor_calls {
+                aet.record_co_processor_call(call);
             }
         }
 
diff --git a/triton-vm/src/stark.rs b/triton-vm/src/stark.rs
index acdecec5c..b7d1f8b8f 100644
--- a/triton-vm/src/stark.rs
+++ b/triton-vm/src/stark.rs
@@ -1115,10 +1115,13 @@ pub(crate) mod tests {
     use rand::thread_rng;
     use rand::Rng;
     use rand_core::RngCore;
+    use strum::EnumCount;
     use twenty_first::shared_math::other::random_elements;
 
     use crate::example_programs::*;
     use crate::instruction::AnInstruction;
+    use crate::instruction::Instruction;
+    use crate::op_stack::OpStackElement;
     use crate::program::Program;
     use crate::shared_tests::*;
     use crate::table::cascade_table;
@@ -1155,6 +1158,7 @@ pub(crate) mod tests {
     use crate::table::table_column::LookupExtTableColumn::PublicEvaluationArgument;
     use crate::table::table_column::MasterBaseTableColumn;
     use crate::table::table_column::MasterExtTableColumn;
+    use crate::table::table_column::OpStackBaseTableColumn;
     use crate::table::table_column::ProcessorBaseTableColumn;
     use crate::table::table_column::ProcessorExtTableColumn::InputTableEvalArg;
     use crate::table::table_column::ProcessorExtTableColumn::OutputTableEvalArg;
@@ -1247,40 +1251,19 @@ pub(crate) mod tests {
 
             let prev_instruction =
                 row[ProcessorBaseTableColumn::PreviousInstruction.base_table_index()].value();
-            let curr_instruction = row[ProcessorBaseTableColumn::CI.base_table_index()].value();
-            let next_instruction_or_arg =
-                row[ProcessorBaseTableColumn::NIA.base_table_index()].value();
-
-            // sorry about this mess – this is just a test.
-            let pi = match AnInstruction::<BFieldElement>::try_from(prev_instruction) {
+            let pi = match Instruction::try_from(prev_instruction) {
                 Ok(AnInstruction::Halt) | Err(_) => "-".to_string(),
-                Ok(instr) => instr.to_string().split('0').collect_vec()[0].to_owned(),
-            };
-            let ci = AnInstruction::<BFieldElement>::try_from(curr_instruction).unwrap();
-            let nia = if ci.size() == 2 {
-                next_instruction_or_arg.to_string()
-            } else {
-                AnInstruction::<BFieldElement>::try_from(next_instruction_or_arg)
-                    .unwrap()
-                    .to_string()
-                    .split('0')
-                    .collect_vec()[0]
-                    .to_owned()
-            };
-            let ci_string = if ci.size() == 1 {
-                ci.to_string()
-            } else {
-                ci.to_string().split('0').collect_vec()[0].to_owned()
+                Ok(instr) => instr.name().to_string(),
             };
-            let interesting_cols = [clk, pi, ci_string, nia, st0, st1, st2, st3, ramp, ramv];
-            println!(
-                "{}",
-                interesting_cols
-                    .iter()
-                    .map(|ff| format!("{:>10}", format!("{ff}")))
-                    .collect_vec()
-                    .join(" | ")
-            );
+            let (ci, nia) = ci_and_nia_from_master_table_row(row);
+
+            let interesting_cols = [clk, pi, ci, nia, st0, st1, st2, st3, ramp, ramv];
+            let interesting_cols = interesting_cols
+                .iter()
+                .map(|ff| format!("{:>10}", format!("{ff}")))
+                .collect_vec()
+                .join(" | ");
+            println!("{interesting_cols}");
         }
         println!();
         println!("RAM Table:");
@@ -1295,22 +1278,128 @@ pub(crate) mod tests {
 
             let prev_instruction =
                 row[RamBaseTableColumn::PreviousInstruction.base_table_index()].value();
-            let pi = match AnInstruction::<BFieldElement>::try_from(prev_instruction) {
+            let pi = match Instruction::try_from(prev_instruction) {
                 Ok(AnInstruction::Halt) | Err(_) => "-".to_string(),
-                Ok(instr) => instr.to_string().split('0').collect_vec()[0].to_owned(),
+                Ok(instr) => instr.name().to_string(),
             };
-            let interersting_cols = [clk, pi, ramp, ramv, iord];
-            println!(
-                "{}",
-                interersting_cols
-                    .iter()
-                    .map(|ff| format!("{:>10}", format!("{ff}")))
-                    .collect_vec()
-                    .join(" | ")
-            );
+
+            let interesting_cols = [clk, pi, ramp, ramv, iord];
+            let interesting_cols = interesting_cols
+                .iter()
+                .map(|ff| format!("{:>10}", format!("{ff}")))
+                .collect_vec()
+                .join(" | ");
+            println!("{interesting_cols}");
+        }
+    }
+
+    #[test]
+    fn print_op_stack_table_example_for_specification() {
+        let num_interesting_rows = 30;
+        let fake_op_stack_size = 4;
+
+        let program = triton_program! {
+            push 42 push 43 push 44 push 45 push 46 push 47 push 48
+            nop pop pop pop pop
+            push 77 swap 3 push 78 swap 3 push 79
+            pop pop pop pop pop pop
+            halt
+        };
+        let (_, _, master_base_table) =
+            master_base_table_for_low_security_level(&program, [].into(), [].into());
+
+        println!();
+        println!("Processor Table:");
+        println!(
+            "| clk        | ci         | nia        | st0        | st1        \
+             | st2        | st3        | underflow  | pointer    |"
+        );
+        println!(
+            "|-----------:|:-----------|-----------:|-----------:|-----------:\
+             |-----------:|-----------:|:-----------|-----------:|"
+        );
+        for row in master_base_table
+            .table(ProcessorTable)
+            .rows()
+            .into_iter()
+            .take(num_interesting_rows)
+        {
+            let clk = row[ProcessorBaseTableColumn::CLK.base_table_index()].to_string();
+            let st0 = row[ProcessorBaseTableColumn::ST0.base_table_index()].to_string();
+            let st1 = row[ProcessorBaseTableColumn::ST1.base_table_index()].to_string();
+            let st2 = row[ProcessorBaseTableColumn::ST2.base_table_index()].to_string();
+            let st3 = row[ProcessorBaseTableColumn::ST3.base_table_index()].to_string();
+            let st4 = row[ProcessorBaseTableColumn::ST4.base_table_index()].to_string();
+            let st5 = row[ProcessorBaseTableColumn::ST5.base_table_index()].to_string();
+            let st6 = row[ProcessorBaseTableColumn::ST6.base_table_index()].to_string();
+            let st7 = row[ProcessorBaseTableColumn::ST7.base_table_index()].to_string();
+            let st8 = row[ProcessorBaseTableColumn::ST8.base_table_index()].to_string();
+            let st9 = row[ProcessorBaseTableColumn::ST9.base_table_index()].to_string();
+
+            let osp = row[ProcessorBaseTableColumn::OpStackPointer.base_table_index()];
+            let osp =
+                (osp.value() + fake_op_stack_size).saturating_sub(OpStackElement::COUNT as u64);
+
+            let underflow_size = osp.saturating_sub(fake_op_stack_size);
+            let underflow_candidates = [st4, st5, st6, st7, st8, st9];
+            let underflow = underflow_candidates
+                .into_iter()
+                .take(underflow_size as usize);
+            let underflow = underflow.map(|ff| format!("{:>2}", format!("{ff}")));
+            let underflow = format!("[{}]", underflow.collect_vec().join(", "));
+
+            let osp = osp.to_string();
+            let (ci, nia) = ci_and_nia_from_master_table_row(row);
+
+            let interesting_cols = [clk, ci, nia, st0, st1, st2, st3, underflow, osp];
+            let interesting_cols = interesting_cols
+                .map(|ff| format!("{:>10}", format!("{ff}")))
+                .join(" | ");
+            println!("{interesting_cols}");
+        }
+
+        println!();
+        println!("Op Stack Table:");
+        println!("|        clk |        ib1 |    pointer |      value |");
+        println!("|-----------:|-----------:|-----------:|-----------:|");
+        for row in master_base_table
+            .table(TableId::OpStackTable)
+            .rows()
+            .into_iter()
+            .take(num_interesting_rows)
+        {
+            let clk = row[OpStackBaseTableColumn::CLK.base_table_index()].to_string();
+            let ib1 = row[OpStackBaseTableColumn::IB1ShrinkStack.base_table_index()].to_string();
+
+            let osp = row[OpStackBaseTableColumn::StackPointer.base_table_index()];
+            let osp =
+                (osp.value() + fake_op_stack_size).saturating_sub(OpStackElement::COUNT as u64);
+            let osp = osp.to_string();
+
+            let value =
+                row[OpStackBaseTableColumn::FirstUnderflowElement.base_table_index()].to_string();
+
+            let interesting_cols = [clk, ib1, osp, value];
+            let interesting_cols = interesting_cols
+                .map(|ff| format!("{:>10}", format!("{ff}")))
+                .join(" | ");
+            println!("{interesting_cols}");
         }
     }
 
+    fn ci_and_nia_from_master_table_row(row: ArrayView1<BFieldElement>) -> (String, String) {
+        let curr_instruction = row[ProcessorBaseTableColumn::CI.base_table_index()].value();
+        let next_instruction_or_arg = row[ProcessorBaseTableColumn::NIA.base_table_index()].value();
+
+        let curr_instruction = Instruction::try_from(curr_instruction).unwrap();
+        let nia = if curr_instruction.has_arg() {
+            next_instruction_or_arg.to_string()
+        } else {
+            "".to_string()
+        };
+        (curr_instruction.name().to_string(), nia)
+    }
+
     /// To be used with `-- --nocapture`. Has mainly informative purpose.
     #[test]
     fn print_all_constraint_degrees() {
diff --git a/triton-vm/src/table/challenges.rs b/triton-vm/src/table/challenges.rs
index 013eea5d9..774a1ac29 100644
--- a/triton-vm/src/table/challenges.rs
+++ b/triton-vm/src/table/challenges.rs
@@ -18,6 +18,7 @@
 //! table. Instead, the terminal of the Evaluation Argument is computed directly from the
 //! public input (respectively output) and the indeterminate.
 
+use arbitrary::Arbitrary;
 use std::fmt::Debug;
 use std::hash::Hash;
 use std::ops::Index;
@@ -99,8 +100,8 @@ pub enum ChallengeId {
 
     OpStackClkWeight,
     OpStackIb1Weight,
-    OpStackOspWeight,
-    OpStackOsvWeight,
+    OpStackPointerWeight,
+    OpStackFirstUnderflowElementWeight,
 
     RamClkWeight,
     RamRampWeight,
@@ -216,6 +217,7 @@ impl ChallengeId {
 /// known only at runtime. The challenges are indexed using enum [`ChallengeId`]. The `Challenges`
 /// struct is essentially a thin wrapper around an array of [`XFieldElement`]s, providing
 /// convenience methods.
+#[derive(Debug, Clone, Arbitrary)]
 pub struct Challenges {
     pub challenges: [XFieldElement; Self::count()],
 }
diff --git a/triton-vm/src/table/master_table.rs b/triton-vm/src/table/master_table.rs
index 54bdafea0..852bd8157 100644
--- a/triton-vm/src/table/master_table.rs
+++ b/triton-vm/src/table/master_table.rs
@@ -146,6 +146,8 @@ pub const EXT_DEGREE_LOWERING_TABLE_START: usize = EXT_U32_TABLE_END;
 pub const EXT_DEGREE_LOWERING_TABLE_END: usize =
     EXT_DEGREE_LOWERING_TABLE_START + degree_lowering_table::EXT_WIDTH;
 
+const NUM_TABLES_WITHOUT_DEGREE_LOWERING: usize = TableId::COUNT - 1;
+
 /// A `TableId` uniquely determines one of Triton VM's tables.
 #[derive(Debug, Copy, Clone, Display, EnumCount, EnumIter, PartialEq, Eq, Hash)]
 pub enum TableId {
@@ -327,6 +329,7 @@ pub struct MasterBaseTable {
 
     program_table_len: usize,
     main_execution_len: usize,
+    op_stack_table_len: usize,
     hash_coprocessor_execution_len: usize,
     cascade_table_len: usize,
     u32_coprocesor_execution_len: usize,
@@ -587,6 +590,7 @@ impl MasterBaseTable {
             num_trace_randomizers,
             program_table_len: aet.program_table_length(),
             main_execution_len: aet.processor_table_length(),
+            op_stack_table_len: aet.op_stack_table_length(),
             hash_coprocessor_execution_len: aet.hash_table_length(),
             cascade_table_len: aet.cascade_table_length(),
             u32_coprocesor_execution_len: aet.u32_table_length(),
@@ -705,7 +709,7 @@ impl MasterBaseTable {
         DegreeLoweringTable::fill_derived_base_columns(self.trace_table_mut());
     }
 
-    fn all_pad_functions() -> [PadFunction; TableId::COUNT - 1] {
+    fn all_pad_functions() -> [PadFunction; NUM_TABLES_WITHOUT_DEGREE_LOWERING] {
         [
             ProgramTable::pad_trace,
             ProcessorTable::pad_trace,
@@ -719,13 +723,17 @@ impl MasterBaseTable {
         ]
     }
 
-    fn all_table_lengths(&self) -> [usize; TableId::COUNT - 1] {
+    fn all_table_lengths(&self) -> [usize; NUM_TABLES_WITHOUT_DEGREE_LOWERING] {
+        let processor_table_len = self.main_execution_len;
+        let ram_table_len = self.main_execution_len;
+        let jump_stack_table_len = self.main_execution_len;
+
         [
             self.program_table_len,
-            self.main_execution_len,
-            self.main_execution_len,
-            self.main_execution_len,
-            self.main_execution_len,
+            processor_table_len,
+            self.op_stack_table_len,
+            ram_table_len,
+            jump_stack_table_len,
             self.hash_coprocessor_execution_len,
             self.cascade_table_len,
             1 << 8,
@@ -830,7 +838,7 @@ impl MasterBaseTable {
         master_ext_table
     }
 
-    fn all_extend_functions() -> [ExtendFunction; TableId::COUNT - 1] {
+    fn all_extend_functions() -> [ExtendFunction; NUM_TABLES_WITHOUT_DEGREE_LOWERING] {
         [
             ProgramTable::extend,
             ProcessorTable::extend,
@@ -844,7 +852,9 @@ impl MasterBaseTable {
         ]
     }
 
-    fn base_tables_for_extending(&self) -> [ArrayView2<BFieldElement>; TableId::COUNT - 1] {
+    fn base_tables_for_extending(
+        &self,
+    ) -> [ArrayView2<BFieldElement>; NUM_TABLES_WITHOUT_DEGREE_LOWERING] {
         [
             self.table(TableId::ProgramTable),
             self.table(TableId::ProcessorTable),
diff --git a/triton-vm/src/table/op_stack_table.rs b/triton-vm/src/table/op_stack_table.rs
index 9f29b464b..9d3f1b5f3 100644
--- a/triton-vm/src/table/op_stack_table.rs
+++ b/triton-vm/src/table/op_stack_table.rs
@@ -1,5 +1,7 @@
 use std::cmp::Ordering;
 
+use arbitrary::Arbitrary;
+use itertools::Itertools;
 use ndarray::parallel::prelude::*;
 use ndarray::s;
 use ndarray::Array1;
@@ -7,6 +9,8 @@ use ndarray::ArrayView1;
 use ndarray::ArrayView2;
 use ndarray::ArrayViewMut2;
 use ndarray::Axis;
+use num_traits::One;
+use num_traits::Zero;
 use strum::EnumCount;
 use twenty_first::shared_math::b_field_element::BFieldElement;
 use twenty_first::shared_math::traits::Inverse;
@@ -14,6 +18,7 @@ use twenty_first::shared_math::x_field_element::XFieldElement;
 
 use crate::aet::AlgebraicExecutionTrace;
 use crate::op_stack::OpStackElement;
+use crate::op_stack::UnderflowIO;
 use crate::table::challenges::ChallengeId::*;
 use crate::table::challenges::Challenges;
 use crate::table::constraint_circuit::DualRowIndicator::*;
@@ -28,45 +33,126 @@ pub const BASE_WIDTH: usize = OpStackBaseTableColumn::COUNT;
 pub const EXT_WIDTH: usize = OpStackExtTableColumn::COUNT;
 pub const FULL_WIDTH: usize = BASE_WIDTH + EXT_WIDTH;
 
+/// The value indicating a padding row in the op stack table. Stored in the `ib1_shrink_stack`
+/// column.
+pub(crate) const PADDING_VALUE: BFieldElement = BFieldElement::new(2);
+
 #[derive(Debug, Clone)]
 pub struct OpStackTable {}
 
 #[derive(Debug, Clone)]
 pub struct ExtOpStackTable {}
 
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Arbitrary)]
+pub struct OpStackTableEntry {
+    pub clk: u32,
+    pub op_stack_pointer: BFieldElement,
+    pub underflow_io: UnderflowIO,
+}
+
+impl OpStackTableEntry {
+    pub fn new(clk: u32, op_stack_pointer: BFieldElement, underflow_io: UnderflowIO) -> Self {
+        Self {
+            clk,
+            op_stack_pointer,
+            underflow_io,
+        }
+    }
+
+    pub fn shrinks_stack(&self) -> bool {
+        self.underflow_io.shrinks_stack()
+    }
+
+    pub fn grows_stack(&self) -> bool {
+        self.underflow_io.grows_stack()
+    }
+
+    pub fn from_underflow_io_sequence(
+        clk: u32,
+        op_stack_pointer_after_sequence_execution: BFieldElement,
+        mut underflow_io_sequence: Vec<UnderflowIO>,
+    ) -> Vec<Self> {
+        UnderflowIO::canonicalize_sequence(&mut underflow_io_sequence);
+        assert!(UnderflowIO::is_uniform_sequence(&underflow_io_sequence));
+
+        let sequence_length: BFieldElement =
+            u32::try_from(underflow_io_sequence.len()).unwrap().into();
+        let mut op_stack_pointer = match UnderflowIO::is_writing_sequence(&underflow_io_sequence) {
+            true => op_stack_pointer_after_sequence_execution - sequence_length,
+            false => op_stack_pointer_after_sequence_execution + sequence_length,
+        };
+        let mut op_stack_table_entries = vec![];
+        for underflow_io in underflow_io_sequence {
+            if underflow_io.shrinks_stack() {
+                op_stack_pointer.decrement();
+            }
+            let op_stack_table_entry = Self::new(clk, op_stack_pointer, underflow_io);
+            op_stack_table_entries.push(op_stack_table_entry);
+            if underflow_io.grows_stack() {
+                op_stack_pointer.increment();
+            }
+        }
+        op_stack_table_entries
+    }
+
+    pub fn to_base_table_row(self) -> Array1<BFieldElement> {
+        let shrink_stack_indicator = match self.shrinks_stack() {
+            true => BFieldElement::one(),
+            false => BFieldElement::zero(),
+        };
+
+        let mut row = Array1::zeros(BASE_WIDTH);
+        row[CLK.base_table_index()] = self.clk.into();
+        row[IB1ShrinkStack.base_table_index()] = shrink_stack_indicator;
+        row[StackPointer.base_table_index()] = self.op_stack_pointer;
+        row[FirstUnderflowElement.base_table_index()] = self.underflow_io.payload();
+        row
+    }
+}
+
 impl ExtOpStackTable {
     pub fn initial_constraints(
         circuit_builder: &ConstraintCircuitBuilder<SingleRowIndicator>,
     ) -> Vec<ConstraintCircuitMonad<SingleRowIndicator>> {
-        let clk = circuit_builder.input(BaseRow(CLK.master_base_table_index()));
-        let ib1 = circuit_builder.input(BaseRow(IB1ShrinkStack.master_base_table_index()));
-        let osp = circuit_builder.input(BaseRow(OSP.master_base_table_index()));
-        let osv = circuit_builder.input(BaseRow(OSV.master_base_table_index()));
-        let rppa = circuit_builder.input(ExtRow(RunningProductPermArg.master_ext_table_index()));
-        let clock_jump_diff_log_derivative = circuit_builder.input(ExtRow(
-            ClockJumpDifferenceLookupClientLogDerivative.master_ext_table_index(),
-        ));
-
-        let clk_is_0 = clk;
-        let osv_is_0 = osv;
-        let osp_is_16 = osp - circuit_builder.b_constant(16_u32.into());
-
-        // The running product for the permutation argument `rppa` starts off having accumulated the
-        // first row. Note that `clk` and `osv` are constrained to be 0, and `osp` to be 16.
-        let compressed_row = circuit_builder.challenge(OpStackIb1Weight) * ib1
-            + circuit_builder.challenge(OpStackOspWeight)
-                * circuit_builder.b_constant(16_u32.into());
-        let processor_perm_indeterminate = circuit_builder.challenge(OpStackIndeterminate);
-        let rppa_initial = processor_perm_indeterminate - compressed_row;
-        let rppa_starts_correctly = rppa - rppa_initial;
-
-        let clock_jump_diff_log_derivative_is_initialized_correctly = clock_jump_diff_log_derivative
-            - circuit_builder.x_constant(LookupArg::default_initial());
+        let challenge = |c| circuit_builder.challenge(c);
+        let constant = |c| circuit_builder.b_constant(c);
+        let x_constant = |c| circuit_builder.x_constant(c);
+        let base_row = |column: OpStackBaseTableColumn| {
+            circuit_builder.input(BaseRow(column.master_base_table_index()))
+        };
+        let ext_row = |column: OpStackExtTableColumn| {
+            circuit_builder.input(ExtRow(column.master_ext_table_index()))
+        };
+
+        let initial_stack_length = u32::try_from(OpStackElement::COUNT).unwrap();
+        let initial_stack_length = constant(initial_stack_length.into());
+        let padding_indicator = constant(PADDING_VALUE);
+
+        let stack_pointer_is_16 = base_row(StackPointer) - initial_stack_length.clone();
+
+        let compressed_row = challenge(OpStackClkWeight) * base_row(CLK)
+            + challenge(OpStackIb1Weight) * base_row(IB1ShrinkStack)
+            + challenge(OpStackPointerWeight) * initial_stack_length
+            + challenge(OpStackFirstUnderflowElementWeight) * base_row(FirstUnderflowElement);
+        let rppa_initial = challenge(OpStackIndeterminate) - compressed_row;
+        let rppa_has_accumulated_first_row = ext_row(RunningProductPermArg) - rppa_initial;
+
+        let rppa_is_default_initial =
+            ext_row(RunningProductPermArg) - x_constant(PermArg::default_initial());
+
+        let first_row_is_padding_row = base_row(IB1ShrinkStack) - padding_indicator;
+        let first_row_is_not_padding_row =
+            base_row(IB1ShrinkStack) * (base_row(IB1ShrinkStack) - constant(1_u64.into()));
+
+        let rppa_starts_correctly = rppa_has_accumulated_first_row * first_row_is_padding_row
+            + rppa_is_default_initial * first_row_is_not_padding_row;
+
+        let lookup_argument_initial = x_constant(LookupArg::default_initial());
+        let clock_jump_diff_log_derivative_is_initialized_correctly =
+            ext_row(ClockJumpDifferenceLookupClientLogDerivative) - lookup_argument_initial;
 
         vec![
-            clk_is_0,
-            osv_is_0,
-            osp_is_16,
+            stack_pointer_is_16,
             rppa_starts_correctly,
             clock_jump_diff_log_derivative_is_initialized_correctly,
         ]
@@ -82,73 +168,97 @@ impl ExtOpStackTable {
     pub fn transition_constraints(
         circuit_builder: &ConstraintCircuitBuilder<DualRowIndicator>,
     ) -> Vec<ConstraintCircuitMonad<DualRowIndicator>> {
-        let one = circuit_builder.b_constant(1u32.into());
-
-        let clk = circuit_builder.input(CurrentBaseRow(CLK.master_base_table_index()));
-        let ib1_shrink_stack =
-            circuit_builder.input(CurrentBaseRow(IB1ShrinkStack.master_base_table_index()));
-        let osp = circuit_builder.input(CurrentBaseRow(OSP.master_base_table_index()));
-        let osv = circuit_builder.input(CurrentBaseRow(OSV.master_base_table_index()));
-        let rppa = circuit_builder.input(CurrentExtRow(
-            RunningProductPermArg.master_ext_table_index(),
-        ));
-        let clock_jump_diff_log_derivative = circuit_builder.input(CurrentExtRow(
-            ClockJumpDifferenceLookupClientLogDerivative.master_ext_table_index(),
-        ));
-
-        let clk_next = circuit_builder.input(NextBaseRow(CLK.master_base_table_index()));
-        let ib1_shrink_stack_next =
-            circuit_builder.input(NextBaseRow(IB1ShrinkStack.master_base_table_index()));
-        let osp_next = circuit_builder.input(NextBaseRow(OSP.master_base_table_index()));
-        let osv_next = circuit_builder.input(NextBaseRow(OSV.master_base_table_index()));
-        let rppa_next =
-            circuit_builder.input(NextExtRow(RunningProductPermArg.master_ext_table_index()));
-        let clock_jump_diff_log_derivative_next = circuit_builder.input(NextExtRow(
-            ClockJumpDifferenceLookupClientLogDerivative.master_ext_table_index(),
-        ));
-
-        // the osp increases by 1 or the osp does not change
-        //
-        // $(osp' - (osp + 1))·(osp' - osp) = 0$
-        let osp_increases_by_1_or_does_not_change =
-            (osp_next.clone() - osp.clone() - one.clone()) * (osp_next.clone() - osp.clone());
-
-        // the osp increases by 1 or the osv does not change OR the ci shrinks the OpStack
-        //
-        // $ (osp' - (osp + 1)) · (osv' - osv) · (1 - ib1) = 0$
-        let osp_increases_by_1_or_osv_does_not_change_or_shrink_stack =
-            (osp_next.clone() - osp.clone() - one.clone())
-                * (osv_next.clone() - osv)
-                * (one.clone() - ib1_shrink_stack);
+        let constant = |c| circuit_builder.b_constant(c);
+        let challenge = |c| circuit_builder.challenge(c);
+        let current_base_row = |column: OpStackBaseTableColumn| {
+            circuit_builder.input(CurrentBaseRow(column.master_base_table_index()))
+        };
+        let current_ext_row = |column: OpStackExtTableColumn| {
+            circuit_builder.input(CurrentExtRow(column.master_ext_table_index()))
+        };
+        let next_base_row = |column: OpStackBaseTableColumn| {
+            circuit_builder.input(NextBaseRow(column.master_base_table_index()))
+        };
+        let next_ext_row = |column: OpStackExtTableColumn| {
+            circuit_builder.input(NextExtRow(column.master_ext_table_index()))
+        };
+
+        let one = constant(1_u32.into());
+        let padding_indicator = constant(PADDING_VALUE);
+
+        let clk = current_base_row(CLK);
+        let ib1_shrink_stack = current_base_row(IB1ShrinkStack);
+        let stack_pointer = current_base_row(StackPointer);
+        let first_underflow_element = current_base_row(FirstUnderflowElement);
+        let rppa = current_ext_row(RunningProductPermArg);
+        let clock_jump_diff_log_derivative =
+            current_ext_row(ClockJumpDifferenceLookupClientLogDerivative);
+
+        let clk_next = next_base_row(CLK);
+        let ib1_shrink_stack_next = next_base_row(IB1ShrinkStack);
+        let stack_pointer_next = next_base_row(StackPointer);
+        let first_underflow_element_next = next_base_row(FirstUnderflowElement);
+        let rppa_next = next_ext_row(RunningProductPermArg);
+        let clock_jump_diff_log_derivative_next =
+            next_ext_row(ClockJumpDifferenceLookupClientLogDerivative);
+
+        let stack_pointer_increases_by_1_or_does_not_change =
+            (stack_pointer_next.clone() - stack_pointer.clone() - one.clone())
+                * (stack_pointer_next.clone() - stack_pointer.clone());
+
+        let stack_pointer_inc_by_1_or_underflow_element_doesnt_change_or_next_ci_grows_stack =
+            (stack_pointer_next.clone() - stack_pointer.clone() - one.clone())
+                * (first_underflow_element_next.clone() - first_underflow_element.clone())
+                * ib1_shrink_stack_next.clone();
+
+        let next_row_is_padding_row = ib1_shrink_stack_next.clone() - padding_indicator.clone();
+        let if_current_row_is_padding_row_then_next_row_is_padding_row = ib1_shrink_stack.clone()
+            * (ib1_shrink_stack - one.clone())
+            * next_row_is_padding_row.clone();
 
         // The running product for the permutation argument `rppa` is updated correctly.
-        let alpha = circuit_builder.challenge(OpStackIndeterminate);
         let compressed_row = circuit_builder.challenge(OpStackClkWeight) * clk_next.clone()
-            + circuit_builder.challenge(OpStackIb1Weight) * ib1_shrink_stack_next
-            + circuit_builder.challenge(OpStackOspWeight) * osp_next.clone()
-            + circuit_builder.challenge(OpStackOsvWeight) * osv_next;
+            + circuit_builder.challenge(OpStackIb1Weight) * ib1_shrink_stack_next.clone()
+            + circuit_builder.challenge(OpStackPointerWeight) * stack_pointer_next.clone()
+            + circuit_builder.challenge(OpStackFirstUnderflowElementWeight)
+                * first_underflow_element_next;
 
-        let rppa_updates_correctly = rppa_next - rppa * (alpha - compressed_row);
+        let rppa_updates =
+            rppa_next.clone() - rppa.clone() * (challenge(OpStackIndeterminate) - compressed_row);
+
+        let next_row_is_not_padding_row =
+            ib1_shrink_stack_next.clone() * (ib1_shrink_stack_next.clone() - one.clone());
+        let rppa_remains = rppa_next - rppa;
+
+        let rppa_updates_correctly = rppa_updates * next_row_is_padding_row.clone()
+            + rppa_remains * next_row_is_not_padding_row.clone();
 
-        // The running sum of the logarithmic derivative for the clock jump difference Lookup
-        // Argument accumulates a summand of `clk_diff` if and only if the `osp` does not change.
-        // Expressed differently:
-        // - the `osp` changes or the log derivative accumulates a summand, and
-        // - the `osp` does not change or the log derivative does not change.
-        let log_derivative_remains =
-            clock_jump_diff_log_derivative_next.clone() - clock_jump_diff_log_derivative.clone();
         let clk_diff = clk_next - clk;
-        let log_derivative_accumulates = (clock_jump_diff_log_derivative_next
-            - clock_jump_diff_log_derivative)
-            * (circuit_builder.challenge(ClockJumpDifferenceLookupIndeterminate) - clk_diff)
+        let log_derivative_accumulates = (clock_jump_diff_log_derivative_next.clone()
+            - clock_jump_diff_log_derivative.clone())
+            * (challenge(ClockJumpDifferenceLookupIndeterminate) - clk_diff)
             - one.clone();
-        let log_derivative_updates_correctly = (osp_next.clone() - osp.clone() - one)
-            * log_derivative_accumulates
-            + (osp_next - osp) * log_derivative_remains;
+        let log_derivative_remains =
+            clock_jump_diff_log_derivative_next.clone() - clock_jump_diff_log_derivative.clone();
+
+        let log_derivative_accumulates_or_stack_pointer_changes_or_next_row_is_padding_row =
+            log_derivative_accumulates
+                * (stack_pointer_next.clone() - stack_pointer.clone() - one.clone())
+                * next_row_is_padding_row;
+        let log_derivative_remains_or_stack_pointer_doesnt_change =
+            log_derivative_remains.clone() * (stack_pointer_next.clone() - stack_pointer.clone());
+        let log_derivatve_remains_or_next_row_is_not_padding_row =
+            log_derivative_remains.clone() * next_row_is_not_padding_row;
+
+        let log_derivative_updates_correctly =
+            log_derivative_accumulates_or_stack_pointer_changes_or_next_row_is_padding_row
+                + log_derivative_remains_or_stack_pointer_doesnt_change
+                + log_derivatve_remains_or_next_row_is_not_padding_row;
 
         vec![
-            osp_increases_by_1_or_does_not_change,
-            osp_increases_by_1_or_osv_does_not_change_or_shrink_stack,
+            stack_pointer_increases_by_1_or_does_not_change,
+            stack_pointer_inc_by_1_or_underflow_element_doesnt_change_or_next_ci_grows_stack,
+            if_current_row_is_padding_row_then_next_row_is_padding_row,
             rppa_updates_correctly,
             log_derivative_updates_correctly,
         ]
@@ -168,111 +278,64 @@ impl OpStackTable {
         op_stack_table: &mut ArrayViewMut2<BFieldElement>,
         aet: &AlgebraicExecutionTrace,
     ) -> Vec<BFieldElement> {
-        // Store the registers relevant for the Op Stack Table, i.e., CLK, IB1, OSP, and OSV,
-        // with OSP as the key. Preserves, thus allows reusing, the order of the processor's
-        // rows, which are sorted by CLK.
-        let mut pre_processed_op_stack_table: Vec<Vec<_>> = vec![];
-        for processor_row in aet.processor_trace.rows() {
-            let clk = processor_row[ProcessorBaseTableColumn::CLK.base_table_index()];
-            let ib1 = processor_row[ProcessorBaseTableColumn::IB1.base_table_index()];
-            let osp = processor_row[ProcessorBaseTableColumn::OSP.base_table_index()];
-            let osv = processor_row[ProcessorBaseTableColumn::OSV.base_table_index()];
-            // The (honest) prover can only grow the Op Stack's size by at most 1 per execution
-            // step. Hence, the following (a) works, and (b) sorts.
-            let osp_minus_16 = osp.value() as usize - OpStackElement::COUNT;
-            let op_stack_row = (clk, ib1, osv);
-            match osp_minus_16.cmp(&pre_processed_op_stack_table.len()) {
-                Ordering::Less => pre_processed_op_stack_table[osp_minus_16].push(op_stack_row),
-                Ordering::Equal => pre_processed_op_stack_table.push(vec![op_stack_row]),
-                Ordering::Greater => panic!("OSP must increase by at most 1 per execution step."),
-            }
-        }
+        let mut op_stack_table = op_stack_table.slice_mut(s![0..aet.op_stack_table_length(), ..]);
+        let trace_iter = aet.op_stack_underflow_trace.rows().into_iter();
 
-        // Move the rows into the Op Stack Table, sorted by OSP first, CLK second.
-        let mut op_stack_table_row = 0;
-        for (osp_minus_16, rows_with_this_osp) in
-            pre_processed_op_stack_table.into_iter().enumerate()
-        {
-            let osp = BFieldElement::new((osp_minus_16 + OpStackElement::COUNT) as u64);
-            for (clk, ib1, osv) in rows_with_this_osp {
-                op_stack_table[[op_stack_table_row, CLK.base_table_index()]] = clk;
-                op_stack_table[[op_stack_table_row, IB1ShrinkStack.base_table_index()]] = ib1;
-                op_stack_table[[op_stack_table_row, OSP.base_table_index()]] = osp;
-                op_stack_table[[op_stack_table_row, OSV.base_table_index()]] = osv;
-                op_stack_table_row += 1;
-            }
+        let sorted_rows =
+            trace_iter.sorted_by(|row_0, row_1| Self::compare_rows(row_0.view(), row_1.view()));
+        for (row_index, row) in sorted_rows.enumerate() {
+            op_stack_table.row_mut(row_index).assign(&row);
         }
-        assert_eq!(aet.processor_trace.nrows(), op_stack_table_row);
 
-        // Collect all clock jump differences.
-        // The Op Stack Table and the Processor Table have the same length.
+        Self::clock_jump_differences(op_stack_table.view())
+    }
+
+    fn compare_rows(
+        row_0: ArrayView1<BFieldElement>,
+        row_1: ArrayView1<BFieldElement>,
+    ) -> Ordering {
+        let stack_pointer_0 = row_0[StackPointer.base_table_index()].value();
+        let stack_pointer_1 = row_1[StackPointer.base_table_index()].value();
+        let compare_stack_pointers = stack_pointer_0.cmp(&stack_pointer_1);
+
+        let clk_0 = row_0[CLK.base_table_index()].value();
+        let clk_1 = row_1[CLK.base_table_index()].value();
+        let compare_clocks = clk_0.cmp(&clk_1);
+
+        compare_stack_pointers.then(compare_clocks)
+    }
+
+    fn clock_jump_differences(op_stack_table: ArrayView2<BFieldElement>) -> Vec<BFieldElement> {
         let mut clock_jump_differences = vec![];
-        for row_idx in 0..aet.processor_trace.nrows() - 1 {
-            let curr_row = op_stack_table.row(row_idx);
-            let next_row = op_stack_table.row(row_idx + 1);
-            let clk_diff = next_row[CLK.base_table_index()] - curr_row[CLK.base_table_index()];
-            if curr_row[OSP.base_table_index()] == next_row[OSP.base_table_index()] {
-                clock_jump_differences.push(clk_diff);
+        for consecutive_rows in op_stack_table.axis_windows(Axis(0), 2).into_iter() {
+            let current_row = consecutive_rows.row(0);
+            let next_row = consecutive_rows.row(1);
+            let current_stack_pointer = current_row[StackPointer.base_table_index()];
+            let next_stack_pointer = next_row[StackPointer.base_table_index()];
+            if current_stack_pointer == next_stack_pointer {
+                let current_clk = current_row[CLK.base_table_index()];
+                let next_clk = next_row[CLK.base_table_index()];
+                let clk_difference = next_clk - current_clk;
+                clock_jump_differences.push(clk_difference);
             }
         }
         clock_jump_differences
     }
 
-    pub fn pad_trace(mut op_stack_table: ArrayViewMut2<BFieldElement>, processor_table_len: usize) {
-        assert!(
-            processor_table_len > 0,
-            "Processor Table must have at least 1 row."
-        );
-
-        // Set up indices for relevant sections of the table.
-        let padded_height = op_stack_table.nrows();
-        let num_padding_rows = padded_height - processor_table_len;
-        let max_clk_before_padding = processor_table_len - 1;
-        let max_clk_before_padding_row_idx = op_stack_table
-            .rows()
-            .into_iter()
-            .enumerate()
-            .find(|(_, row)| row[CLK.base_table_index()].value() as usize == max_clk_before_padding)
-            .map(|(idx, _)| idx)
-            .expect("Op Stack Table must contain row with clock cycle equal to max cycle.");
-        let rows_to_move_source_section_start = max_clk_before_padding_row_idx + 1;
-        let rows_to_move_source_section_end = processor_table_len;
-        let num_rows_to_move = rows_to_move_source_section_end - rows_to_move_source_section_start;
-        let rows_to_move_dest_section_start = rows_to_move_source_section_start + num_padding_rows;
-        let rows_to_move_dest_section_end = rows_to_move_dest_section_start + num_rows_to_move;
-        let padding_section_start = rows_to_move_source_section_start;
-        let padding_section_end = padding_section_start + num_padding_rows;
-        assert_eq!(padded_height, rows_to_move_dest_section_end);
-
-        // Move all rows below the row with highest CLK to the end of the table – if they exist.
-        if num_rows_to_move > 0 {
-            let rows_to_move_source_range =
-                rows_to_move_source_section_start..rows_to_move_source_section_end;
-            let rows_to_move_dest_range =
-                rows_to_move_dest_section_start..rows_to_move_dest_section_end;
-            let rows_to_move = op_stack_table
-                .slice(s![rows_to_move_source_range, ..])
-                .to_owned();
-            rows_to_move.move_into(&mut op_stack_table.slice_mut(s![rows_to_move_dest_range, ..]));
+    pub fn pad_trace(mut op_stack_table: ArrayViewMut2<BFieldElement>, op_stack_table_len: usize) {
+        let last_row_index = op_stack_table_len.saturating_sub(1);
+        let mut last_row = op_stack_table.row(last_row_index).to_owned();
+        last_row[IB1ShrinkStack.base_table_index()] = PADDING_VALUE;
+        if op_stack_table_len == 0 {
+            let first_stack_pointer = u32::try_from(OpStackElement::COUNT).unwrap().into();
+            last_row[StackPointer.base_table_index()] = first_stack_pointer;
         }
 
-        // Fill the created gap with padding rows, i.e., with copies of the last row before the
-        // gap. This is the padding section.
-        let padding_row_template = op_stack_table
-            .row(max_clk_before_padding_row_idx)
-            .to_owned();
-        let mut padding_section =
-            op_stack_table.slice_mut(s![padding_section_start..padding_section_end, ..]);
+        let mut padding_section = op_stack_table.slice_mut(s![op_stack_table_len.., ..]);
         padding_section
             .axis_iter_mut(Axis(0))
             .into_par_iter()
-            .for_each(|padding_row| padding_row_template.clone().move_into(padding_row));
-
-        // CLK keeps increasing by 1 also in the padding section.
-        let new_clk_values = Array1::from_iter(
-            (processor_table_len..padded_height).map(|clk| BFieldElement::new(clk as u64)),
-        );
-        new_clk_values.move_into(padding_section.slice_mut(s![.., CLK.base_table_index()]));
+            .for_each(|mut row| row.assign(&last_row));
     }
 
     pub fn extend(
@@ -286,8 +349,8 @@ impl OpStackTable {
 
         let clk_weight = challenges[OpStackClkWeight];
         let ib1_weight = challenges[OpStackIb1Weight];
-        let osp_weight = challenges[OpStackOspWeight];
-        let osv_weight = challenges[OpStackOsvWeight];
+        let stack_pointer_weight = challenges[OpStackPointerWeight];
+        let first_underflow_element_weight = challenges[OpStackFirstUnderflowElementWeight];
         let perm_arg_indeterminate = challenges[OpStackIndeterminate];
         let clock_jump_difference_lookup_indeterminate =
             challenges[ClockJumpDifferenceLookupIndeterminate];
@@ -300,21 +363,30 @@ impl OpStackTable {
             let current_row = base_table.row(row_idx);
             let clk = current_row[CLK.base_table_index()];
             let ib1 = current_row[IB1ShrinkStack.base_table_index()];
-            let osp = current_row[OSP.base_table_index()];
-            let osv = current_row[OSV.base_table_index()];
-
-            let compressed_row_for_permutation_argument =
-                clk * clk_weight + ib1 * ib1_weight + osp * osp_weight + osv * osv_weight;
-            running_product *= perm_arg_indeterminate - compressed_row_for_permutation_argument;
-
-            // clock jump difference
-            if let Some(prev_row) = previous_row {
-                if prev_row[OSP.base_table_index()] == current_row[OSP.base_table_index()] {
-                    let clock_jump_difference =
-                        current_row[CLK.base_table_index()] - prev_row[CLK.base_table_index()];
-                    clock_jump_diff_lookup_log_derivative +=
-                        (clock_jump_difference_lookup_indeterminate - clock_jump_difference)
-                            .inverse();
+            let stack_pointer = current_row[StackPointer.base_table_index()];
+            let first_underflow_element = current_row[FirstUnderflowElement.base_table_index()];
+
+            let is_no_padding_row = ib1 != PADDING_VALUE;
+
+            if is_no_padding_row {
+                let compressed_row = clk * clk_weight
+                    + ib1 * ib1_weight
+                    + stack_pointer * stack_pointer_weight
+                    + first_underflow_element * first_underflow_element_weight;
+                running_product *= perm_arg_indeterminate - compressed_row;
+
+                // clock jump difference
+                if let Some(prev_row) = previous_row {
+                    let previous_stack_pointer = prev_row[StackPointer.base_table_index()];
+                    let current_stack_pointer = current_row[StackPointer.base_table_index()];
+                    if previous_stack_pointer == current_stack_pointer {
+                        let previous_clock = prev_row[CLK.base_table_index()];
+                        let current_clock = current_row[CLK.base_table_index()];
+                        let clock_jump_difference = current_clock - previous_clock;
+                        let log_derivative_summand =
+                            clock_jump_difference_lookup_indeterminate - clock_jump_difference;
+                        clock_jump_diff_lookup_log_derivative += log_derivative_summand.inverse();
+                    }
                 }
             }
 
@@ -329,7 +401,13 @@ impl OpStackTable {
 
 #[cfg(test)]
 pub(crate) mod tests {
+    use itertools::Itertools;
     use num_traits::Zero;
+    use proptest::collection::vec;
+    use proptest::prelude::*;
+    use proptest_arbitrary_interop::arb;
+
+    use crate::op_stack::OpStackElement;
 
     use super::*;
 
@@ -420,4 +498,123 @@ pub(crate) mod tests {
 
         true
     }
+
+    proptest! {
+        #[test]
+        fn op_stack_table_entry_either_shrinks_stack_or_grows_stack(
+            entry in arb::<OpStackTableEntry>()
+        ) {
+            let shrinks_stack = entry.shrinks_stack();
+            let grows_stack = entry.grows_stack();
+            assert!(shrinks_stack ^ grows_stack);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn op_stack_pointer_in_sequence_of_op_stack_table_entries(
+                clk: u32,
+                stack_pointer in OpStackElement::COUNT..1024,
+                base_field_elements in vec(arb::<BFieldElement>(), 0..OpStackElement::COUNT),
+                sequence_of_writes: bool,
+        ) {
+            let sequence_length = u64::try_from(base_field_elements.len()).unwrap();
+            let stack_pointer = u64::try_from(stack_pointer).unwrap();
+
+            let underflow_io_operation = match sequence_of_writes {
+                true => UnderflowIO::Write,
+                false => UnderflowIO::Read,
+            };
+            let underflow_io = base_field_elements
+                .into_iter()
+                .map(underflow_io_operation)
+                .collect();
+
+            let op_stack_pointer = stack_pointer.into();
+            let entries =
+                OpStackTableEntry::from_underflow_io_sequence(clk, op_stack_pointer, underflow_io);
+            let op_stack_pointers = entries
+                .iter()
+                .map(|entry| entry.op_stack_pointer.value())
+                .sorted()
+                .collect_vec();
+
+            let expected_stack_pointer_range = match sequence_of_writes {
+                true => stack_pointer - sequence_length..stack_pointer,
+                false => stack_pointer..stack_pointer + sequence_length,
+            };
+            let expected_op_stack_pointers = expected_stack_pointer_range.collect_vec();
+            prop_assert_eq!(expected_op_stack_pointers, op_stack_pointers);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn clk_stays_same_in_sequence_of_op_stack_table_entries(
+                clk: u32,
+                stack_pointer in OpStackElement::COUNT..1024,
+                base_field_elements in vec(arb::<BFieldElement>(), 0..OpStackElement::COUNT),
+                sequence_of_writes: bool,
+        ) {
+            let underflow_io_operation = match sequence_of_writes {
+                true => UnderflowIO::Write,
+                false => UnderflowIO::Read,
+            };
+            let underflow_io = base_field_elements
+                .into_iter()
+                .map(underflow_io_operation)
+                .collect();
+
+            let op_stack_pointer = u64::try_from(stack_pointer).unwrap().into();
+            let entries =
+                OpStackTableEntry::from_underflow_io_sequence(clk, op_stack_pointer, underflow_io);
+            let clk_values = entries.iter().map(|entry| entry.clk).collect_vec();
+            let all_clk_values_are_clk = clk_values.iter().all(|&c| c == clk);
+            prop_assert!(all_clk_values_are_clk);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn compare_rows_with_unequal_stack_pointer_and_equal_clk(
+            stack_pointer_0: u64,
+            stack_pointer_1: u64,
+            clk: u64,
+        ) {
+            let mut row_0 = Array1::zeros(BASE_WIDTH);
+            row_0[StackPointer.base_table_index()] = stack_pointer_0.into();
+            row_0[CLK.base_table_index()] = clk.into();
+
+            let mut row_1 = Array1::zeros(BASE_WIDTH);
+            row_1[StackPointer.base_table_index()] = stack_pointer_1.into();
+            row_1[CLK.base_table_index()] = clk.into();
+
+            let stack_pointer_comparison = stack_pointer_0.cmp(&stack_pointer_1);
+            let row_comparison = OpStackTable::compare_rows(row_0.view(), row_1.view());
+
+            assert_eq!(stack_pointer_comparison, row_comparison);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn compare_rows_with_equal_stack_pointer_and_unequal_clk(
+            stack_pointer: u64,
+            clk_0: u64,
+            clk_1: u64,
+        ) {
+            let mut row_0 = Array1::zeros(BASE_WIDTH);
+            row_0[StackPointer.base_table_index()] = stack_pointer.into();
+            row_0[CLK.base_table_index()] = clk_0.into();
+
+            let mut row_1 = Array1::zeros(BASE_WIDTH);
+            row_1[StackPointer.base_table_index()] = stack_pointer.into();
+            row_1[CLK.base_table_index()] = clk_1.into();
+
+            let clk_comparison = clk_0.cmp(&clk_1);
+            let row_comparison = OpStackTable::compare_rows(row_0.view(), row_1.view());
+
+            assert_eq!(clk_comparison, row_comparison);
+        }
+    }
 }
diff --git a/triton-vm/src/table/processor_table.rs b/triton-vm/src/table/processor_table.rs
index 3b9e13b64..59356bf7d 100644
--- a/triton-vm/src/table/processor_table.rs
+++ b/triton-vm/src/table/processor_table.rs
@@ -10,8 +10,7 @@ use ndarray::*;
 use num_traits::One;
 use num_traits::Zero;
 use strum::EnumCount;
-use twenty_first::shared_math::b_field_element::BFieldElement;
-use twenty_first::shared_math::b_field_element::BFIELD_ONE;
+use twenty_first::shared_math::b_field_element::*;
 use twenty_first::shared_math::digest::DIGEST_LENGTH;
 use twenty_first::shared_math::traits::Inverse;
 use twenty_first::shared_math::x_field_element::XFieldElement;
@@ -21,6 +20,7 @@ use crate::instruction::AnInstruction::*;
 use crate::instruction::Instruction;
 use crate::instruction::InstructionBit;
 use crate::instruction::ALL_INSTRUCTIONS;
+use crate::op_stack::OpStackElement;
 use crate::table::challenges::ChallengeId;
 use crate::table::challenges::ChallengeId::*;
 use crate::table::challenges::Challenges;
@@ -113,11 +113,12 @@ impl ProcessorTable {
             processor_table.slice_mut(s![processor_table_len.., CLK.base_table_index()]),
         );
 
-        // The 3 memory-like tables do not have a padding indicator. Hence, clock jump differences
-        // are being looked up in their padding sections. The clock jump differences in that
-        // section are always 1. The lookup multiplicities of clock value 1 must be increased
-        // accordingly: one per padding row, times the number of memory-like tables, which is 3.
-        let num_padding_rows = 3 * (processor_table.nrows() - processor_table_len);
+        // The memory-like tables “RAM” and “Jump Stack” do not have a padding indicator. Hence,
+        // clock jump differences are being looked up in their padding sections. The clock jump
+        // differences in that section are always 1. The lookup multiplicities of clock value 1 must
+        // be increased accordingly: one per padding row, times the number of memory-like tables
+        // without padding indicator, which is 2.
+        let num_padding_rows = 2 * (processor_table.nrows() - processor_table_len);
         let num_pad_rows = BFieldElement::new(num_padding_rows as u64);
         let mut row_1 = processor_table.row_mut(1);
         row_1[ClockJumpDifferenceLookupMultiplicity.base_table_index()] += num_pad_rows;
@@ -178,20 +179,14 @@ impl ProcessorTable {
                     .inverse();
             }
 
-            // OpStack table
-            let clk = current_row[CLK.base_table_index()];
-            let ib1 = current_row[IB1.base_table_index()];
-            let osp = current_row[OSP.base_table_index()];
-            let osv = current_row[OSV.base_table_index()];
-            let compressed_row_for_op_stack_table_permutation_argument = clk
-                * challenges[OpStackClkWeight]
-                + ib1 * challenges[OpStackIb1Weight]
-                + osp * challenges[OpStackOspWeight]
-                + osv * challenges[OpStackOsvWeight];
-            op_stack_table_running_product *= challenges[OpStackIndeterminate]
-                - compressed_row_for_op_stack_table_permutation_argument;
+            op_stack_table_running_product *= Self::factor_for_op_stack_table_running_product(
+                previous_row,
+                current_row,
+                challenges,
+            );
 
             // RAM Table
+            let clk = current_row[CLK.base_table_index()];
             let ramv = current_row[RAMV.base_table_index()];
             let ramp = current_row[RAMP.base_table_index()];
             let previous_instruction = current_row[PreviousInstruction.base_table_index()];
@@ -357,6 +352,80 @@ impl ProcessorTable {
             previous_row = Some(current_row);
         }
     }
+
+    fn factor_for_op_stack_table_running_product(
+        maybe_previous_row: Option<ArrayView1<BFieldElement>>,
+        current_row: ArrayView1<BFieldElement>,
+        challenges: &Challenges,
+    ) -> XFieldElement {
+        let default_factor = XFieldElement::one();
+
+        let is_padding_row = current_row[IsPadding.base_table_index()].is_one();
+        if is_padding_row {
+            return default_factor;
+        }
+
+        let Some(previous_row) = maybe_previous_row else {
+            return default_factor;
+        };
+
+        let previous_opcode = previous_row[CI.base_table_index()];
+        let Ok(previous_instruction): Result<Instruction, _> = previous_opcode.try_into() else {
+            return default_factor;
+        };
+
+        // shorter stack means relevant information is on top of stack, i.e., in stack registers
+        let row_with_shorter_stack = match previous_instruction.grows_op_stack() {
+            true => previous_row.view(),
+            false => current_row.view(),
+        };
+        let op_stack_delta = previous_instruction
+            .op_stack_size_influence()
+            .unsigned_abs() as usize;
+
+        let mut factor = default_factor;
+        for op_stack_pointer_offset in 0..op_stack_delta {
+            let max_stack_element_index = OpStackElement::COUNT - 1;
+            let stack_element_index = max_stack_element_index - op_stack_pointer_offset;
+            let stack_element_column = Self::op_stack_column_by_index(stack_element_index);
+            let underflow_element = row_with_shorter_stack[stack_element_column.base_table_index()];
+
+            let op_stack_pointer = row_with_shorter_stack[OpStackPointer.base_table_index()];
+            let offset = BFieldElement::new(op_stack_pointer_offset as u64);
+            let offset_op_stack_pointer = op_stack_pointer + offset;
+
+            let clk = previous_row[CLK.base_table_index()];
+            let ib1_shrink_stack = previous_row[IB1.base_table_index()];
+            let compressed_row = clk * challenges[OpStackClkWeight]
+                + ib1_shrink_stack * challenges[OpStackIb1Weight]
+                + offset_op_stack_pointer * challenges[OpStackPointerWeight]
+                + underflow_element * challenges[OpStackFirstUnderflowElementWeight];
+            factor *= challenges[OpStackIndeterminate] - compressed_row;
+        }
+        factor
+    }
+
+    fn op_stack_column_by_index(index: usize) -> ProcessorBaseTableColumn {
+        match index {
+            0 => ST0,
+            1 => ST1,
+            2 => ST2,
+            3 => ST3,
+            4 => ST4,
+            5 => ST5,
+            6 => ST6,
+            7 => ST7,
+            8 => ST8,
+            9 => ST9,
+            10 => ST10,
+            11 => ST11,
+            12 => ST12,
+            13 => ST13,
+            14 => ST14,
+            15 => ST15,
+            i => panic!("Op Stack column index must be in [0, 15], not {i}."),
+        }
+    }
 }
 
 #[derive(Debug, Clone)]
@@ -392,8 +461,7 @@ impl ExtProcessorTable {
         let st8_is_0 = base_row(ST8);
         let st9_is_0 = base_row(ST9);
         let st10_is_0 = base_row(ST10);
-        let osp_is_16 = base_row(OSP) - constant(16);
-        let osv_is_0 = base_row(OSV);
+        let op_stack_pointer_is_16 = base_row(OpStackPointer) - constant(16);
         let ramp_is_0 = base_row(RAMP);
         let previous_instruction_is_0 = base_row(PreviousInstruction);
 
@@ -437,17 +505,8 @@ impl ExtProcessorTable {
         let running_evaluation_for_standard_output_is_initialized_correctly =
             ext_row(OutputTableEvalArg) - x_constant(EvalArg::default_initial());
 
-        // op-stack table
-        let op_stack_indeterminate = challenge(OpStackIndeterminate);
-        let op_stack_ib1_weight = challenge(OpStackIb1Weight);
-        let op_stack_osp_weight = challenge(OpStackOspWeight);
-        // note: `clk` and `osv` are already constrained to be 0, `osp` to be 16
-        let compressed_row_for_op_stack_table =
-            op_stack_ib1_weight * base_row(IB1) + op_stack_osp_weight * constant(16);
         let running_product_for_op_stack_table_is_initialized_correctly =
-            ext_row(OpStackTablePermArg)
-                - x_constant(PermArg::default_initial())
-                    * (op_stack_indeterminate - compressed_row_for_op_stack_table);
+            ext_row(OpStackTablePermArg) - x_constant(PermArg::default_initial());
 
         // ram table
         let ram_indeterminate = challenge(RamIndeterminate);
@@ -520,8 +579,7 @@ impl ExtProcessorTable {
             st9_is_0,
             st10_is_0,
             compressed_program_digest_is_expected_program_digest,
-            osp_is_16,
-            osv_is_0,
+            op_stack_pointer_is_16,
             ramp_is_0,
             previous_instruction_is_0,
             running_evaluation_for_standard_input_is_initialized_correctly,
@@ -793,6 +851,12 @@ impl ExtProcessorTable {
         let next_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(NextBaseRow(col.master_base_table_index()))
         };
+        let curr_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(CurrentExtRow(col.master_ext_table_index()))
+        };
+        let next_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(NextExtRow(col.master_ext_table_index()))
+        };
 
         vec![
             next_base_row(ST11) - curr_base_row(ST11),
@@ -800,10 +864,8 @@ impl ExtProcessorTable {
             next_base_row(ST13) - curr_base_row(ST13),
             next_base_row(ST14) - curr_base_row(ST14),
             next_base_row(ST15) - curr_base_row(ST15),
-            // The top of the OpStack underflow, i.e., osv, does not change.
-            next_base_row(OSV) - curr_base_row(OSV),
-            // The OpStack pointer, osp, does not change.
-            next_base_row(OSP) - curr_base_row(OSP),
+            next_base_row(OpStackPointer) - curr_base_row(OpStackPointer),
+            next_ext_row(OpStackTablePermArg) - curr_ext_row(OpStackTablePermArg),
         ]
     }
 
@@ -895,17 +957,33 @@ impl ExtProcessorTable {
         circuit_builder: &ConstraintCircuitBuilder<DualRowIndicator>,
     ) -> Vec<ConstraintCircuitMonad<DualRowIndicator>> {
         let constant = |c: u32| circuit_builder.b_constant(c.into());
+        let challenge = |c: ChallengeId| circuit_builder.challenge(c);
         let curr_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(CurrentBaseRow(col.master_base_table_index()))
         };
         let next_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(NextBaseRow(col.master_base_table_index()))
         };
+        let curr_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(CurrentExtRow(col.master_ext_table_index()))
+        };
+        let next_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(NextExtRow(col.master_ext_table_index()))
+        };
+
+        let compressed_row_for_op_stack_permutation_argument = challenge(OpStackClkWeight)
+            * curr_base_row(CLK)
+            + challenge(OpStackIb1Weight) * curr_base_row(IB1)
+            + challenge(OpStackPointerWeight) * curr_base_row(OpStackPointer)
+            + challenge(OpStackFirstUnderflowElementWeight) * curr_base_row(ST15);
+        let factor_for_op_stack_permutation_argument =
+            challenge(OpStackIndeterminate) - compressed_row_for_op_stack_permutation_argument;
+        let running_product_op_stack_table_has_accumulated_last_element_of_current_row =
+            next_ext_row(OpStackTablePermArg)
+                - curr_ext_row(OpStackTablePermArg) * factor_for_op_stack_permutation_argument;
 
         vec![
-            // The stack element in st1 is moved into st2.
             next_base_row(ST2) - curr_base_row(ST1),
-            // And so on...
             next_base_row(ST3) - curr_base_row(ST2),
             next_base_row(ST4) - curr_base_row(ST3),
             next_base_row(ST5) - curr_base_row(ST4),
@@ -919,10 +997,8 @@ impl ExtProcessorTable {
             next_base_row(ST13) - curr_base_row(ST12),
             next_base_row(ST14) - curr_base_row(ST13),
             next_base_row(ST15) - curr_base_row(ST14),
-            // The stack element in st15 is moved to the top of OpStack underflow, i.e., osv.
-            next_base_row(OSV) - curr_base_row(ST15),
-            // The OpStack pointer is incremented by 1.
-            next_base_row(OSP) - (curr_base_row(OSP) + constant(1)),
+            next_base_row(OpStackPointer) - (curr_base_row(OpStackPointer) + constant(1)),
+            running_product_op_stack_table_has_accumulated_last_element_of_current_row,
         ]
     }
 
@@ -949,19 +1025,33 @@ impl ExtProcessorTable {
         circuit_builder: &ConstraintCircuitBuilder<DualRowIndicator>,
     ) -> Vec<ConstraintCircuitMonad<DualRowIndicator>> {
         let constant = |c: u32| circuit_builder.b_constant(c.into());
+        let challenge = |c: ChallengeId| circuit_builder.challenge(c);
         let curr_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(CurrentBaseRow(col.master_base_table_index()))
         };
         let next_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(NextBaseRow(col.master_base_table_index()))
         };
+        let curr_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(CurrentExtRow(col.master_ext_table_index()))
+        };
+        let next_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(NextExtRow(col.master_ext_table_index()))
+        };
 
+        let compressed_row_for_op_stack_permutation_argument = challenge(OpStackClkWeight)
+            * curr_base_row(CLK)
+            + challenge(OpStackIb1Weight) * curr_base_row(IB1)
+            + challenge(OpStackPointerWeight) * next_base_row(OpStackPointer)
+            + challenge(OpStackFirstUnderflowElementWeight) * next_base_row(ST15);
+        let factor_for_op_stack_permutation_argument =
+            challenge(OpStackIndeterminate) - compressed_row_for_op_stack_permutation_argument;
+        let running_product_op_stack_table_has_accumulated_last_element_of_next_row =
+            next_ext_row(OpStackTablePermArg)
+                - curr_ext_row(OpStackTablePermArg) * factor_for_op_stack_permutation_argument;
         vec![
-            // The stack element in st4 is moved into st3.
             next_base_row(ST3) - curr_base_row(ST4),
-            // The stack element in st5 is moved into st4.
             next_base_row(ST4) - curr_base_row(ST5),
-            // And so on...
             next_base_row(ST5) - curr_base_row(ST6),
             next_base_row(ST6) - curr_base_row(ST7),
             next_base_row(ST7) - curr_base_row(ST8),
@@ -972,12 +1062,10 @@ impl ExtProcessorTable {
             next_base_row(ST12) - curr_base_row(ST13),
             next_base_row(ST13) - curr_base_row(ST14),
             next_base_row(ST14) - curr_base_row(ST15),
-            // The stack element at the top of OpStack underflow, i.e., osv, is moved into st15.
-            next_base_row(ST15) - curr_base_row(OSV),
-            // The OpStack pointer, osp, is decremented by 1.
-            next_base_row(OSP) - (curr_base_row(OSP) - constant(1)),
-            // The helper variable register hv0 holds the inverse of (osp - 16).
-            (curr_base_row(OSP) - constant(16)) * curr_base_row(HV0) - constant(1),
+            next_base_row(OpStackPointer) - (curr_base_row(OpStackPointer) - constant(1)),
+            running_product_op_stack_table_has_accumulated_last_element_of_next_row,
+            // The helper variable register hv0 holds the inverse of (`op_stack_pointer` - 16).
+            (curr_base_row(OpStackPointer) - constant(16)) * curr_base_row(HV0) - constant(1),
         ]
     }
 
@@ -1282,9 +1370,15 @@ impl ExtProcessorTable {
         let curr_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(CurrentBaseRow(col.master_base_table_index()))
         };
+        let curr_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(CurrentExtRow(col.master_ext_table_index()))
+        };
         let next_base_row = |col: ProcessorBaseTableColumn| {
             circuit_builder.input(NextBaseRow(col.master_base_table_index()))
         };
+        let next_ext_row = |col: ProcessorExtTableColumn| {
+            circuit_builder.input(NextExtRow(col.master_ext_table_index()))
+        };
 
         let specific_constraints = vec![
             indicator_poly(0),
@@ -1333,8 +1427,8 @@ impl ExtProcessorTable {
             (one() - indicator_poly(13)) * (next_base_row(ST13) - curr_base_row(ST13)),
             (one() - indicator_poly(14)) * (next_base_row(ST14) - curr_base_row(ST14)),
             (one() - indicator_poly(15)) * (next_base_row(ST15) - curr_base_row(ST15)),
-            next_base_row(OSV) - curr_base_row(OSV),
-            next_base_row(OSP) - curr_base_row(OSP),
+            next_base_row(OpStackPointer) - curr_base_row(OpStackPointer),
+            next_ext_row(OpStackTablePermArg) - curr_ext_row(OpStackTablePermArg),
         ];
         [
             specific_constraints,
@@ -2331,29 +2425,6 @@ impl ExtProcessorTable {
             + write_io_deselector * running_evaluation_updates
     }
 
-    fn running_product_for_op_stack_table_updates_correctly(
-        circuit_builder: &ConstraintCircuitBuilder<DualRowIndicator>,
-    ) -> ConstraintCircuitMonad<DualRowIndicator> {
-        let challenge = |c: ChallengeId| circuit_builder.challenge(c);
-        let next_base_row = |col: ProcessorBaseTableColumn| {
-            circuit_builder.input(NextBaseRow(col.master_base_table_index()))
-        };
-        let curr_ext_row = |col: ProcessorExtTableColumn| {
-            circuit_builder.input(CurrentExtRow(col.master_ext_table_index()))
-        };
-        let next_ext_row = |col: ProcessorExtTableColumn| {
-            circuit_builder.input(NextExtRow(col.master_ext_table_index()))
-        };
-
-        let compressed_row = challenge(OpStackClkWeight) * next_base_row(CLK)
-            + challenge(OpStackIb1Weight) * next_base_row(IB1)
-            + challenge(OpStackOspWeight) * next_base_row(OSP)
-            + challenge(OpStackOsvWeight) * next_base_row(OSV);
-
-        next_ext_row(OpStackTablePermArg)
-            - curr_ext_row(OpStackTablePermArg) * (challenge(OpStackIndeterminate) - compressed_row)
-    }
-
     fn running_product_for_ram_table_updates_correctly(
         circuit_builder: &ConstraintCircuitBuilder<DualRowIndicator>,
     ) -> ConstraintCircuitMonad<DualRowIndicator> {
@@ -2748,7 +2819,6 @@ impl ExtProcessorTable {
             Self::running_evaluation_for_standard_input_updates_correctly(circuit_builder),
             Self::log_derivative_for_instruction_lookup_updates_correctly(circuit_builder),
             Self::running_evaluation_for_standard_output_updates_correctly(circuit_builder),
-            Self::running_product_for_op_stack_table_updates_correctly(circuit_builder),
             Self::running_product_for_ram_table_updates_correctly(circuit_builder),
             Self::running_product_for_jump_stack_table_updates_correctly(circuit_builder),
             Self::running_evaluation_hash_input_updates_correctly(circuit_builder),
@@ -2842,7 +2912,7 @@ impl<'a> Display for ProcessorTraceRow<'a> {
         row(
             f,
             format!(
-                "ramp: {:>width$} │ ramv: {:>width$} │",
+                "ramp: {:>width$} │ ramv: {:>width$} ╵",
                 self.row[RAMP.base_table_index()].value(),
                 self.row[RAMV.base_table_index()].value(),
             ),
@@ -2850,9 +2920,8 @@ impl<'a> Display for ProcessorTraceRow<'a> {
         row(
             f,
             format!(
-                "osp:  {:>width$} │ osv:  {:>width$} ╵",
-                self.row[OSP.base_table_index()].value(),
-                self.row[OSV.base_table_index()].value(),
+                "osp:  {:>width$} ╵",
+                self.row[OpStackPointer.base_table_index()].value(),
             ),
         )?;
 
@@ -2983,6 +3052,9 @@ impl<'a> Display for ExtProcessorTraceRow<'a> {
 #[cfg(test)]
 pub(crate) mod tests {
     use ndarray::Array2;
+    use proptest::collection::vec;
+    use proptest::prelude::*;
+    use proptest_arbitrary_interop::arb;
     use rand::thread_rng;
     use rand::Rng;
     use strum::IntoEnumIterator;
@@ -3533,8 +3605,8 @@ pub(crate) mod tests {
         test_constraints_for_rows_with_debug_info(
             XbMul,
             &test_rows,
-            &[ST0, ST1, ST2, ST3, OSP, HV0],
-            &[ST0, ST1, ST2, ST3, OSP, HV0],
+            &[ST0, ST1, ST2, ST3, OpStackPointer, HV0],
+            &[ST0, ST1, ST2, ST3, OpStackPointer, HV0],
         );
     }
 
@@ -3781,4 +3853,43 @@ pub(crate) mod tests {
         let circuit_builder = ConstraintCircuitBuilder::new();
         ExtProcessorTable::indicator_polynomial(&circuit_builder, index);
     }
+
+    #[test]
+    fn can_get_op_stack_column_for_in_range_index() {
+        for index in 0..OpStackElement::COUNT {
+            let _ = ProcessorTable::op_stack_column_by_index(index);
+        }
+    }
+
+    proptest! {
+        #[test]
+        #[should_panic(expected="[0, 15]")]
+        fn cannot_get_op_stack_column_for_out_of_range_index(
+            index in OpStackElement::COUNT..,
+        ) {
+            let _ = ProcessorTable::op_stack_column_by_index(index);
+        }
+    }
+
+    proptest! {
+        #[test]
+        fn constructing_factor_for_op_stack_table_running_product_never_panics(
+            has_previous_row: bool,
+            previous_row in vec(arb::<BFieldElement>(), BASE_WIDTH),
+            current_row in vec(arb::<BFieldElement>(), BASE_WIDTH),
+            challenges in arb::<Challenges>(),
+        ) {
+            let previous_row = Array1::from(previous_row);
+            let current_row = Array1::from(current_row);
+            let maybe_previous_row = match has_previous_row {
+                true => Some(previous_row.view()),
+                false => None,
+            };
+            let _ = ProcessorTable::factor_for_op_stack_table_running_product(
+                maybe_previous_row,
+                current_row.view(),
+                &challenges
+            );
+        }
+    }
 }
diff --git a/triton-vm/src/table/table_column.rs b/triton-vm/src/table/table_column.rs
index f2dcddff3..884fb98d2 100644
--- a/triton-vm/src/table/table_column.rs
+++ b/triton-vm/src/table/table_column.rs
@@ -135,8 +135,7 @@ pub enum ProcessorBaseTableColumn {
     ST13,
     ST14,
     ST15,
-    OSP,
-    OSV,
+    OpStackPointer,
     HV0,
     HV1,
     HV2,
@@ -183,8 +182,8 @@ pub enum ProcessorExtTableColumn {
 pub enum OpStackBaseTableColumn {
     CLK,
     IB1ShrinkStack,
-    OSP,
-    OSV,
+    StackPointer,
+    FirstUnderflowElement,
 }
 
 #[repr(usize)]
diff --git a/triton-vm/src/vm.rs b/triton-vm/src/vm.rs
index 3acebb257..6e0fc19c7 100644
--- a/triton-vm/src/vm.rs
+++ b/triton-vm/src/vm.rs
@@ -29,6 +29,7 @@ use crate::op_stack::*;
 use crate::program::*;
 use crate::stark::StarkHasher;
 use crate::table::hash_table::PermutationTrace;
+use crate::table::op_stack_table::OpStackTableEntry;
 use crate::table::processor_table;
 use crate::table::processor_table::ProcessorTraceRow;
 use crate::table::table_column::*;
@@ -40,7 +41,6 @@ pub const NUM_HELPER_VARIABLE_REGISTERS: usize = 7;
 
 #[derive(Debug, Clone)]
 pub struct VMState<'pgm> {
-    // Memory
     /// The **program memory** stores the instructions (and their arguments) of the program
     /// currently being executed by Triton VM. It is read-only.
     pub program: &'pgm [Instruction],
@@ -66,7 +66,6 @@ pub struct VMState<'pgm> {
     /// The **Jump-stack memory** stores the entire jump stack.
     pub jump_stack: Vec<(BFieldElement, BFieldElement)>,
 
-    // Registers
     /// Number of cycles the program has been running for
     pub cycle_count: u32,
 
@@ -77,7 +76,7 @@ pub struct VMState<'pgm> {
     pub previous_instruction: BFieldElement,
 
     /// RAM pointer
-    pub ramp: u64,
+    pub ram_pointer: u64,
 
     /// The current state of the one, global Sponge that can be manipulated using instructions
     /// `SpongeInit`, `SpongeAbsorb`, and `SpongeSqueeze`. Instruction `SpongeInit` resets the
@@ -86,7 +85,6 @@ pub struct VMState<'pgm> {
     /// exposed outside of the VM.
     pub sponge_state: [BFieldElement; tip5::STATE_SIZE],
 
-    // Bookkeeping
     /// Indicates whether the terminating instruction `halt` has been executed.
     pub halting: bool,
 }
@@ -102,8 +100,9 @@ pub enum CoProcessorCall {
     /// One row per round in the Tip5 permutation.
     Tip5Trace(Instruction, Box<PermutationTrace>),
 
-    SingleU32TableEntry(U32TableEntry),
-    DoubleU32TableEntry([U32TableEntry; 2]),
+    U32Call(U32TableEntry),
+
+    OpStackCall(OpStackTableEntry),
 }
 
 impl<'pgm> VMState<'pgm> {
@@ -131,7 +130,7 @@ impl<'pgm> VMState<'pgm> {
             cycle_count: 0,
             instruction_pointer: 0,
             previous_instruction: Default::default(),
-            ramp: 0,
+            ram_pointer: 0,
             sponge_state: Default::default(),
             halting: false,
         }
@@ -144,7 +143,7 @@ impl<'pgm> VMState<'pgm> {
         };
 
         if current_instruction.shrinks_op_stack() {
-            let op_stack_pointer = self.op_stack.op_stack_pointer();
+            let op_stack_pointer = self.op_stack.pointer();
             let maximum_op_stack_pointer = BFieldElement::new(OpStackElement::COUNT as u64);
             let op_stack_pointer_minus_maximum = op_stack_pointer - maximum_op_stack_pointer;
             hvs[0] = op_stack_pointer_minus_maximum.inverse_or_zero();
@@ -206,13 +205,13 @@ impl<'pgm> VMState<'pgm> {
     }
 
     /// Perform the state transition as a mutable operation on `self`.
-    pub fn step(&mut self) -> Result<Option<CoProcessorCall>> {
+    pub fn step(&mut self) -> Result<Vec<CoProcessorCall>> {
         // trying to read past the end of the program doesn't change the previous instruction
         if let Ok(instruction) = self.current_instruction() {
             self.previous_instruction = instruction.opcode_b();
         }
 
-        let maybe_co_processor_trace = match self.current_instruction()? {
+        let co_processor_calls = match self.current_instruction()? {
             Pop => self.pop()?,
             Push(field_element) => self.push(field_element),
             Divine => self.divine()?,
@@ -258,147 +257,197 @@ impl<'pgm> VMState<'pgm> {
         }
 
         self.cycle_count += 1;
-        Ok(maybe_co_processor_trace)
+        Ok(co_processor_calls)
+    }
+
+    fn underflow_io_sequence_to_co_processor_calls(
+        &self,
+        underflow_io_sequence: Vec<UnderflowIO>,
+    ) -> Vec<CoProcessorCall> {
+        let op_stack_table_entries = OpStackTableEntry::from_underflow_io_sequence(
+            self.cycle_count,
+            self.op_stack.pointer(),
+            underflow_io_sequence,
+        );
+        op_stack_table_entries
+            .into_iter()
+            .map(OpStackCall)
+            .collect()
     }
 
-    fn pop(&mut self) -> Result<Option<CoProcessorCall>> {
-        self.op_stack.pop()?;
+    fn pop(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (_, underflow_io) = self.op_stack.pop()?;
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn push(&mut self, element: BFieldElement) -> Option<CoProcessorCall> {
-        self.op_stack.push(element);
+    fn push(&mut self, element: BFieldElement) -> Vec<CoProcessorCall> {
+        let underflow_io = self.op_stack.push(element);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 2;
-        None
+        op_stack_calls
     }
 
-    fn divine(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn divine(&mut self) -> Result<Vec<CoProcessorCall>> {
         let element = self.secret_individual_tokens.pop_front().ok_or(anyhow!(
             "Instruction `divine`: secret input buffer is empty."
         ))?;
-        self.op_stack.push(element);
+        let underflow_io = self.op_stack.push(element);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn dup(&mut self, stack_register: OpStackElement) -> Option<CoProcessorCall> {
+    fn dup(&mut self, stack_register: OpStackElement) -> Vec<CoProcessorCall> {
         let element = self.op_stack.peek_at(stack_register);
-        self.op_stack.push(element);
+        let underflow_io = self.op_stack.push(element);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 2;
-        None
+        op_stack_calls
     }
 
-    fn swap(&mut self, stack_register: OpStackElement) -> Result<Option<CoProcessorCall>> {
+    fn swap(&mut self, stack_register: OpStackElement) -> Result<Vec<CoProcessorCall>> {
         if stack_register == ST0 {
             bail!(SwapST0);
         }
         self.op_stack.swap_top_with(stack_register);
         self.instruction_pointer += 2;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn nop(&mut self) -> Option<CoProcessorCall> {
+    fn nop(&mut self) -> Vec<CoProcessorCall> {
         self.instruction_pointer += 1;
-        None
+        vec![]
     }
 
-    fn skiz(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop()?;
+    fn skiz(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, underflow_io) = self.op_stack.pop()?;
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += match top_of_stack.is_zero() {
             true => 1 + self.next_instruction()?.size(),
             false => 1,
         };
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn call(&mut self, call_destination: BFieldElement) -> Option<CoProcessorCall> {
+    fn call(&mut self, call_destination: BFieldElement) -> Vec<CoProcessorCall> {
         let size_of_instruction_call = 2;
         let call_origin = (self.instruction_pointer as u32 + size_of_instruction_call).into();
         let jump_stack_entry = (call_origin, call_destination);
         self.jump_stack.push(jump_stack_entry);
 
         self.instruction_pointer = call_destination.value() as usize;
-        None
+        vec![]
     }
 
-    fn return_from_call(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn return_from_call(&mut self) -> Result<Vec<CoProcessorCall>> {
         let (call_origin, _) = self.jump_stack_pop()?;
         self.instruction_pointer = call_origin.value() as usize;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn recurse(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn recurse(&mut self) -> Result<Vec<CoProcessorCall>> {
         let (_, call_destination) = self.jump_stack_peek()?;
         self.instruction_pointer = call_destination.value() as usize;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn assert(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop()?;
+    fn assert(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, underflow_io) = self.op_stack.pop()?;
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         if !top_of_stack.is_one() {
             let assertion_failed =
                 AssertionFailed(self.instruction_pointer, self.cycle_count, top_of_stack);
             bail!(assertion_failed);
         }
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn halt(&mut self) -> Option<CoProcessorCall> {
+    fn halt(&mut self) -> Vec<CoProcessorCall> {
         self.halting = true;
         self.instruction_pointer += 1;
-        None
+        vec![]
     }
 
-    fn read_mem(&mut self) -> Option<CoProcessorCall> {
+    fn read_mem(&mut self) -> Vec<CoProcessorCall> {
         let ram_pointer = self.op_stack.peek_at(ST0);
+        self.ram_pointer = ram_pointer.value();
+
         let ram_value = self.memory_get(&ram_pointer);
-        self.op_stack.push(ram_value);
-        self.ramp = ram_pointer.value();
+        let underflow_io = self.op_stack.push(ram_value);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 1;
-        None
+        op_stack_calls
     }
 
-    fn write_mem(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn write_mem(&mut self) -> Result<Vec<CoProcessorCall>> {
         let ram_pointer = self.op_stack.peek_at(ST1);
-        let ram_value = self.op_stack.pop()?;
-        self.ramp = ram_pointer.value();
+        let (ram_value, underflow_io) = self.op_stack.pop()?;
+        self.ram_pointer = ram_pointer.value();
         self.ram.insert(ram_pointer, ram_value);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn hash(&mut self) -> Result<Option<CoProcessorCall>> {
-        let to_hash = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+    fn hash(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (to_hash, underflow_io) = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+        let mut all_underflow_io = underflow_io.to_vec();
+
         let mut hash_input = Tip5State::new(Domain::FixedLength);
         hash_input.state[..tip5::RATE].copy_from_slice(&to_hash);
         let tip5_trace = Tip5::trace(&mut hash_input);
         let hash_output = &tip5_trace[tip5_trace.len() - 1][0..DIGEST_LENGTH];
 
         for i in (0..DIGEST_LENGTH).rev() {
-            self.op_stack.push(hash_output[i]);
+            let underflow_io = self.op_stack.push(hash_output[i]);
+            all_underflow_io.push(underflow_io);
         }
         for _ in 0..DIGEST_LENGTH {
-            self.op_stack.push(BFieldElement::zero());
+            let underflow_io = self.op_stack.push(BFieldElement::zero());
+            all_underflow_io.push(underflow_io);
         }
-        let co_processor_trace = Some(Tip5Trace(Hash, Box::new(tip5_trace)));
+
+        let mut co_processor_calls =
+            self.underflow_io_sequence_to_co_processor_calls(all_underflow_io);
+        co_processor_calls.push(Tip5Trace(Hash, Box::new(tip5_trace)));
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn sponge_init(&mut self) -> Option<CoProcessorCall> {
+    fn sponge_init(&mut self) -> Vec<CoProcessorCall> {
         self.sponge_state = Tip5::init().state;
         self.instruction_pointer += 1;
-        Some(SpongeStateReset)
+        vec![SpongeStateReset]
     }
 
-    fn sponge_absorb(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn sponge_absorb(&mut self) -> Result<Vec<CoProcessorCall>> {
         // fetch top elements but don't alter the stack
-        let to_absorb = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+        let (to_absorb, underflow_io) = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+        let mut all_underflow_io = underflow_io.to_vec();
         for i in (0..tip5::RATE).rev() {
-            self.op_stack.push(to_absorb[i]);
+            let underflow_io = self.op_stack.push(to_absorb[i]);
+            all_underflow_io.push(underflow_io);
         }
 
         self.sponge_state[..tip5::RATE].copy_from_slice(&to_absorb);
@@ -406,50 +455,59 @@ impl<'pgm> VMState<'pgm> {
             state: self.sponge_state,
         });
         self.sponge_state = tip5_trace.last().unwrap().to_owned();
-        let co_processor_trace = Some(Tip5Trace(SpongeAbsorb, Box::new(tip5_trace)));
+
+        let mut co_processor_calls =
+            self.underflow_io_sequence_to_co_processor_calls(all_underflow_io);
+        co_processor_calls.push(Tip5Trace(SpongeAbsorb, Box::new(tip5_trace)));
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn sponge_squeeze(&mut self) -> Result<Option<CoProcessorCall>> {
-        let _ = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+    fn sponge_squeeze(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (_, underflow_io) = self.op_stack.pop_multiple::<{ tip5::RATE }>()?;
+        let mut all_underflow_io = underflow_io.to_vec();
         for i in (0..tip5::RATE).rev() {
-            self.op_stack.push(self.sponge_state[i]);
+            let underflow_io = self.op_stack.push(self.sponge_state[i]);
+            all_underflow_io.push(underflow_io);
         }
         let tip5_trace = Tip5::trace(&mut Tip5State {
             state: self.sponge_state,
         });
         self.sponge_state = tip5_trace.last().unwrap().to_owned();
-        let co_processor_trace = Some(Tip5Trace(SpongeSqueeze, Box::new(tip5_trace)));
+
+        let mut co_processor_calls =
+            self.underflow_io_sequence_to_co_processor_calls(all_underflow_io);
+        co_processor_calls.push(Tip5Trace(SpongeSqueeze, Box::new(tip5_trace)));
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn divine_sibling(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn divine_sibling(&mut self) -> Result<Vec<CoProcessorCall>> {
         let _st0_through_st4 = self.op_stack.pop_multiple::<{ DIGEST_LENGTH }>()?;
-        let known_digest = self.op_stack.pop_multiple()?;
+        let (known_digest, _) = self.op_stack.pop_multiple()?;
+        let (node_index, _) = self.op_stack.pop_u32()?;
 
-        let node_index = self.op_stack.pop_u32()?;
         let parent_node_index = node_index / 2;
-        self.op_stack.push(parent_node_index.into());
+        let _ = self.op_stack.push(parent_node_index.into());
 
         let sibling_digest = self.pop_secret_digest()?;
         let (left_digest, right_digest) =
             Self::put_known_digest_on_correct_side(node_index, known_digest, sibling_digest);
 
         for &digest_element in right_digest.iter().rev() {
-            self.op_stack.push(digest_element);
+            let _ = self.op_stack.push(digest_element);
         }
         for &digest_element in left_digest.iter().rev() {
-            self.op_stack.push(digest_element);
+            let _ = self.op_stack.push(digest_element);
         }
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn assert_vector(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn assert_vector(&mut self) -> Result<Vec<CoProcessorCall>> {
         for index in 0..DIGEST_LENGTH {
             let lhs = index.try_into().unwrap();
             let rhs = (index + DIGEST_LENGTH).try_into().unwrap();
@@ -458,7 +516,7 @@ impl<'pgm> VMState<'pgm> {
             }
         }
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
     fn new_vector_assertion_failure(&self, failing_index_lhs: usize) -> InstructionError {
@@ -473,210 +531,253 @@ impl<'pgm> VMState<'pgm> {
         )
     }
 
-    fn add(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop()?;
-        let rhs = self.op_stack.pop()?;
-        self.op_stack.push(lhs + rhs);
+    fn add(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop()?;
+        let underflow_io_result = self.op_stack.push(lhs + rhs);
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_result];
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn mul(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop()?;
-        let rhs = self.op_stack.pop()?;
-        self.op_stack.push(lhs * rhs);
+    fn mul(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop()?;
+        let underflow_io_result = self.op_stack.push(lhs * rhs);
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_result];
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn invert(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop()?;
+    fn invert(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, _) = self.op_stack.pop()?;
         if top_of_stack.is_zero() {
             bail!(InverseOfZero);
         }
-        self.op_stack.push(top_of_stack.inverse());
+        let _ = self.op_stack.push(top_of_stack.inverse());
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn eq(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop()?;
-        let rhs = self.op_stack.pop()?;
+    fn eq(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop()?;
         let eq: u32 = (lhs == rhs).into();
-        self.op_stack.push(eq.into());
+        let underflow_io_eq = self.op_stack.push(eq.into());
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_eq];
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn split(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop()?;
+    fn split(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, underflow_io_top) = self.op_stack.pop()?;
         let lo = BFieldElement::new(top_of_stack.value() & 0xffff_ffff);
         let hi = BFieldElement::new(top_of_stack.value() >> 32);
-        self.op_stack.push(hi);
-        self.op_stack.push(lo);
+        let underflow_io_hi = self.op_stack.push(hi);
+        let underflow_io_lo = self.op_stack.push(lo);
+
+        let underflow_io = vec![underflow_io_top, underflow_io_hi, underflow_io_lo];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         let u32_table_entry = U32TableEntry::new_from_base_field_element(Split, lo, hi);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn lt(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop_u32()?;
-        let rhs = self.op_stack.pop_u32()?;
+    fn lt(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop_u32()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop_u32()?;
         let lt: u32 = (lhs < rhs).into();
-        self.op_stack.push(lt.into());
+        let underflow_io_result = self.op_stack.push(lt.into());
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_result];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         let u32_table_entry = U32TableEntry::new(Lt, lhs, rhs);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn and(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop_u32()?;
-        let rhs = self.op_stack.pop_u32()?;
+    fn and(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop_u32()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop_u32()?;
         let and = lhs & rhs;
-        self.op_stack.push(and.into());
+        let underflow_io_and = self.op_stack.push(and.into());
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_and];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         let u32_table_entry = U32TableEntry::new(And, lhs, rhs);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn xor(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop_u32()?;
-        let rhs = self.op_stack.pop_u32()?;
+    fn xor(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io_lhs) = self.op_stack.pop_u32()?;
+        let (rhs, underflow_io_rhs) = self.op_stack.pop_u32()?;
         let xor = lhs ^ rhs;
-        self.op_stack.push(xor.into());
+        let underflow_io_xor = self.op_stack.push(xor.into());
+
+        let underflow_io = vec![underflow_io_lhs, underflow_io_rhs, underflow_io_xor];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         // Triton VM uses the following equality to compute the results of both the `and`
         // and `xor` instruction using the u32 coprocessor's `and` capability:
         // a ^ b = a + b - 2 · (a & b)
         let u32_table_entry = U32TableEntry::new(And, lhs, rhs);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn log_2_floor(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop_u32()?;
+    fn log_2_floor(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, underflow_io_top) = self.op_stack.pop_u32()?;
         if top_of_stack.is_zero() {
             bail!(LogarithmOfZero);
         }
         let log_2_floor = top_of_stack.ilog2();
-        self.op_stack.push(log_2_floor.into());
+        let underflow_io_result = self.op_stack.push(log_2_floor.into());
+
+        let underflow_io = vec![underflow_io_top, underflow_io_result];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         let u32_table_entry = U32TableEntry::new(Log2Floor, top_of_stack, 0);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn pow(&mut self) -> Result<Option<CoProcessorCall>> {
-        let base = self.op_stack.pop()?;
-        let exponent = self.op_stack.pop_u32()?;
+    fn pow(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (base, underflow_io_base) = self.op_stack.pop()?;
+        let (exponent, underflow_io_exp) = self.op_stack.pop_u32()?;
         let base_pow_exponent = base.mod_pow(exponent.into());
-        self.op_stack.push(base_pow_exponent);
+        let underflow_io_result = self.op_stack.push(base_pow_exponent);
+
+        let underflow_io = vec![underflow_io_base, underflow_io_exp, underflow_io_result];
+        let mut co_processor_calls = self.underflow_io_sequence_to_co_processor_calls(underflow_io);
 
         let u32_table_entry =
             U32TableEntry::new_from_base_field_element(Pow, base, exponent.into());
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let u32_call = U32Call(u32_table_entry);
+        co_processor_calls.push(u32_call);
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn div_mod(&mut self) -> Result<Option<CoProcessorCall>> {
-        let numerator = self.op_stack.pop_u32()?;
-        let denominator = self.op_stack.pop_u32()?;
+    fn div_mod(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (numerator, _) = self.op_stack.pop_u32()?;
+        let (denominator, _) = self.op_stack.pop_u32()?;
         if denominator.is_zero() {
             bail!(DivisionByZero);
         }
         let quotient = numerator / denominator;
         let remainder = numerator % denominator;
 
-        self.op_stack.push(quotient.into());
-        self.op_stack.push(remainder.into());
+        let _ = self.op_stack.push(quotient.into());
+        let _ = self.op_stack.push(remainder.into());
 
         let remainder_is_less_than_denominator = U32TableEntry::new(Lt, remainder, denominator);
         let numerator_and_quotient_range_check = U32TableEntry::new(Split, numerator, quotient);
-        let co_processor_trace = Some(DoubleU32TableEntry([
-            remainder_is_less_than_denominator,
-            numerator_and_quotient_range_check,
-        ]));
+        let co_processor_calls = vec![
+            U32Call(remainder_is_less_than_denominator),
+            U32Call(numerator_and_quotient_range_check),
+        ];
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn pop_count(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop_u32()?;
+    fn pop_count(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, _) = self.op_stack.pop_u32()?;
         let pop_count = top_of_stack.count_ones();
-        self.op_stack.push(pop_count.into());
+        let _ = self.op_stack.push(pop_count.into());
 
         let u32_table_entry = U32TableEntry::new(PopCount, top_of_stack, 0);
-        let co_processor_trace = Some(SingleU32TableEntry(u32_table_entry));
+        let co_processor_calls = vec![U32Call(u32_table_entry)];
 
         self.instruction_pointer += 1;
-        Ok(co_processor_trace)
+        Ok(co_processor_calls)
     }
 
-    fn xx_add(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop_extension_field_element()?;
+    fn xx_add(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, _) = self.op_stack.pop_extension_field_element()?;
         let rhs = self.op_stack.peek_at_top_extension_field_element();
-        self.op_stack.push_extension_field_element(lhs + rhs);
+        let _ = self.op_stack.push_extension_field_element(lhs + rhs);
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn xx_mul(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop_extension_field_element()?;
+    fn xx_mul(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, _) = self.op_stack.pop_extension_field_element()?;
         let rhs = self.op_stack.peek_at_top_extension_field_element();
-        self.op_stack.push_extension_field_element(lhs * rhs);
+        let _ = self.op_stack.push_extension_field_element(lhs * rhs);
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn x_invert(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop_extension_field_element()?;
+    fn x_invert(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, _) = self.op_stack.pop_extension_field_element()?;
         if top_of_stack.is_zero() {
             bail!(InverseOfZero);
         }
-        self.op_stack
-            .push_extension_field_element(top_of_stack.inverse());
+        let inverse = top_of_stack.inverse();
+        let _ = self.op_stack.push_extension_field_element(inverse);
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(vec![])
     }
 
-    fn xb_mul(&mut self) -> Result<Option<CoProcessorCall>> {
-        let lhs = self.op_stack.pop()?;
-        let rhs = self.op_stack.pop_extension_field_element()?;
-        self.op_stack.push_extension_field_element(lhs.lift() * rhs);
+    fn xb_mul(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (lhs, underflow_io) = self.op_stack.pop()?;
+        let (rhs, _) = self.op_stack.pop_extension_field_element()?;
+        let _ = self.op_stack.push_extension_field_element(lhs.lift() * rhs);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
+
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn write_io(&mut self) -> Result<Option<CoProcessorCall>> {
-        let top_of_stack = self.op_stack.pop()?;
+    fn write_io(&mut self) -> Result<Vec<CoProcessorCall>> {
+        let (top_of_stack, underflow_io) = self.op_stack.pop()?;
         self.public_output.push(top_of_stack);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
-    fn read_io(&mut self) -> Result<Option<CoProcessorCall>> {
+    fn read_io(&mut self) -> Result<Vec<CoProcessorCall>> {
         let read_element = self.public_input.pop_front().ok_or(anyhow!(
             "Instruction `read_io`: public input buffer is empty."
         ))?;
-        self.op_stack.push(read_element);
+        let underflow_io = self.op_stack.push(read_element);
+
+        let op_stack_calls = self.underflow_io_sequence_to_co_processor_calls(vec![underflow_io]);
         self.instruction_pointer += 1;
-        Ok(None)
+        Ok(op_stack_calls)
     }
 
     pub fn to_processor_row(&self) -> Array1<BFieldElement> {
@@ -686,7 +787,7 @@ impl<'pgm> VMState<'pgm> {
 
         let current_instruction = self.current_instruction().unwrap_or(Nop);
         let helper_variables = self.derive_helper_variables();
-        let ram_pointer = self.ramp.into();
+        let ram_pointer = self.ram_pointer.into();
 
         processor_row[CLK.base_table_index()] = (self.cycle_count as u64).into();
         processor_row[PreviousInstruction.base_table_index()] = self.previous_instruction;
@@ -720,8 +821,7 @@ impl<'pgm> VMState<'pgm> {
         processor_row[ST13.base_table_index()] = self.op_stack.peek_at(OpStackElement::ST13);
         processor_row[ST14.base_table_index()] = self.op_stack.peek_at(OpStackElement::ST14);
         processor_row[ST15.base_table_index()] = self.op_stack.peek_at(OpStackElement::ST15);
-        processor_row[OSP.base_table_index()] = self.op_stack.op_stack_pointer();
-        processor_row[OSV.base_table_index()] = self.op_stack.op_stack_value();
+        processor_row[OpStackPointer.base_table_index()] = self.op_stack.pointer();
         processor_row[HV0.base_table_index()] = helper_variables[0];
         processor_row[HV1.base_table_index()] = helper_variables[1];
         processor_row[HV2.base_table_index()] = helper_variables[2];