Skip to content

Commit 30dba30

Browse files
jakelishmanElePT
authored andcommitted
Rebalance CircuitInstruction and PackedInstruction (Qiskit#12730)
* Rebalance `CircuitInstruction` and `PackedInstruction` This is a large overhaul of how circuit instructions are both stored in Rust (`PackedInstruction`) and how they are presented to Python (`CircuitInstruction`). In summary: * The old `OperationType` enum is now collapsed into a manually managed `PackedOperation`. This is logically equivalent, but stores a `PyGate`/`PyInstruction`/`PyOperation` indirectly through a boxed pointer, and stores a `StandardGate` inline. As we expect the vast majority of gates to be standard, this hugely reduces the memory usage. The enumeration is manually compressed to a single pointer, hiding the discriminant in the low, alignment-required bytes of the pointer. * `PackedOperation::view()` unpacks the operation into a proper reference-like enumeration `OperationRef<'a>`, which implements `Operation` (though there is also a `try_standard_gate` method to get the gate without unpacking the whole enumeration). * Both `PackedInstruction` and `CircuitInstruction` use this `PackedOperation` as the operation storage. * `PackedInstruction` is now completely the Rust-space format for data, and `CircuitInstruction` is purely for communication with Python. On my machine, this commit brings the utility-scale benchmarks to within 10% of the runtime of 1.1.0 (and some to parity), despite all the additional overhead. Changes to accepting and building Python objects ------------------------------------------------ * A `PackedInstruction` is created by copy constructor from a `CircuitInstruction` by `CircuitData::pack`. There is no `pack_owned` (really, there never was - the previous method didn't take ownership) because there's never owned `CircuitInstruction`s coming in; they're Python-space interop, so we never own them (unless we clone them) other than when we're unpacking them. * `PackedInstruction` is currently just created manually when not coming from a `CircuitInstruction`. It's not hard, and makes it easier to re-use known intern indices than to waste time re-interning them. There is no need to go via `CircuitInstruction`. * `CircuitInstruction` now has two separated Python-space constructors: the old one, which is the default and takes `(operation, qubits, clbits)` (and extracts the information), and a new fast-path `from_standard` which asks only for the standard gate, qubits and params, avoiding operator construction. * To accept a Python-space operation, extract a Python object to `OperationFromPython`. This extracts the components that are separate in Rust space, but joined in Python space (the operation, params and extra attributes). This replaces `OperationInput` and `OperationTypeConstruct`, being more efficient at the extraction, including providing the data in the formats needed for `PackedInstruction` or `CircuitInstruction`. * To retrieve the Python-space operation, use `CircuitInstruction::get_operation` or `PackedInstruction::unpack_py_op` as appropriate. Both will cache and reuse the op, if `cache_pygates` is active. (Though note that if the op is created by `CircuitInstruction`, it will not propagate back to a `PackedInstruction`.) Avoiding operation creation --------------------------- The `_raw_op` field of `CircuitInstruction` is gone, because `PyGate`, `PyInstruction` and `PyOperation` are no longer pyclasses and no longer exposed to Python. Instead, we avoid operation creation by: * having an internal `DAGNode::_to_circuit_instruction`, which returns a copy of the internal `CircuitInstruction`, which can then be used with `CircuitInstruction.replace`, etc. * having `CircuitInstruction::is_standard_gate` to query from Python space if we should bother to create the operator. * changing `CircuitData::map_ops` to `map_nonstandard_ops`, and having it only call the Python callback function if the operation is not an unconditional standard gate. Memory usage ------------ Given the very simple example construction script: ```python from qiskit.circuit import QuantumCircuit qc = QuantumCircuit(1_000) for _ in range(3_000): for q in qc.qubits: qc.rz(0.0, q) for q in qc.qubits: qc.rx(0.0, q) for q in qc.qubits: qc.rz(0.0, q) for a, b in zip(qc.qubits[:-1], qc.qubits[1:]): qc.cx(a, b) ``` This uses 1.5GB in max resident set size on my Macbook (note that it's about 12 million gates) on both 1.1.0 and with this commit, so we've undone our memory losses. The parent of this commit uses 2GB. However, we're in a strong position to beat 1.1.0 in the future now; there are two obvious large remaining costs: * There are 16 bytes per `PackedInstruction` for the Python-operation caching (worth about 180MB in this benchmark, since no Python operations are actually created). * There is also significant memory wastage in the current `SmallVec<[Param; 3]>` storage of the parameters; for all standard gates, we know statically how many parameters are / should be stored, and we never need to increase the capacity. Further, the `Param` enum is 16 bytes wide per parameter, of which nearly 8 bytes is padding, but for all our current use cases, we only care if _all_ the parameters or floats (for everything else, we're going to have to defer to Python). We could move the discriminant out to the level of the parameters structure, and save a large amount of padding. Further work ------------ There's still performance left on the table here: * We still copy-in and copy-out of `CircuitInstruction` too much right now; we might want to make all the `CircuitInstruction` fields nullable and have `CircuitData::append` take them by _move_ rather than by copy. * The qubits/clbits interner requires owned arrays going in, but most interning should return an existing entry. We probably want to switch to have the interner take references/iterators by default, and clone when necessary. We could have a small circuit optimisation where the intern contexts reserve the first n entries to use for an all-to-all connectivity interning for up to (say) 8 qubits, since the transpiler will want to create a lot of ephemeral small circuits. * The `Param` vectors are too heavy at the moment; `SmallVec<[Param; 3]>` is 56 bytes wide, despite the vast majority of gates we care about having at most one single float (8 bytes). Dead padding is a large chunk of the memory use currently. * Fix clippy in no-gate-cache mode * Fix pylint unused-import complaints * Fix broken assumptions around the gate model The `compose` test had a now-broken assumption, because the Python-space `is` check is no longer expected to return an identical object when a standard gate is moved from one circuit to another and has its components remapped as part of the `compose` operation. This doesn't constitute the unpleasant deep-copy that that test is preventing. A custom gate still satisfies that, however, so we can just change the test. `DAGNode::set_name` could cause problems if it was called for the first time on a `CircuitInstruction` that was for a standard gate; these would be created as immutable instances. Given the changes in operator extraction to Rust space, it can now be the case that a standard gate that comes in as mutable is unpacked into Rust space, the cache is some time later invalidated, and then the operation is recreated immutably. * Fix lint * Fix minor documentation
1 parent 1044091 commit 30dba30

25 files changed

+1371
-1473
lines changed

Cargo.lock

+3-2
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

+1
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ license = "Apache-2.0"
1414
#
1515
# Each crate can add on specific features freely as it inherits.
1616
[workspace.dependencies]
17+
bytemuck = "1.16"
1718
indexmap.version = "2.2.6"
1819
hashbrown.version = "0.14.0"
1920
num-complex = "0.4"

crates/accelerate/src/convert_2q_block_matrix.rs

+30-45
Original file line numberDiff line numberDiff line change
@@ -23,33 +23,32 @@ use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
2323
use smallvec::SmallVec;
2424

2525
use qiskit_circuit::bit_data::BitData;
26-
use qiskit_circuit::circuit_instruction::{operation_type_to_py, CircuitInstruction};
26+
use qiskit_circuit::circuit_instruction::CircuitInstruction;
2727
use qiskit_circuit::dag_node::DAGOpNode;
2828
use qiskit_circuit::gate_matrix::ONE_QUBIT_IDENTITY;
2929
use qiskit_circuit::imports::QI_OPERATOR;
30-
use qiskit_circuit::operations::{Operation, OperationType};
30+
use qiskit_circuit::operations::{Operation, OperationRef};
3131

3232
use crate::QiskitError;
3333

3434
fn get_matrix_from_inst<'py>(
3535
py: Python<'py>,
3636
inst: &'py CircuitInstruction,
3737
) -> PyResult<Array2<Complex64>> {
38-
match inst.operation.matrix(&inst.params) {
39-
Some(mat) => Ok(mat),
40-
None => match inst.operation {
41-
OperationType::Standard(_) => Err(QiskitError::new_err(
42-
"Parameterized gates can't be consolidated",
43-
)),
44-
OperationType::Gate(_) => Ok(QI_OPERATOR
45-
.get_bound(py)
46-
.call1((operation_type_to_py(py, inst)?,))?
47-
.getattr(intern!(py, "data"))?
48-
.extract::<PyReadonlyArray2<Complex64>>()?
49-
.as_array()
50-
.to_owned()),
51-
_ => unreachable!("Only called for unitary ops"),
52-
},
38+
if let Some(mat) = inst.op().matrix(&inst.params) {
39+
Ok(mat)
40+
} else if inst.operation.try_standard_gate().is_some() {
41+
Err(QiskitError::new_err(
42+
"Parameterized gates can't be consolidated",
43+
))
44+
} else {
45+
Ok(QI_OPERATOR
46+
.get_bound(py)
47+
.call1((inst.get_operation(py)?,))?
48+
.getattr(intern!(py, "data"))?
49+
.extract::<PyReadonlyArray2<Complex64>>()?
50+
.as_array()
51+
.to_owned())
5352
}
5453
}
5554

@@ -127,34 +126,20 @@ pub fn change_basis(matrix: ArrayView2<Complex64>) -> Array2<Complex64> {
127126

128127
#[pyfunction]
129128
pub fn collect_2q_blocks_filter(node: &Bound<PyAny>) -> Option<bool> {
130-
match node.downcast::<DAGOpNode>() {
131-
Ok(bound_node) => {
132-
let node = bound_node.borrow();
133-
match &node.instruction.operation {
134-
OperationType::Standard(gate) => Some(
135-
gate.num_qubits() <= 2
136-
&& node
137-
.instruction
138-
.extra_attrs
139-
.as_ref()
140-
.and_then(|attrs| attrs.condition.as_ref())
141-
.is_none()
142-
&& !node.is_parameterized(),
143-
),
144-
OperationType::Gate(gate) => Some(
145-
gate.num_qubits() <= 2
146-
&& node
147-
.instruction
148-
.extra_attrs
149-
.as_ref()
150-
.and_then(|attrs| attrs.condition.as_ref())
151-
.is_none()
152-
&& !node.is_parameterized(),
153-
),
154-
_ => Some(false),
155-
}
156-
}
157-
Err(_) => None,
129+
let Ok(node) = node.downcast::<DAGOpNode>() else { return None };
130+
let node = node.borrow();
131+
match node.instruction.op() {
132+
gate @ (OperationRef::Standard(_) | OperationRef::Gate(_)) => Some(
133+
gate.num_qubits() <= 2
134+
&& node
135+
.instruction
136+
.extra_attrs
137+
.as_ref()
138+
.and_then(|attrs| attrs.condition.as_ref())
139+
.is_none()
140+
&& !node.is_parameterized(),
141+
),
142+
_ => Some(false),
158143
}
159144
}
160145

crates/accelerate/src/euler_one_qubit_decomposer.rs

+12-21
Original file line numberDiff line numberDiff line change
@@ -743,7 +743,7 @@ pub fn compute_error_list(
743743
.iter()
744744
.map(|node| {
745745
(
746-
node.instruction.operation.name().to_string(),
746+
node.instruction.op().name().to_string(),
747747
smallvec![], // Params not needed in this path
748748
)
749749
})
@@ -988,11 +988,10 @@ pub fn optimize_1q_gates_decomposition(
988988
.iter()
989989
.map(|node| {
990990
if let Some(err_map) = error_map {
991-
error *=
992-
compute_error_term(node.instruction.operation.name(), err_map, qubit)
991+
error *= compute_error_term(node.instruction.op().name(), err_map, qubit)
993992
}
994993
node.instruction
995-
.operation
994+
.op()
996995
.matrix(&node.instruction.params)
997996
.expect("No matrix defined for operation")
998997
})
@@ -1046,24 +1045,16 @@ fn matmul_1q(operator: &mut [[Complex64; 2]; 2], other: Array2<Complex64>) {
10461045

10471046
#[pyfunction]
10481047
pub fn collect_1q_runs_filter(node: &Bound<PyAny>) -> bool {
1049-
let op_node = node.downcast::<DAGOpNode>();
1050-
match op_node {
1051-
Ok(bound_node) => {
1052-
let node = bound_node.borrow();
1053-
node.instruction.operation.num_qubits() == 1
1054-
&& node.instruction.operation.num_clbits() == 0
1055-
&& node
1056-
.instruction
1057-
.operation
1058-
.matrix(&node.instruction.params)
1059-
.is_some()
1060-
&& match &node.instruction.extra_attrs {
1061-
None => true,
1062-
Some(attrs) => attrs.condition.is_none(),
1063-
}
1048+
let Ok(node) = node.downcast::<DAGOpNode>() else { return false };
1049+
let node = node.borrow();
1050+
let op = node.instruction.op();
1051+
op.num_qubits() == 1
1052+
&& op.num_clbits() == 0
1053+
&& op.matrix(&node.instruction.params).is_some()
1054+
&& match &node.instruction.extra_attrs {
1055+
None => true,
1056+
Some(attrs) => attrs.condition.is_none(),
10641057
}
1065-
Err(_) => false,
1066-
}
10671058
}
10681059

10691060
#[pymodule]

crates/circuit/Cargo.toml

+1
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ name = "qiskit_circuit"
1010
doctest = false
1111

1212
[dependencies]
13+
bytemuck.workspace = true
1314
hashbrown.workspace = true
1415
num-complex.workspace = true
1516
ndarray.workspace = true

0 commit comments

Comments
 (0)