Allocating the output arrays requires knowing `size`/`batch_shape` (easy) and `core_shape` (harder). What should we do for the `core_shape`?
- Add an argument `core_shape` to the Op itself

  Downside:
  - Verbose; it's not a "true input" to most RVs, in that it can't be changed and is mostly not checked. It is a true input for timeseries RVs (`nsteps`), but we don't use RandomVariables for those these days anyway
  - Makes the graph representation more complicated
  - Useless for backends that don't need it (i.e., all but Numba)

  Upside:
  - It's part of the graph, so it can be inferred/constant-folded if more complicated. Can be merged if the shape graph shows up elsewhere.
  - Uses the same code that is already needed to infer the static shape/output shape (DRY).
  - Works for Blockwise.

  Implemented in pymc-devs/pytensor#691
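  A minimal sketch of the idea (illustrative only, not the actual API of pymc-devs/pytensor#691): the core shape is just another symbolic tensor, built from the RV's parameters with the same logic already used for shape inference, and appended to the node's inputs.

  ```python
  import pytensor.tensor as pt

  # Parameters of a hypothetical multivariate normal RV
  mu = pt.vector("mu")
  cov = pt.matrix("cov")

  # The core shape of one draw is (d,), where d is the length of the mean.
  # This symbolic vector is what would be passed as the extra `core_shape`
  # input, so it can be constant-folded or merged like any other graph.
  core_shape = pt.stack([mu.shape[0]])
  ```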
- Replace `size` by `shape`

  Downside:
  - Same as with the `core_shape` input.
  - Does not allow `size=None` (implied size). I am not sure what this is good for, though.
  - Not a (great?) solution for Blockwise

  Upside:
  - No extra number of inputs
  - PyMC can pass `shape` directly
  - Same as with the `core_shape` input
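  To illustrate the intended difference (assumed semantics): today `size` only spells out the batch dimensions, while a `shape` argument would also include the core dimensions, from which the backend could read the core shape off directly.

  ```python
  import pytensor.tensor as pt

  mu = pt.vector("mu")   # core shape of one draw is (d,)
  cov = pt.matrix("cov")

  # Current API: `size` is the batch shape only; the core shape stays implicit.
  x = pt.random.multivariate_normal(mu, cov, size=(5,))

  # Proposed: shape = batch shape + core shape, e.g. shape=(5, d),
  # so the core shape (d,) would be available without extra inference.
  ```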
- Use a specialized Op that's introduced later, and only for the backends that need it (i.e., Numba)

  Downside:
  - May make rewrite ordering cumbersome
  - Graph is not executable without rewrites (not a biggie for me)

  Upside:
  - Works for Blockwise
  - Doesn't clutter the main IR
  - Doesn't clutter backends where it is not needed
  - Can be made arbitrarily complex without worries. Perhaps pre-allocating the output buffers at the PyTensor level, like we do for Scan
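  A minimal sketch of what such a specialized Op could look like (hypothetical names; a Numba-only rewrite would swap it in after the user graph is built):

  ```python
  import numpy as np
  import pytensor.tensor as pt
  from pytensor.graph.basic import Apply
  from pytensor.graph.op import Op


  class HasCoreShape(Op):
      """Hypothetical Numba-only wrapper that carries a precomputed core
      shape as an explicit extra input, so the backend can allocate outputs."""

      def make_node(self, x, core_shape):
          x = pt.as_tensor(x)
          core_shape = pt.as_tensor(core_shape)
          return Apply(self, [x, core_shape], [x.type()])

      def perform(self, node, inputs, output_storage):
          # Pass-through in Python; only the Numba dispatch would make use
          # of the extra core_shape input.
          x, _core_shape = inputs
          output_storage[0][0] = np.asarray(x)
  ```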
- Wait for the first eval to find out `core_shape` and only then allocate. This is what the Numba impl of Scan does for outputs without taps (nit-sot).

  Downside:
  - Potentially very inefficient?

  Upside:
  - No extra care needed in the graph representation
  - Works for Blockwise
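  The idea in plain NumPy terms (illustrative sketch, not the actual Scan/Numba code): evaluate the core function once, learn the output shape from that first result, then allocate the full batched buffer and fill in the rest.

  ```python
  import numpy as np

  def batched_apply_first_eval(core_fn, batch_params):
      # Run the core function on the first batch element to discover the
      # core shape and dtype, then allocate the full output and fill it.
      first = core_fn(batch_params[0])
      out = np.empty((len(batch_params),) + first.shape, dtype=first.dtype)
      out[0] = first
      for i in range(1, len(batch_params)):
          out[i] = core_fn(batch_params[i])
      return out
  ```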
- Compile a `core_shape` graph function at dispatch and use that.

  Downside:
  - Prevents computation merging if the shape graph was already present for something else, or if the same graph applies to multiple Ops
  - Makes the dispatch impl more complicated

  Upside:
  - No extra care needed in the graph representation
  - Still uses the same machinery (DRY)
  - Works for Blockwise
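  Roughly what the dispatch could do (illustrative names, not an existing API): build the symbolic core shape from the node's inputs, compile it once per node, and call the compiled function at runtime to size the output buffer.

  ```python
  import pytensor
  import pytensor.tensor as pt

  # Symbolic core shape derived from the RV's parameters
  mu = pt.vector("mu")
  cov = pt.matrix("cov")
  core_shape = pt.stack([mu.shape[0]])

  # Compiled once when dispatching the node to Numba; it never appears in
  # the user-facing graph, but reuses the regular shape-inference machinery.
  core_shape_fn = pytensor.function(
      [mu, cov], core_shape, on_unused_input="ignore"
  )
  ```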
- Don't use PyTensor machinery at all. Implement a Numba dispatch that takes the inputs as arguments and returns the core shape

  Downside:
  - Prevents computation merging if the shape graph was already present for something else, or if the same graph can be used for multiple Ops
  - Makes the dispatch impl more complicated
  - Does not provide an automatic solution for Blockwise

  Upside:
  - No extra care needed in the graph representation
At the moment it doesn't allow guvectorize signatures with constant shapes (literal ints), or output symbols that are not present in the inputs.

- Make the signature a more powerful DSL, or allow callables for `core_shapes`:
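  A rough sketch of the callable variant (hypothetical helpers; this also covers the previous option, since the callable takes the runtime inputs as arguments and returns the core shape directly):

  ```python
  import numpy as np

  def mvnormal_core_shapes(mu, cov):
      # One output, with core shape (d,) taken from the mean vector.
      return [(mu.shape[-1],)]

  def fixed_3d_core_shapes(loc):
      # A literal-int core shape such as "(3)", which a plain signature
      # string cannot currently express for the Numba backend.
      return [(3,)]

  # Inside the (assumed) Numba dispatch, the callable would be used to
  # allocate the output buffer before looping over the batch dimensions.
  mu = np.zeros((4, 3))
  cov = np.broadcast_to(np.eye(3), (4, 3, 3))
  out = np.empty(mu.shape[:-1] + mvnormal_core_shapes(mu, cov)[0])
  ```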