RandomVariables/Blockwise in Numba backend

Problem

Allocating the output arrays requires knowing size/batch_shape (easy) and core_shape (harder). What should we do for the core_shape?
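A minimal sketch of the allocation problem in the Numba backend; the name `batched_rv_like` and the stand-in core op (independent normals) are hypothetical. The output buffer must be allocated before the sampling loop runs, so the core_shape has to be known up front in addition to the batch shape:

```python
import numpy as np
from numba import njit


@njit
def batched_rv_like(mu, core_shape):  # mu: (batch,) params; core_shape: e.g. (3,)
    batch_shape = mu.shape
    # core_shape is needed right here, before any sampling happens
    out = np.empty(batch_shape + core_shape)
    for i in range(batch_shape[0]):
        for j in range(core_shape[0]):
            out[i, j] = np.random.normal(mu[i], 1.0)  # stand-in for the core op
    return out


draws = batched_rv_like(np.zeros(4), (3,))  # shape (4, 3)
```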

Options

  1. Add an argument core_shape to the Op itself

    Downside:

    1. Verbose; it's not a "true input" for most RVs, in that it can't be changed and is mostly not checked. It is a true input for time-series RVs (nsteps), but we don't use RandomVariables for those these days anyway
    2. Makes graph representation more complicated
    3. Useless for backends that don't need it (i.e., all but Numba)

    Upside:

    1. It's part of the graph, so it can be inferred/constant-folded even when it's more complicated. It can be merged if the same shape graph shows up elsewhere.
    2. Uses the same code that is already needed to infer the static_shape/output shape (DRY).
    3. Works for Blockwise.

    Implemented in pymc-devs/pytensor#691

  2. Replace size by shape

    Downside:

    1. Same as with core_shape input.
    2. Does not allow size=None (implied size). I am not sure what this is good for though.
    3. Not a (great?) solution for Blockwise

    Upside:

    1. No extra inputs
    2. PyMC can pass shape directly
  3. Use a specialized Op that's introduced later and only for the backends that need it (i.e., Numba)

    Downside:

    1. May make rewrite ordering cumbersome
    2. Graph is not executable without rewrites (not a biggie for me)

    Upside:

    1. Doesn't clutter main IR
    2. Doesn't clutter backends where it is not needed
    3. Can be made arbitrarily complex without worries. Perhaps pre-allocating the output buffers at the PyTensor level like we do for Scan
    4. Works for Blockwise
  4. Wait for the first eval to find out the core_shape and only then allocate (see the sketch after this list). This is what the Numba impl of Scan does for outputs without taps (nit-sot).

    Downside:

    1. Potentially very inefficient?

    Upside:

    1. No extra care needed at the graph representation level
    2. Works for Blockwise
  5. Compile a core_shape graph into a function at dispatch time and use that.

    Downside:

    1. Precludes computation merging if the shape graph was already present for something else or the same graph applies to multiple Ops
    2. Makes dispatch impl more complicated

    Upside:

    1. No extra care needed at the graph representation level
    2. Still uses same machinery (DRY)
    3. Works for Blockwise
  6. Don't use PyTensor machinery at all. Implement a Numba dispatch that takes the inputs as arguments and returns the core shape

    Downside:

    1. Precludes computation merging if the shape graph was already present for something else or the same graph can be used for multiple Ops
    2. Makes dispatch impl more complicated
    3. Does not provide an automatic solution for Blockwise

    Upside:

    1. No extra care needed at the graph representation level
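A rough sketch of option 4, the allocate-after-first-eval approach. All names here are hypothetical and the core op is a stand-in (independent normals whose output length follows the input): the first core draw reveals the core shape, after which the full output buffer can be allocated and the remaining draws filled in.

```python
import numpy as np
from numba import njit


@njit
def core_draw(alpha):  # stand-in core op whose output shape depends on its input
    out = np.empty(alpha.shape[0])
    for j in range(alpha.shape[0]):
        out[j] = np.random.normal(alpha[j], 1.0)
    return out


@njit
def batched_rv_first_eval(alphas):  # alphas: (batch, k)
    first = core_draw(alphas[0])  # first eval reveals the core shape
    out = np.empty((alphas.shape[0],) + first.shape)
    out[0, :] = first
    for i in range(1, alphas.shape[0]):
        out[i, :] = core_draw(alphas[i])
    return out


draws = batched_rv_first_eval(np.ones((5, 3)))  # shape (5, 3)
```

This mirrors what the Numba impl of Scan does for nit-sot outputs, at the cost of special-casing the first iteration.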

What does Numba do?

At the moment Numba doesn't allow guvectorize signatures with constant shapes (literal ints) or output symbols that are not present in the inputs:

  1. numba/numba#6690
  2. numba/numba#2797
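For illustration, a small guvectorize example of what is accepted today (output core dimensions reused from the inputs); the function itself is just a hypothetical stand-in, and the rejected signatures from the issues above are noted in comments:

```python
import numpy as np
from numba import guvectorize, float64


# Accepted: the output core dimension "n" also appears in an input
@guvectorize([(float64[:], float64[:])], "(n)->(n)")
def double(x, out):
    for i in range(x.shape[0]):
        out[i] = 2.0 * x[i]


print(double(np.arange(3.0)))  # [0. 2. 4.]

# Not accepted at the moment (see the issues above):
#   "(n)->(m)"  output symbol "m" does not appear in any input
#   "(n)->(3)"  literal/constant shapes in the signature
```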

What have others been thinking about?

  1. Make the signature a more powerful DSL or allow callables for core_shapes:
    1. numpy/numpy#18151
    2. https://github.com/WarrenWeckesser/numpy-notes/blob/main/enhancements/gufunc-size-expressions.md
    3. https://github.com/WarrenWeckesser/numpy-notes/blob/main/enhancements/gufunc-shape-only-params.md
    4. pymc-devs/pytensor#143