Support parallel computation in cost functions #718

@HDembinski

Description

People often ask whether iminuit/MINUIT supports parallel computation. MINUIT's C++ code does not, since the minimization itself is fundamentally sequential and cannot be parallelized in a straightforward way.

However, it is comparatively easy to compute the value of the cost function in parallel on several cores. This can be handled entirely on the Python side.

There are two ways to parallelize if you have a single fit.

  • Parallelize the model.
  • Parallelize the cost function.

Parallelize the model

Users of the built-in cost functions need to provide a model pdf or cdf, which is expected to be a vectorized function. Vectorized functions are already embarrassingly parallel, so in theory it may be enough to decorate the model with @numba.njit(parallel=True). Users need to check, however, that the function body is actually parallelizable: Numba does not emit a diagnostic when auto-parallelization fails, unless special steps are taken (see the diagnostics sketch after the example).

import numpy as np
import numba as nb
from iminuit import Minuit
from iminuit.cost import UnbinnedNLL

# standard normal pdf, written with array expressions so that
# Numba's auto-parallelizer can split the work across threads
@nb.njit(parallel=True)
def model_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

cost = UnbinnedNLL(x, model_pdf)

m = Minuit(cost, mu=0, sigma=1)
m.migrad()
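
To verify that Numba actually parallelized the body, one can inspect the dispatcher's parallel diagnostics after the function has been compiled. A minimal sketch (the level argument controls verbosity, from 1 to 4):

# force compilation with representative argument types first
model_pdf(x, 0.0, 1.0)

# print Numba's auto-parallelization report; if no loops or array
# expressions were fused and parallelized, the body ran serially
model_pdf.parallel_diagnostics(level=2)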

Parallelize the cost function

The drawback of the previous option is that it requires a parallel implementation of the model pdf/cdf. This limits the use of library-provided pdfs; for example, numba_stats only provides non-parallel versions at the moment. An alternative is to implement the parallelization in the cost function itself, which is also embarrassingly parallel. This approach has obvious advantages:

  • It works with any non-parallel pdf.
  • Parallelization could be turned on/off with a keyword passed to the cost function (see the toggle sketch after the example).

import numpy as np
import numba as nb
from numba_stats import norm
from iminuit import Minuit

@nb.njit(parallel=True)
def cost_impl(x, mu, sigma):
    r = np.empty(8)  # number of parallel threads
    chunk = len(x) // len(r)
    for i in nb.prange(len(r)):
        # the last chunk absorbs the remainder if len(x) is not
        # divisible by the number of threads
        stop = (i + 1) * chunk if i < len(r) - 1 else len(x)
        xi = x[i * chunk : stop]
        r[i] = np.sum(norm.logpdf(xi, mu, sigma))
    return -np.sum(r)

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

def cost(mu, sigma):
    return cost_impl(x, mu, sigma)

cost.errordef = Minuit.LIKELIHOOD

m = Minuit(cost, mu=0, sigma=1)
m.migrad()
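
As a sketch of the second advantage, a hypothetical wrapper could select between a serial and a parallel implementation with a keyword. The make_cost factory and its parallel keyword below are assumptions for illustration, not part of iminuit's API:

# serial fallback compiled from the same Python body; inside a
# non-parallel njit, nb.prange behaves like an ordinary range
cost_serial = nb.njit(cost_impl.py_func)

# hypothetical toggle: pick the implementation via a keyword
def make_cost(x, parallel=True):
    impl = cost_impl if parallel else cost_serial

    def cost(mu, sigma):
        return impl(x, mu, sigma)

    cost.errordef = Minuit.LIKELIHOOD
    return cost

m = Minuit(make_cost(x, parallel=True), mu=0, sigma=1)
m.migrad()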

In theory, both approaches should work equally well. In the second case, the chunk size can additionally be tuned, for example to match the size of the CPU cache.
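
A quick way to compare the two approaches, or different thread and chunk counts, is to time a single cost evaluation. A minimal sketch using timeit, with no claim about which variant wins on a given machine:

import timeit

# call once first so Numba compilation time is excluded
cost_impl(x, 0.0, 1.0)

# average the runtime of repeated evaluations at fixed parameters
t = timeit.timeit(lambda: cost_impl(x, 0.0, 1.0), number=100)
print(f"parallel cost function: {t / 100 * 1e3:.2f} ms per call")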
