People often ask whether iminuit/MINUIT supports parallel computation. MINUIT's C++ code does not, since the minimization algorithm is fundamentally sequential and cannot be parallelized in a straightforward way.
However, it is comparatively easy to evaluate the cost function in parallel on several cores. This can be handled entirely on the Python side.
There are two ways to parallelize a single fit:
- Parallelize the model.
- Parallelize the cost function.
Parallelize the model
Users of the built-in cost functions need to provide a model pdf or cdf, which is expected to be a vectorized function. Vectorized functions are already embarrassingly parallel, so in theory it may be enough to decorate the model with @numba.njit(parallel=True). Users need to check, however, that their function body is actually parallelizable; Numba does not generate a diagnostic if parallelization fails, unless special steps are taken.
```python
import numpy as np
import numba as nb
from iminuit import Minuit
from iminuit.cost import UnbinnedNLL

@nb.njit(parallel=True)
def model_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

cost = UnbinnedNLL(x, model_pdf)
m = Minuit(cost, mu=0, sigma=1)
m.migrad()
```
Parallelize the cost function
The drawback of the previous option is that it requires a parallel implementation of the model pdf/cdf. This limits the use of library-provided pdfs; for example, numba_stats only provides non-parallel versions at the moment. An alternative is to implement the parallelization in the cost function, which is also embarrassingly parallel. This approach has obvious advantages:
- It works with any non-parallel pdf.
- Parallelization can be turned on/off with a keyword passed to the cost function.
```python
import numpy as np
import numba as nb
from iminuit import Minuit
from numba_stats import norm

@nb.njit(parallel=True)
def cost_impl(x, mu, sigma):
    r = np.empty(8)  # number of chunks, ideally matching the number of threads
    chunk = len(x) // len(r)
    for i in nb.prange(len(r)):
        xi = x[i * chunk : (i + 1) * chunk]
        r[i] = np.sum(norm.logpdf(xi, mu, sigma))
    return -np.sum(r)

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

def cost(mu, sigma):
    return cost_impl(x, mu, sigma)

cost.errordef = Minuit.LIKELIHOOD

m = Minuit(cost, mu=0, sigma=1)
m.migrad()
```
In theory, both approaches should work equally well. In the second case, the chunk size can additionally be tuned, for example to match the size of the CPU cache.