People often ask whether iminuit/MINUIT supports parallel computation. MINUIT's C++ code does not, since the minimization algorithm is fundamentally sequential and cannot be parallelized in a straightforward way.
However, it is comparatively easy to evaluate the cost function in parallel on several cores. This can be handled entirely on the Python side.
There are two ways to parallelize a single fit:
- Parallelize the model.
- Parallelize the cost function.
Parallelize the model
Users of the built-in cost functions need to provide a model pdf or cdf, which is expected to be a vectorized function. Vectorized functions are already embarrassingly parallel, so in theory it may be enough to decorate the model with @numba.njit(parallel=True). Users need to check, however, that their function body is actually parallelizable; Numba does not generate a diagnostic if parallelization fails, unless special steps are taken.
```python
import numpy as np
import numba as nb
from iminuit import Minuit
from iminuit.cost import UnbinnedNLL

@nb.njit(parallel=True)
def model_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

cost = UnbinnedNLL(x, model_pdf)
m = Minuit(cost, mu=0, sigma=1)
m.migrad()
```
Parallelize the cost function
The drawback of the previous option is that it requires a parallel implementation of the model pdf/cdf. This limits the use of library-provided pdfs; for example, numba_stats only provides non-parallel versions at the moment. An alternative is to implement the parallelization in the cost function, which is also embarrassingly parallel. This approach has obvious advantages:
- It works with any non-parallel pdf.
- Parallelization can be turned on/off with a keyword passed to the cost function.
```python
import numpy as np
import numba as nb
from iminuit import Minuit
from numba_stats import norm

@nb.njit(parallel=True)
def cost_impl(x, mu, sigma):
    r = np.empty(8)  # number of chunks, ideally matching the number of threads
    chunk = len(x) // len(r)
    for i in nb.prange(len(r)):
        xi = x[i * chunk : (i + 1) * chunk]
        r[i] = np.sum(norm.logpdf(xi, mu, sigma))
    return -np.sum(r)

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)

def cost(mu, sigma):
    return cost_impl(x, mu, sigma)

cost.errordef = Minuit.LIKELIHOOD

m = Minuit(cost, mu=0, sigma=1)
m.migrad()
```
In theory, both approaches should work equally well. In the second case, the chunk size can additionally be tuned, for example to match the size of the CPU cache.