Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerically stable computation of invariant mass #188

Open
HDembinski opened this issue Apr 26, 2022 · 1 comment
Open

Numerically stable computation of invariant mass #188

HDembinski opened this issue Apr 26, 2022 · 1 comment

Comments

@HDembinski
Copy link
Member

I recently developed a formula to compute the invariant mass for a two-body decay in a numerically stable way from the momenta of the daughters and their assumed masses. The algorithm is described and benchmarked here:
https://github.com/HDembinski/essays/blob/master/Numerically%20stable%20calculation%20of%20invariant%20mass.ipynb
The numerical instability arises from the subtractive cancellation in this formula

image

for m1,m2 << |p1|, |p2|.

I am not sure whether this fits in here, but I wanted to advertise it. As you can see in my notebook, my formula improves the accuracy dramatically over the naive formula. It further allows one to perform the whole calculation in single precision instead of double precision, which frees CPU registers and should accelerate code that benefits from SIMD instructions. Single precision is sufficiently accurate to describe the resolution of experimental data, we only habitually convert to double precision because we usually do not use numerically stable formulas.

@jpivarski
Copy link
Member

That would be great to add! Worse than not being numerically stable, Vector doesn't yet have a function for computing the invariant mass of two 4-vectors independently of the other three components. That is, users are currently computing

(vector1 + vector2).mass

which computes all four components of vector1 + vector2, uses them in the computation of mass, and then throws away the temporaries. We ought to have a function that just computes invariant mass and nothing else, but we don't (yet). I had been thinking of adding this formula:

mass = sqrt(2*pt1*pt2*(cosh(eta1 - eta2) - cos(phi1 - phi2)))

but that has the numerically unstable subtraction (as well as an approximation that the mass components of vector1 and vector2 are zero). It is the case that sqrt sometimes encounters negative values due to round-off error, and I see that your implementation is guaranteed to not produce negative values.


Adding a function to Vector is a bit of a process. It supports many backends by duck-typing: the source code in the compute submodule is generic and takes lib as an argument, where lib may be numpy or some other library with the same functions. This test acts as a gatekeeper, warning us if any compute functions use features we haven't yet determined are "safe." (It's a whitelist that we can add to.)

These functions are currently being used on Python scalars, NumPy arrays, Awkward Arrays, and in Numba (all of which use lib = numpy, though Python objects could have used lib = math), but the idea is that we should be able to extend to CuPy (lib = cupy), autodifferentiable JAX (lib = jax.numpy), etc. I think a tour de force would be to show that it can even be used on symbolic math (lib = sympy or a SymPy wrapper) because such weak assumptions are being made.

Another dimension that scales Vector functions is that we have a set of 2 × 3 × 2 = 12 coordinate systems:

  • azimuthal coordinates may be "x, y" or "rho, phi": 2 systems
  • longitudinal coordinate may be "z" or "theta" or "eta": 3 systems
  • temporal coordinate may be "t" (Cartesian) or "tau" (proper time): 2 systems

That is, the azimuthal coordinates are independent of all others, longitudinal components depend only on the longitudinal direction and azimuthal, and temporal components depend on all three. This makes it impossible to use "r" (3D spatial distance) as a stored coordinate, but everything else that we typically want is available (e.g. proper time with any combination of Cartesian, polar, and pseudorapidity).

Unfortunately, that blows up the number of formulas that in principle need to be written in the compute submodule, especially for 4D vectors, especially for multiple inputs (an invariant mass calculation has two vector inputs). For 2D and to a lesser extent, 3D, we often write out specialized formulas when they're available in a combination of coordinate systems, but most 4D calculations either use 2D and 3D functions directly or do coordinate transformations before computing if necessary.

Here's an example: the dot function in 2D

def xy_xy(lib, x1, y1, x2, y2):
return x1 * x2 + y1 * y2
def xy_rhophi(lib, x1, y1, rho2, phi2):
return x1 * x.rhophi(lib, rho2, phi2) + y1 * y.rhophi(lib, rho2, phi2)
def rhophi_xy(lib, rho1, phi1, x2, y2):
return x.rhophi(lib, rho1, phi1) * x2 + y.rhophi(lib, rho1, phi1) * y2
def rhophi_rhophi(lib, rho1, phi1, rho2, phi2):
return rho1 * rho2 * lib.cos(phi1 - phi2)

has an explicit formula when both vectors are Cartesian or when both vectors are polar, but if one is Cartesian and the other is polar, it converts to Cartesian. Here, x and y are modules like the dot module,

from vector._compute.planar import x, y

and the x.rhophi and y.rhophi functions convert "rho, phi" coordinates into "x" and "y" respectively. The naming convention is absolutely consistent.

Methods on vectors look up compute functions through a dict, which is keyed on combinations of coordinate systems and includes signature information (this function returns float on individual vectors):

dispatch_map = {
(AzimuthalXY, AzimuthalXY): (xy_xy, float),
(AzimuthalXY, AzimuthalRhoPhi): (xy_rhophi, float),
(AzimuthalRhoPhi, AzimuthalXY): (rhophi_xy, float),
(AzimuthalRhoPhi, AzimuthalRhoPhi): (rhophi_rhophi, float),
}

Some of the 3D and 4D functions use the lookup programmatically to generate all of the coordinate system combinations. (dot doesn't.)

The 2D implementation has 4 functions because there are 2 azimuthal coordinate systems and 2 input vectors. The 3D implementation has 6 coordinate systems (raised to the power of) 2 input vectors = 36 functions.

dispatch_map = {
(AzimuthalXY, AzimuthalXY): (xy_xy, float),
(AzimuthalXY, AzimuthalRhoPhi): (xy_rhophi, float),
(AzimuthalRhoPhi, AzimuthalXY): (rhophi_xy, float),
(AzimuthalRhoPhi, AzimuthalRhoPhi): (rhophi_rhophi, float),
}

Four of these are specialized and the others are defined in terms of them.

The 4D implementation has 12 coordinate systems (raised to the power of) 2 input vectors = 144 functions. We don't actually write out those 144 functions; they're generated at startup:

def make_conversion(
azimuthal1, longitudinal1, temporal1, azimuthal2, longitudinal2, temporal2
):
spatial_dot, _ = dot.dispatch_map[
azimuthal1, longitudinal1, azimuthal2, longitudinal2
]
if azimuthal1 is AzimuthalXY:
if longitudinal1 is LongitudinalZ:
if temporal1 is TemporalT:
to_t1 = t.xy_z_t
elif temporal1 is TemporalTau:
to_t1 = t.xy_z_tau
elif longitudinal1 is LongitudinalTheta:
if temporal1 is TemporalT:
to_t1 = t.xy_theta_t
elif temporal1 is TemporalTau:
to_t1 = t.xy_theta_tau
elif longitudinal1 is LongitudinalEta:
if temporal1 is TemporalT:
to_t1 = t.xy_eta_t
elif temporal1 is TemporalTau:
to_t1 = t.xy_eta_tau
elif azimuthal1 is AzimuthalRhoPhi:
if longitudinal1 is LongitudinalZ:
if temporal1 is TemporalT:
to_t1 = t.rhophi_z_t
elif temporal1 is TemporalTau:
to_t1 = t.rhophi_z_tau
elif longitudinal1 is LongitudinalTheta:
if temporal1 is TemporalT:
to_t1 = t.rhophi_theta_t
elif temporal1 is TemporalTau:
to_t1 = t.rhophi_theta_tau
elif longitudinal1 is LongitudinalEta:
if temporal1 is TemporalT:
to_t1 = t.rhophi_eta_t
elif temporal1 is TemporalTau:
to_t1 = t.rhophi_eta_tau
if azimuthal2 is AzimuthalXY:
if longitudinal2 is LongitudinalZ:
if temporal2 is TemporalT:
to_t2 = t.xy_z_t
elif temporal2 is TemporalTau:
to_t2 = t.xy_z_tau
elif longitudinal2 is LongitudinalTheta:
if temporal2 is TemporalT:
to_t2 = t.xy_theta_t
elif temporal2 is TemporalTau:
to_t2 = t.xy_theta_tau
elif longitudinal2 is LongitudinalEta:
if temporal2 is TemporalT:
to_t2 = t.xy_eta_t
elif temporal2 is TemporalTau:
to_t2 = t.xy_eta_tau
elif azimuthal2 is AzimuthalRhoPhi:
if longitudinal2 is LongitudinalZ:
if temporal2 is TemporalT:
to_t2 = t.rhophi_z_t
elif temporal2 is TemporalTau:
to_t2 = t.rhophi_z_tau
elif longitudinal2 is LongitudinalTheta:
if temporal2 is TemporalT:
to_t2 = t.rhophi_theta_t
elif temporal2 is TemporalTau:
to_t2 = t.rhophi_theta_tau
elif longitudinal2 is LongitudinalEta:
if temporal2 is TemporalT:
to_t2 = t.rhophi_eta_t
elif temporal2 is TemporalTau:
to_t2 = t.rhophi_eta_tau
def f(lib, coord11, coord12, coord13, coord14, coord21, coord22, coord23, coord24):
return (
to_t1(lib, coord11, coord12, coord13, coord14)
* to_t2(lib, coord21, coord22, coord23, coord24)
) - spatial_dot(lib, coord11, coord12, coord13, coord21, coord22, coord23)
dispatch_map[
azimuthal1, longitudinal1, temporal1, azimuthal2, longitudinal2, temporal2
] = (f, float)
for azimuthal1 in (AzimuthalXY, AzimuthalRhoPhi):
for longitudinal1 in (LongitudinalZ, LongitudinalTheta, LongitudinalEta):
for temporal1 in (TemporalT, TemporalTau):
for azimuthal2 in (AzimuthalXY, AzimuthalRhoPhi):
for longitudinal2 in (
LongitudinalZ,
LongitudinalTheta,
LongitudinalEta,
):
for temporal2 in (TemporalT, TemporalTau):
make_conversion(
azimuthal1,
longitudinal1,
temporal1,
azimuthal2,
longitudinal2,
temporal2,
)

This loop fills the dispatch_map with functions, converting all temporal coordinates to Cartesian, but using whatever spatial formulas may be available.


That's not to say that you need to write all of that—I just wanted to make you aware of it so that I can ask, "In what coordinate systems would it be pragmatic/worthwhile to derive your numerically stable formula?"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants