Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflict in unit alias #543

Open
tien-vo opened this issue Oct 31, 2024 · 14 comments
Open

Conflict in unit alias #543

tien-vo opened this issue Oct 31, 2024 · 14 comments

Comments

@tien-vo
Copy link

tien-vo commented Oct 31, 2024

Take a conversion from electron temperature to thermal velocity as an example:

from pint import application_registry as u
import numpy as np
np.sqrt(100 * u.eV / u.m_e).to("km/s")
# <Quantity(4193.82881, 'kilometer / second')>

If we then import cf_xarray.units to enable CF units, per this page, we get the following error:

import cf_xarray.units
np.sqrt(100 * u.eV / u.m_e).to("km/s")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/plain/quantity.py", line 536, in to
    magnitude = self._convert_magnitude_not_inplace(other, *contexts, **ctx_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/plain/quantity.py", line 480, in _convert_magnitude_not_inplace
    return self._REGISTRY.convert(self._magnitude, self._units, other)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1041, in convert
    return self._convert(value, src, dst, inplace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/context/registry.py", line 405, in _convert
    return super()._convert(value, src, dst, inplace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/nonmultiplicative/registry.py", line 259, in _convert
    return super()._convert(value, src, dst, inplace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/test_xarray_units/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1076, in _convert
    raise factor
pint.errors.DimensionalityError: Cannot convert from 'electron_volt ** 0.5 / electron_mass ** 0.5' ([temperature] ** 0.5 * [length] / [time] ** 1.5 / [current] ** 0.5) to 'kilometer / second' ([length] / [time])
@tien-vo
Copy link
Author

tien-vo commented Nov 4, 2024

units.define("@alias degC = C = deg_C = Celsius = degrees_Celsius")

This line is the issue. 'C' is already registered as 'coulomb'. But it aliases 'C' to temperature. Can you remove the 'C' alias?, i.e.,

units.define("@alias degC = deg_C = Celsius = degrees_Celsius")

I'd be happy to submit a quick PR for this, if the removal doesn't affect anything else in the package.

@tien-vo tien-vo changed the title Quantities with fractional power do not convert properly Conflict in unit alias Nov 4, 2024
@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2024

Do you need any of the other unit definitions in this file at all?

@tien-vo
Copy link
Author

tien-vo commented Nov 4, 2024

The other aliases don't affect me for now. But I'd expect 'C' to mean Coulomb, or you should mention the breaking alias(es) in the documentation.

@tien-vo
Copy link
Author

tien-vo commented Nov 4, 2024

By the way, some of the definitions in cf_xarray/units.py are no longer necessary if the pint version is bumped. These warning logs come up for pint version 0.24.3:

24-Nov-04 12:06:00 [WARNING]: Redefining 'percent' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining '%' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'year' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'yr' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'd' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'h' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'degrees_north' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'degrees_east' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining 'degrees' (<class 'pint.delegates.txt_defparser.plain.UnitDefinition'>)
24-Nov-04 12:06:00 [WARNING]: Redefining '[speed]' (<class 'pint.delegates.txt_defparser.plain.DerivedDimensionDefinition'>)

I haven't tracked all of them, but at least the percentage is introduced in 0.21 (Add "%" preprocessor and percent and ppm units #1661). This should be a separate issue for cf-xarray, though?

@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2024

Oh hmm.. yeah we can bump pint I guess.

What specific unit definitions do you need from this file?

@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2024

@aulemahal can you take a look please?

@tien-vo
Copy link
Author

tien-vo commented Nov 4, 2024

@dcherian Sorry I must have not made myself clear. I need line 95 not to alias 'C' to 'degC' because 'C' already means 'coulomb'.

@dcherian
Copy link
Contributor

dcherian commented Nov 4, 2024

Right, I got that :)

I'm curious which of the other definitions are useful. Are you studying lightning? (we assumed no one would use CF units AND Coloumb when we merged that change)

@tien-vo
Copy link
Author

tien-vo commented Nov 5, 2024

Ah gotcha. I'm doing plasma physics and migrating from astropy.units to pint because the latter plays nice with xarray for now. But I need to parse unit strings following FITS standards, which is close enough to CF standards (e.g., "cm-2 s-1" parses as "1 / cm**2 / s").

If that's a deliberate choice you made, it's ok to close this issue. I'll write my own parser.

@aulemahal
Copy link
Contributor

I think removing "C" as an alias of "degC" is ok. In fact, udunits recognizes it as Coulomb, so we could argue that doing so is moving closer to the official spec.

As for "pinning" pint, I'm ok with raising the minimum version and removing the unneeded definition, but I would ensure at least on version before 0.24 works. Pint 0.24 requires numpy 2, so requiring that as a minimum would break our alignment with SPEC0.

@keewis
Copy link
Contributor

keewis commented Nov 6, 2024

Pint 0.24 requires numpy 2

maybe I'm mistaken, but doesn't pint require numpy >= 1.23 right now?

@aulemahal
Copy link
Contributor

Oh! I'm sorry, indeed the pyproject says so.

Therefore, I would 100% support the changes discussed in this issue!

@tien-vo
Copy link
Author

tien-vo commented Nov 6, 2024

@aulemahal I'll submit a PR later this week when I have time to spare. Quick question since I didn't pay attention to SPEC0 before: are you saying it is fine to bump to pint>=0.24? That version was released this summer, so wouldn't that also break SPEC0?

By the way, I found another bug. The custom regexp parser breaks units with number suffix like 'ln10' or 'K_alpha_W_d_20'. Consider the minimal snippet:

import functools
import re
import pint
units = pint.UnitRegistry()
pint.set_application_registry(units)
pint.Unit("degree").compatible_units()
# frozenset({<Unit('decade')>, <Unit('grade')>, <Unit('permille')>, <Unit('milliarcsecond')>, <Unit('fine_structure_constant')>, <Unit('zeta')>, <Unit('arcminute')>, <Unit('turn')>, <Unit('radian')>, <Unit('wien_u')>, <Unit('decibel')>, <Unit('ln10')>, <Unit('wien_x')>, <Unit('avogadro_number')>, <Unit('neper')>, <Unit('byte')>, <Unit('K_alpha_W_d_220')>, <Unit('arcsecond')>, <Unit('K_alpha_Cu_d_220')>, <Unit('degree')>, <Unit('steradian')>, <Unit('square_degree')>, <Unit('electron_g_factor')>, <Unit('count')>, <Unit('mil')>, <Unit('octave')>, <Unit('percent')>, <Unit('ppm')>, <Unit('refractive_index_unit')>, <Unit('bit')>, <Unit('tansec')>, <Unit('K_alpha_Mo_d_220')>, <Unit('eulers_number')>, <Unit('pi')>})
pint.Quantity(1.0, "degree").to("ln10")
# <Quantity(0.00757986863, 'ln10')>

But with the regex preprocessor:

functools.partial(
re.compile(
r"(?<=[A-Za-z])(?![A-Za-z])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])"
).sub,
"**",
),

'10' is stripped from 'ln10' and it raises an error

units = pint.UnitRegistry(
    preprocessors=[
        functools.partial(
            re.compile(
                r"(?<=[A-Za-z])(?![A-Za-z])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])"
            ).sub,
            "**",
        ),
    ],
)
pint.set_application_registry(units)
pint.Unit("degree").compatible_units()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/unit.py", line 104, in compatible_units
    return self._REGISTRY.get_compatible_units(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/system/registry.py", line 239, in get_compatible_units
    return frozenset(self.Unit(eq) for eq in equiv)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/system/registry.py", line 239, in <genexpr>
    return frozenset(self.Unit(eq) for eq in equiv)
                     ^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/unit.py", line 41, in __init__
    self._units = self._REGISTRY.parse_units(units)._units
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1202, in parse_units
    self.parse_units_as_container(input_string, as_delta, case_sensitive)
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/nonmultiplicative/registry.py", line 69, in parse_units_as_container
    return super().parse_units_as_container(input_string, as_delta, case_sensitive)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1217, in parse_units_as_container
    return self._parse_units_as_container(input_string, as_delta, case_sensitive)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 1252, in _parse_units_as_container
    cname = self.get_name(name, case_sensitive=case_sensitive)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tvo/.local/share/mamba/envs/cf-xarray/lib/python3.12/site-packages/pint/facets/plain/registry.py", line 661, in get_name
    raise UndefinedUnitError(name_or_alias)
pint.errors.UndefinedUnitError: 'ln' is not defined in the unit registry

I'm not an expert on regex but based on this, adding excluding patterns fixes the issue, e.g.:

exclude_patterns = "|".join(["ln10", "K_alpha_Cu_d_220", "K_alpha_Mo_d_220", "K_alpha_W_d_220"])
units = pint.UnitRegistry(
    preprocessors=[
        functools.partial(
            re.compile(
                rf"^(?!{exclude_patterns})(?<=[A-Za-z])(?![A-Za-z])(?<![0-9\-][eE])(?<![0-9\-])(?=[0-9\-])"
            ).sub,
            "**",
        ),
    ],
)
pint.set_application_registry(units)
pint.Unit("degree").compatible_units()
# frozenset({<Unit('bit')>, <Unit('arcminute')>, <Unit('K_alpha_Mo_d_220')>, <Unit('turn')>, <Unit('decibel')>, <Unit('steradian')>, <Unit('pi')>, <Unit('octave')>, <Unit('arcsecond')>, <Unit('milliarcsecond')>, <Unit('decade')>, <Unit('mil')>, <Unit('ln10')>, <Unit('wien_u')>, <Unit('tansec')>, <Unit('eulers_number')>, <Unit('radian')>, <Unit('zeta')>, <Unit('grade')>, <Unit('refractive_index_unit')>, <Unit('K_alpha_W_d_220')>, <Unit('electron_g_factor')>, <Unit('degree')>, <Unit('square_degree')>, <Unit('byte')>, <Unit('permille')>, <Unit('count')>, <Unit('percent')>, <Unit('fine_structure_constant')>, <Unit('avogadro_number')>, <Unit('neper')>, <Unit('ppm')>, <Unit('K_alpha_Cu_d_220')>, <Unit('wien_x')>})
pint.Quantity(1, "degree").to("ln10")
# <Quantity(0.00757986863, 'ln10')>
pint.Quantity(1, "degree").to("K_alpha_W_d_220")
# <Quantity(0.160339401, 'K_alpha_W_d_220')>

Not sure these conversions make sense though. 'ln10' isn't what I thought it was.

@aulemahal
Copy link
Contributor

AFAIU, pint is not part of the SPEC0, so we can are not bound to a specific minimum version per this convention.

As for modifying the regex, I see no issue as long as we preserve what was working before. You should know however, that the real convention for units that the CF world has agreed upon is UDUNITS (https://docs.unidata.ucar.edu/udunits/current/, see database at point 6). And cf-xarray does not offer an actual implementation of that, only a crude approximation that works with the most common units. In fact, I see units in the udunits db that include numbers. So those too would fail with the current regex. Your change might help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants