@DWesl DWesl commented Aug 6, 2025

np.hypot finds the length of the hypotenuse of a right triangle with leg lengths given by the arguments.

The main benefit here would likely be locality (which should boost speed a bit): hypot would make one pass through each array (two passes total) instead of the current four. np.hypot also avoids intermediate over- and underflow, but that's less of a concern when winds are mostly in the 0.1 m/s - 100 m/s range.

There might also be a small speed boost from using sqrt and square functions instead of a general power function, but I suspect this particular function is usually memory-limited, especially given the emphasis on SIMD in recent NumPy development.
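A minimal sketch of both points, assuming wind components `u` and `v` (the array values here are illustrative, with one huge element included only to show the overflow behavior):

```python
import numpy as np

# Two wind components; the second element is deliberately extreme.
u = np.array([3.0, 1e300], dtype="f8")
v = np.array([4.0, 1e300], dtype="f8")

naive = np.sqrt(u ** 2 + v ** 2)  # squaring overflows for huge inputs
fused = np.hypot(u, v)            # rescales internally, so it stays finite

print(naive)  # [5.0, inf]
print(fused)  # [5.0, ~1.414e300]
```

For realistic wind speeds the two expressions agree; the difference is the number of passes over memory and the behavior at the extremes of the floating-point range.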

@ajdawson (Owner) commented

Is numpy/numpy#14761 no longer accurate, or are you just assuming performance benefits from this?

@DWesl DWesl (Author) commented Nov 24, 2025

Good catch. It was an assumption, but a quick test suggests that's still accurate for "small" arrays (less than a million elements; around 16 levels on a one-degree grid):

$ python -m timeit -s 'import numpy as np; arr1 = np.arange(1_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>    '(arr1 ** 2 + arr2 ** 2) ** 0.5'
100 loops, best of 5: 3.19 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(1_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>    'np.sqrt(arr1 ** 2 + arr2 ** 2)'
100 loops, best of 5: 3.32 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(1_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>    'np.hypot(arr1, arr2)'
100 loops, best of 5: 3.83 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(1_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>    '(arr1 * arr1 + arr2 * arr2) ** 0.5'
100 loops, best of 5: 3.35 msec per loop

but that changes for "large" arrays (over ten million elements; around 10 levels on a quarter-degree grid):

$ python -m timeit -s 'import numpy as np; arr1 = np.arange(10_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>     '(arr1 ** 2 + arr2 ** 2) ** 0.5'
5 loops, best of 5: 70.9 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(10_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>     'np.sqrt(arr1 ** 2 + arr2 ** 2)'
5 loops, best of 5: 68.3 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(10_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>     'np.hypot(arr1, arr2)'
5 loops, best of 5: 52.6 msec per loop
$ python -m timeit -s 'import numpy as np; arr1 = np.arange(10_000_000, dtype="f4"); arr2 = np.ones_like(arr1)' \
>     'np.sqrt(arr1 * arr1 + arr2 * arr2)'
5 loops, best of 5: 68.6 msec per loop

I'd forgotten the extra conditionals in hypot, but I still think locality will win out ... eventually. It looks like that won't happen until there's at least a gigabyte of data, which seems to be rather a lot, and users can always call it themselves if they have enough data that it would help.
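The crossover can also be probed from within Python using the stdlib timeit module rather than the CLI; the sizes below mirror the shell runs above, and absolute timings are of course machine-dependent:

```python
import timeit

import numpy as np

# Compare the naive two-pass expression against the fused np.hypot
# at the "small" and "large" sizes discussed above.
for n in (1_000_000, 10_000_000):
    arr1 = np.arange(n, dtype="f4")
    arr2 = np.ones_like(arr1)
    naive = timeit.timeit(lambda: np.sqrt(arr1 ** 2 + arr2 ** 2), number=10)
    fused = timeit.timeit(lambda: np.hypot(arr1, arr2), number=10)
    print(f"n={n:>12,}: naive {naive:.3f}s  hypot {fused:.3f}s")
```

On the timings quoted above, hypot only pulls ahead at the larger size, consistent with the extra per-element conditionals being amortized once memory traffic dominates.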

@DWesl DWesl closed this Nov 24, 2025