Support for Gridded data #3481
Comments
assert (260640, 24) == temperature.shape == dewpoint.shape # (T*Y*X, Z)
assert (24,) == pressure.shape
%timeit thermo.downdraft_cape(pressure, temperature, dewpoint)
# 218 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
First of all, great work! Indeed, there has been considerable debate and effort on making MetPy faster through (a) vectorization, e.g., #1968; and (b) using faster variants of NumPy, e.g., #3432. I think this growing need echoes the demand for calculating convective metrics (CAPE, CIN) from model data, as understanding the changing nature of thunderstorms in a warming world has become a very important research topic. The original development of many MetPy thermodynamic functions, however, likely had a meteorological audience in mind (who might only need to calculate CAPE/CIN at a few points). And yet, the scientific community currently lacks a tool to calculate these quantities en masse.
@dopplershift what level of precision are you looking for in gridded solutions? I've put together a couple of benchmarks below.
Cython functions
moist_lapse
import numpy as np
import metpy.calc as mpcalc
from metpy.units import units
import nzthermo as nzt
N = 1000
Z = 20
P = np.linspace(101325, 10000, Z)[np.newaxis, :] # (1, Z)
T = np.random.uniform(300, 200, N) # (N,)
ml = nzt.moist_lapse(P, T)
%timeit nzt.moist_lapse(P, T)
# 1.22 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P = P[0] * units.Pa
T = T * units.kelvin
ml_ = [mpcalc.moist_lapse(P, T[i]).m for i in range(N)] # type: ignore
%timeit [mpcalc.moist_lapse(P, T[i]).m for i in range(N)]
# 1.65 s ± 29.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(ml, ml_, rtol=1e-3)
lcl
P = np.random.uniform(101325, 10000, N) # (N,)
T = np.random.uniform(300, 200, N) # (N,)
Td = T - np.random.uniform(0, 10, N) # (N,)
lcl_p, lcl_t = nzt.lcl(P, T, Td) # ((N,), (N,))
%timeit nzt.lcl(P, T, Td)
# 1.4 ms ± 373 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P *= units.Pa
T *= units.kelvin
Td *= units.kelvin
lcl_p_, lcl_t_ = (x.m for x in mpcalc.lcl(P, T, Td)) # type: ignore
%timeit mpcalc.lcl(P, T, Td)
# 1.57 s ± 7.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(lcl_p, lcl_p_, rtol=1e-3)
np.testing.assert_allclose(lcl_t, lcl_t_, rtol=1e-3)
Implementations
wet_bulb_temperature
P = np.random.uniform(101325, 10000, 1000).astype(np.float32)
T = np.random.uniform(300, 200, 1000).astype(np.float32)
Td = T - np.random.uniform(0, 10, 1000).astype(np.float32)
wb = nzt.wet_bulb_temperature(P, T, Td)
%timeit nzt.wet_bulb_temperature(P, T, Td)
# 1.17 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
P *= units.Pa
T *= units.kelvin
Td *= units.kelvin
wb_ = mpcalc.wet_bulb_temperature(P, T, Td).m
%timeit mpcalc.wet_bulb_temperature(P, T, Td)
# 390 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.testing.assert_allclose(wb, wb_, rtol=1e-3)
downdraft_cape
NOTE: random values caused MetPy's dcape function to throw interpolation warnings and return nan in most cases.
pressure = np.array(
[1013, 1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 725, 700, 650, 600, 550, 500, 450, 400, 350, 300],
).astype(np.float32)
pressure *= 100.0
temperature = np.array(
[
[243, 242, 241, 240, 239, 237, 236, 235, 233, 232, 231, 229, 228, 226, 235, 236, 234, 231, 226, 221, 217, 211],
[250, 249, 248, 247, 246, 244, 243, 242, 240, 239, 238, 236, 235, 233, 240, 239, 236, 232, 227, 223, 217, 211],
[293, 292, 290, 288, 287, 285, 284, 282, 281, 279, 279, 280, 279, 278, 275, 270, 268, 264, 260, 254, 246, 237],
[300, 299, 297, 295, 293, 291, 292, 291, 291, 289, 288, 286, 285, 285, 281, 278, 273, 268, 264, 258, 251, 242],
],
dtype=np.float32,
)
dewpoint = np.array(
[
[224, 224, 224, 224, 224, 223, 223, 223, 223, 222, 222, 222, 221, 221, 233, 233, 231, 228, 223, 218, 213, 207],
[233, 233, 232, 232, 232, 232, 231, 231, 231, 231, 230, 230, 230, 229, 237, 236, 233, 229, 223, 219, 213, 207],
[288, 288, 287, 286, 281, 280, 279, 277, 276, 275, 270, 258, 244, 247, 243, 254, 262, 248, 229, 232, 229, 224],
[294, 294, 293, 292, 291, 289, 285, 282, 280, 280, 281, 281, 278, 274, 273, 269, 259, 246, 240, 241, 226, 219],
],
dtype=np.float32,
)
dcape = nzt.downdraft_cape(pressure, temperature, dewpoint)
%timeit nzt.downdraft_cape(pressure, temperature, dewpoint)
# 2.41 ms ± 877 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
P = pressure * units.Pa
T = temperature * units.kelvin
Td = dewpoint * units.kelvin
dcape_ = [mpcalc.downdraft_cape(P, T[i], Td[i])[0].m for i in range(temperature.shape[0])] # type: ignore
%timeit [mpcalc.downdraft_cape(P, T[i], Td[i]) for i in range(temperature.shape[0])]
# 16.5 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
np.testing.assert_allclose(dcape, dcape_, rtol=1e-2)
I am currently looking into the
Hi @leaver2000, thanks a lot for your great work. From the examples above, it seems that your implementations are much faster than MetPy, which takes a very long time to calculate wet-bulb temperature. From the code and the nzthermo website, it seems
Last but not least, it would be very helpful if you could provide a way to install the nzthermo package with
For wet-bulb temperature, that can be accomplished as shown below.
TIME = 144
LAT = 721
LON = 1440
N = TIME * LAT * LON
P = np.random.uniform(101325, 10000, N).astype(np.float32).reshape(TIME, LAT, LON)
T = np.random.uniform(300, 200, N).astype(np.float32).reshape(TIME, LAT, LON)
Td = T - np.random.uniform(0, 10, N).astype(np.float32).reshape(TIME, LAT, LON)
wb = nzt.wet_bulb_temperature(P.ravel(), T.ravel(), Td.ravel()).reshape(TIME, LAT, LON)
I can look into publishing a release; the challenge is that the only supported/tested OS is Linux.
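The flatten-and-restore pattern above applies to any element-wise routine. Below is a minimal sketch on a much smaller grid, using a hypothetical stand-in function (`elementwise_stub` is an assumption for illustration, not part of nzthermo, and its arithmetic is not a wet-bulb formula). It shows that `ravel()` followed by `reshape()` keeps every grid point in place.

```python
import numpy as np


def elementwise_stub(p, t, td):
    """Hypothetical stand-in for a 1-D element-wise routine (NOT a wet-bulb formula)."""
    if not (p.ndim == t.ndim == td.ndim == 1):
        raise ValueError("expected 1-D inputs")
    return 0.5 * (t + td)  # placeholder arithmetic only


TIME, LAT, LON = 2, 3, 4
shape = (TIME, LAT, LON)
rng = np.random.default_rng(0)

P = rng.uniform(10000, 101325, shape).astype(np.float32)
T = rng.uniform(200, 300, shape).astype(np.float32)
Td = T - rng.uniform(0, 10, shape).astype(np.float32)

# flatten to (N,), call the 1-D routine, then restore the (TIME, LAT, LON) grid
out = elementwise_stub(P.ravel(), T.ravel(), Td.ravel()).reshape(shape)

# the same arithmetic applied directly on the grid, for comparison
expected = 0.5 * (T + Td)
```

For C-contiguous arrays `ravel()` returns a view, so the round trip itself should not copy the inputs.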
Thanks a lot. I hope we can see the released version soon. How do I install nzthermo on Linux? I'm curious about the code
@zhangslTHU You can use pip and git to build from source, like...
➜ ~ mkdir myproject
➜ ~ cd myproject
➜ myproject python3.11 -m venv .venv && source .venv/bin/activate
(.venv) ➜ myproject pip install git+https://github.com/leaver2000/nzthermo@master
(.venv) ➜ myproject python -c 'import numpy as np;import nzthermo as nzt;print(nzt.wet_bulb_temperature(np.array([101325.]),np.array([300.0]),np.array([292.0])))'
[294.36783585]
@leaver2000 thanks a lot, it works now! It is much faster than MetPy according to my test results. It took about 21 minutes with MetPy versus 0.5 s with nzthermo for an array with shape (time=24, lat=181, lon=360). Here is the actual time I spent calculating wet-bulb temperature for ERA5 hourly data for January 2022 (31 days) with MetPy, and for the same data with nzthermo but for one year (all 12 months of 2022):
running time for metpy
running time for nzthermo
Thanks again for the great work and the package you developed; it is very helpful and saves a lot of time!
For the functions that require a profile, the signature you chose suggests NumPy gufuncs; have you looked into those? (They are possible in Cython, but dealing with the possibility of someone having a record array of 32-bit floats and 8-bit integers and passing just the floats to your function is annoying.) I'm pretty sure wet-bulb temperature is a point-by-point function (even if you use a definition with parcels rising and falling), so the
I have tested out the
The wet-bulb temperature is element-wise, i.e. (N,),(N,),(N,), but it still calls into the moist_lapse ODE, which even for a single level is solved via recursive iteration. Many other use cases of moist_lapse require (N,Z),(N,),(N,), and it gets a bit tricky when you need each parcel to start and stop at a specific level along Z.
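To illustrate the (N,Z),(N,),(N,) style of signature discussed here, the sketch below uses `np.vectorize` with its `signature` argument. This gives gufunc-like broadcasting over the leading dimensions, but it loops in pure Python, so it has none of the speed of a compiled gufunc. The dry adiabat stands in for the moist-lapse ODE. `dry_profile` and the value of `KAPPA` are illustrative assumptions, not nzthermo or MetPy code.

```python
import numpy as np

KAPPA = 0.2857  # approximate Rd / cp for dry air (assumed value)


def dry_profile(pressure, p0, t0):
    """Dry-adiabatic temperature along one pressure profile: (z),(),() -> (z)."""
    return t0 * (pressure / p0) ** KAPPA


# the core dimension z is handled per call; leading dimensions are broadcast
dry_profile_nd = np.vectorize(dry_profile, signature="(z),(),()->(z)")

N, Z = 5, 20
P = np.linspace(101325.0, 10000.0, Z)[np.newaxis, :]  # (1, Z)
P0 = np.full(N, 101325.0)  # (N,) starting pressure of each parcel
T0 = np.linspace(280.0, 300.0, N)  # (N,) starting temperature of each parcel

profiles = dry_profile_nd(P, P0, T0)  # (N, Z)
```

Giving each parcel its own start and stop level along Z, as mentioned above, would need extra per-parcel inputs or masking that this sketch does not attempt.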
@leaver2000 Apologies for not responding sooner. I want to give a more thoughtful response, but just wanted to say that I am very interested in this work and it aligns with our highest development priority: getting things fast enough to have thermo (esp. CAPE) work on input grids. Regarding precision, I just want to feel confident that what's coming out is correct.
edit: I must have had radar on the brain; any reference to MRMS (Multi-Radar Multi-Sensor) should be HRRR (High-Resolution Rapid Refresh).
@dopplershift I just merged my development branch into master, which provides support for CAPE and CIN calculations, here is a
import datetime
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import nzthermo as nzt
isobaric = xr.open_dataset(
"hrrr.t00z.wrfprsf00.grib2",
engine="cfgrib",
backend_kwargs={"filter_by_keys": {"typeOfLevel": "isobaricInhPa"}},
)
surface = xr.open_dataset(
"hrrr.t00z.wrfsfcf00.grib2",
engine="cfgrib",
backend_kwargs={"filter_by_keys": {"typeOfLevel": "surface", "stepType": "instant"}},
)
T = isobaric["t"].to_numpy() # (K) (Z, Y, X)
Z, Y, X = T.shape
N = Y * X
T = T.reshape(Z, N).transpose() # (N, Z)
P = isobaric["isobaricInhPa"].to_numpy().astype(np.float32) * 100.0 # (Pa)
Q = isobaric["q"].to_numpy() # (kg/kg) (Z, Y, X)
Q = Q.reshape(Z, N).transpose() # (N, Z)
Td = nzt.dewpoint_from_specific_humidity(P[np.newaxis], Q)
prof = nzt.parcel_profile(P, T[:, 0], Td[:, 0])
CAPE, CIN = nzt.cape_cin(P, T, Td, prof)
CAPE = CAPE.reshape(Y, X)
CIN = CIN.reshape(Y, X)
lat = isobaric["latitude"].to_numpy()
lon = isobaric["longitude"].to_numpy()
lon = (lon + 180) % 360 - 180
timestamp = datetime.datetime.fromisoformat(isobaric["time"].to_numpy().astype(str).item())
fig, axes = plt.subplots(2, 2, figsize=(24, 12), subplot_kw={"projection": ccrs.PlateCarree()})
fig.suptitle(f"{timestamp:%Y-%m-%dT%H:%M:%SZ} | shape {Z, Y, X} | size {Z*Y*X:,}", fontsize=16, y=0.9)
# I suspect that the difference between our cape calculations and the MRMS cape calculations is due
# to the fact we are not actually starting at the surface or accounting for surface elevation.
# leading to inflated cape values in areas of higher elevation.
cape = np.where(CAPE < 8000, CAPE, 8000)
cin = np.where(CIN > -1400, CIN, -1400)
for ax, data, title, cmap in zip(
    axes[0], [cape, cin], ["NZTHERMO CAPE", "NZTHERMO CIN"], ["inferno", "inferno_r"]
):
    ax.coastlines(color="white", linewidth=0.25)
    ax.set_title(title, fontsize=16)
    ax.set_global()
    ax.set_extent([lon.min(), lon.max(), lat.min(), lat.max()])
    cf = ax.contourf(lon, lat, data, transform=ccrs.PlateCarree(), cmap=cmap)
    plt.colorbar(cf, ax=ax, orientation="vertical", pad=0.05, label="J/kg", shrink=0.75)
MRMS_CAPE = surface["cape"].to_numpy()
MRMS_CIN = surface["cin"].to_numpy()
for ax, data, title, cmap in zip(
    axes[1], [MRMS_CAPE, MRMS_CIN], ["MRMS CAPE", "MRMS CIN"], ["inferno", "inferno_r"]
):
    ax.coastlines(color="white", linewidth=0.25)
    ax.set_title(title, fontsize=16)
    ax.set_global()
    ax.set_extent([lon.min(), lon.max(), lat.min(), lat.max()])
    cf = ax.contourf(lon, lat, data, transform=ccrs.PlateCarree(), cmap=cmap)
    plt.colorbar(cf, ax=ax, orientation="vertical", pad=0.05, label="J/kg", shrink=0.75)
I've also migrated several of the
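The array handling in the script above rests on two small steps: turning a (Z, Y, X) grid into (N, Z) columns and back, and wrapping longitudes into [-180, 180). Here is a minimal sketch of both on synthetic data. The array names and sizes are made up for illustration, and the column maximum stands in for a real per-column calculation such as CAPE.

```python
import numpy as np

Z, Y, X = 3, 2, 4
field = np.arange(Z * Y * X, dtype=np.float32).reshape(Z, Y, X)  # (Z, Y, X)

N = Y * X
columns = field.reshape(Z, N).transpose()  # (N, Z): one row per grid column

# a per-column reduction (here, the column maximum) maps back onto the grid
col_max = columns.max(axis=1).reshape(Y, X)  # (Y, X)

# wrap longitudes from [0, 360) into [-180, 180)
lon = np.array([0.0, 90.0, 180.0, 270.0, 359.0])
lon_wrapped = (lon + 180) % 360 - 180
```

Row `n = y * X + x` of `columns` holds the vertical profile at grid point `(y, x)`, which is why reshaping a length-N result to `(Y, X)` puts each value back at the right location.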
What should we add?
I've been developing an application that works with gridded data. The metpy.calc.thermo module has been a great guide for my work thus far.
My moist_lapse function differs from metpy.calc.thermo.moist_lapse and is written in Cython; that seemed to be one of the major bottlenecks to vectorizing things for gridded data support.
I currently have implementations of moist_lapse, wet_bulb_temperature, parcel_profile, ccl, and downdraft_cape that support a 2D (N, Z) array structure.
My implementations don't use pint; everything is assumed to be in SI units. Let me know if there is any interest; I'm happy to share.
delta.max() = 0.6819282354549614
delta.mean() = 0.1615940182263646
delta.std() = 0.13604876693905152
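The delta statistics above summarize how far two sets of results disagree. The original does not say which arrays were compared or how `delta` was formed, so the sketch below assumes it is the element-wise absolute difference between two result arrays, and uses made-up values.

```python
import numpy as np

# made-up values standing in for two implementations' outputs (e.g. temperatures in K)
reference = np.array([280.0, 285.0, 290.0, 295.0])
candidate = np.array([280.1, 284.8, 290.0, 295.3])

# assumed definition: element-wise absolute difference
delta = np.abs(candidate - reference)

print(f"{delta.max() = }")
print(f"{delta.mean() = }")
print(f"{delta.std() = }")

# an absolute tolerance check of the kind one might use alongside these numbers
np.testing.assert_allclose(candidate, reference, atol=0.5)
```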
Reference
No response