[Question] Efficiently selecting nearest time data per group in Xarray #10233
Subject: Efficiently selecting nearest time data per group (sv) in Xarray

Hi Xarray community,

I'm working with GNSS data where I need to calculate satellite positions based on ephemeris data. I have two main Xarray Datasets:
Goal: For each
Current (Slow) Approach: I'm currently using nested loops, which is inefficient for my dataset size (potentially thousands of time steps and multiple satellites):

```python
# ranges has coordinates sv, time
# nav_data has coordinates sv, time
# result is pre-allocated with coordinates matching ranges
for satellite in ranges.sv.values:
    # Pre-filter nav_data for the current satellite
    nav_data_sat = nav_data.sel(sv=satellite).dropna(dim="time", how="all")
    # Iterate through the time coordinates relevant for the calculation
    for dt in ranges.time.values:
        # Find the single ephemeris entry for 'satellite' closest in time to 'dt'
        # This assumes we need a result for every combination; adjust if ranges is sparse
        ephemeris = nav_data_sat.sel(time=dt, method="nearest")
        # Perform calculation using the selected ephemeris and dt
        # x, y, z, ... = satellite_position_velocity_clock_correction(ephemeris, dt)
        # Store results for this specific (dt, satellite) pair
        # result["x"].loc[dt, satellite] = x
        # ... etc ...
```

Entire module here.

Challenge & Attempts: I need a vectorized Xarray solution to replace these loops. I've tried:
Question: What is the idiomatic Xarray way to efficiently perform this grouped nearest-neighbor lookup? Specifically, how can I select data from

Thanks for any guidance or suggestions!
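One route outside xarray, sketched here under the assumption that the data fit in memory as pandas DataFrames: `pandas.merge_asof` with `by="id"` and `direction="nearest"` performs a grouped nearest-in-time join in one call. This swaps in a pandas technique rather than an xarray-native one. The helper `nearest_by_merge_asof` is hypothetical, and the `id`/`time`/`a`/`b` names follow the synthetic example posted later in this thread, not the real `sv`/`nav_data` layout.

```python
import numpy as np
import pandas as pd
import xarray as xr


def nearest_by_merge_asof(ephemeris: xr.Dataset, obs_times: np.ndarray, x: float) -> xr.DataArray:
    """Hypothetical helper: a * x + b using, per id, the ephemeris row nearest in time."""
    # Long-format ephemeris; drop rows where every variable is NaN (like dropna(how="all"))
    right = ephemeris.to_dataframe().dropna(how="all").reset_index().sort_values("time")
    # Every (id, observation time) pair that needs a result; merge_asof needs both sides sorted on "time"
    left = (
        pd.MultiIndex.from_product([ephemeris["id"].values, obs_times], names=["id", "time"])
        .to_frame(index=False)
        .sort_values("time")
    )
    # Nearest-in-time match, restricted to rows with the same id
    merged = pd.merge_asof(left, right, on="time", by="id", direction="nearest")
    merged["value"] = merged["a"] * x + merged["b"]
    # Back to a dense (id, time) DataArray labelled with the observation times
    return merged.set_index(["id", "time"])["value"].to_xarray()


# Synthetic data mirroring the example in this thread
times = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": times},
)
obs_times = np.array(["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"], dtype="datetime64[ns]")

value = nearest_by_merge_asof(ephemeris, obs_times, x=10)
print(value.sel(id="id0").values)
```

If matches that are too far away should be rejected, `merge_asof` also accepts a `tolerance` argument (e.g. a `pd.Timedelta`); unmatched rows then come back as NaN.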
Replies: 4 comments 2 replies
Interesting problem! Can you write a minimal example with synthetic data that we could test, please?
Here you go.

```python
import numpy as np
import xarray as xr


def f(ephemeris, x):
    a = ephemeris.a.item()
    b = ephemeris.b.item()
    return a * x + b


# Function to compute values based on nearest time
def compute_nearest_time_values(ephemeris, observations, x):
    """
    Computes f(x) = a * x + b for each (id, time) pair in observations,
    using the nearest (id, time) pair from ephemeris for coefficients a and b.
    """
    result = xr.Dataset(
        {
            "value": (("id", "time"), np.empty((len(observations.id), len(observations.time))))
        },
        coords={"id": observations.id, "time": observations.time},
    )
    for identifier in observations.id.values:
        # Drop time steps where all ephemeris variables are NaN for this id
        ephemeris_id = ephemeris.sel(id=identifier).dropna(dim="time", how="all")
        for time in observations.time.values:
            # Find nearest entry in the filtered ephemeris
            nearest = ephemeris_id.sel(time=time, method="nearest")
            # Compute the value
            value = f(nearest, x)
            # Store the result
            result["value"].loc[identifier, time] = value
    return result


# Dummy ids
i0, i1 = "id0", "id1"
# Dummy ephemeris
e0, e1, e2, e3 = "2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": [i0, i1], "time": np.array([e0, e1, e2, e3], dtype="datetime64[ns]")},
)
# Dummy observation
o0, o1, o2 = "2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"
observations = xr.Dataset(
    coords={"id": [i0, i1], "time": np.array([o0, o1, o2], dtype="datetime64[ns]")},
)
x = 10
result = compute_nearest_time_values(ephemeris, observations, x)
assert float(result.sel(id=i0, time=o0).value) == 1 * x + 5
assert float(result.sel(id=i0, time=o1).value) == 3 * x + 3  # 06:00 does not exist for id0, 12:00 is nearest
assert float(result.sel(id=i0, time=o2).value) == 3 * x + 3
assert float(result.sel(id=i1, time=o0).value) == 10 * x + 50
assert float(result.sel(id=i1, time=o1).value) == 20 * x + 40
assert float(result.sel(id=i1, time=o2).value) == 40 * x + 20  # 12:00 does not exist for id1, 18:00 is nearest
```
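A minimal partial vectorization of the example above, assuming its `ephemeris` layout: keep the (usually short) loop over `id`, but replace the inner time loop with one `Dataset.reindex(time=..., method="nearest")` call that looks up all observation times at once. The helper name `nearest_per_id` is hypothetical.

```python
import numpy as np
import xarray as xr


def nearest_per_id(ephemeris: xr.Dataset, obs_times: np.ndarray) -> xr.Dataset:
    """Hypothetical helper: per id, pick the ephemeris entry nearest to each observation time."""
    pieces = []
    for ident in ephemeris["id"].values:
        # Same pre-filtering as the loop version: drop all-NaN time steps for this id
        eph_id = ephemeris.sel(id=ident).dropna(dim="time", how="all")
        # One nearest-neighbour lookup for all observation times; the result is
        # labelled with obs_times rather than with the matched ephemeris times
        pieces.append(eph_id.reindex(time=obs_times, method="nearest"))
    # Each piece keeps `id` as a scalar coordinate, so concat rebuilds the id dimension
    return xr.concat(pieces, dim="id")


# Synthetic data from the example above
times = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": times},
)
obs_times = np.array(["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"], dtype="datetime64[ns]")

nearest = nearest_per_id(ephemeris, obs_times)
# f(x) = a * x + b, now evaluated on whole arrays instead of one scalar at a time
value = nearest["a"] * 10 + nearest["b"]
print(value.sel(id="id1").values)
```

Since `reindex(..., method="nearest")` and `sel(..., method="nearest")` both go through pandas' nearest-index lookup, the selections should match the nested-loop version; only the number of Python-level iterations changes, from ids × times to ids.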
One has to do the following; it is somewhat faster, but not perfect:

- group with `UniqueGrouper` and `BinGrouper`
- `squeeze` the `id` coordinate
- select using `pad` or `nearest`
- use `max` or `min` to get a dense matrix.

I think the main issue is that xarray does not support multi-selection with different methods. One would need
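A minimal loop-free sketch, using a different technique from the `UniqueGrouper`/`BinGrouper` recipe in this thread and assuming the `id`/`time`/`a`/`b` layout of the synthetic example: compute the distance from every observation time to every ephemeris epoch, mask epochs that have no data for a given `id`, take `argmin` over `time`, and use those integer positions for vectorized (pointwise) indexing with `isel`. The helper `nearest_vectorized` and the `obs_time` dimension name are hypothetical. It materializes an `id × time × obs_time` array, so memory grows with the product of those sizes.

```python
import functools
import operator

import numpy as np
import xarray as xr


def nearest_vectorized(ephemeris: xr.Dataset, obs_times: np.ndarray) -> xr.Dataset:
    """Hypothetical helper: loop-free, per-id lookup of the nearest non-empty epoch."""
    # True where at least one variable has data (mirrors dropna(how="all") per id)
    valid = functools.reduce(operator.or_, (v.notnull() for v in ephemeris.data_vars.values()))
    obs = xr.DataArray(obs_times, dims="obs_time")
    # |t_obs - t_eph| in seconds, dims (time, obs_time); .where adds id and masks empty epochs
    delta = (abs(ephemeris["time"] - obs) / np.timedelta64(1, "s")).where(valid)
    # Position of the nearest valid epoch, dims (obs_time, id); argmin skips NaN for floats
    idx = delta.argmin("time")
    # idx shares the id dimension with ephemeris, so isel picks one epoch per (id, obs_time)
    picked = ephemeris.isel(time=idx)
    # The chosen epoch times remain available as the non-index coordinate "time"
    return picked.assign_coords(obs_time=obs_times)


# Synthetic data from this thread
times = np.array(
    ["2025-01-01T00:00", "2025-01-01T06:00", "2025-01-01T12:00", "2025-01-01T18:00"],
    dtype="datetime64[ns]",
)
ephemeris = xr.Dataset(
    {
        "a": (("id", "time"), [[1, np.nan, 3, 4], [10, 20, np.nan, 40]]),
        "b": (("id", "time"), [[5, np.nan, 3, 2], [50, 40, np.nan, 20]]),
    },
    coords={"id": ["id0", "id1"], "time": times},
)
obs_times = np.array(["2025-01-01T02:30", "2025-01-01T06:15", "2025-01-01T12:15"], dtype="datetime64[ns]")

picked = nearest_vectorized(ephemeris, obs_times)
value = picked["a"] * 10 + picked["b"]
print(value.sel(id="id0").values)
```

Two caveats of this sketch: `argmin` raises if some `id` has no valid epoch at all, and on an exact tie it returns the earlier epoch, which may differ from how `sel(..., method="nearest")` breaks ties.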