
Integration over time #457

Open
jdries opened this issue Aug 23, 2023 · 2 comments

Comments

jdries (Contributor) commented Aug 23, 2023

A use case requires us to sum a band over an irregularly sampled time dimension. To do this correctly, the number of days between observations needs to be taken into account.

The question here is whether we need a new process for convenience, or whether we can define a process graph that solves this.

This is somewhat similar to: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.integrate.html
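For reference, a minimal self-contained sketch of that xarray operation (note that integrate uses the trapezoidal rule, which differs slightly from the next-label weighting attempted below; the data is made up):

    import pandas as pd
    import xarray as xr

    # irregularly spaced observation dates
    times = pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-10"])
    da = xr.DataArray([1.0, 2.0, 4.0], coords={"time": times}, dims="time")

    # trapezoidal integration along the time coordinate, expressed in days
    total = da.integrate("time", datetime_unit="D")  # (1+2)/2*3 + (2+4)/2*6 = 22.5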

I made an attempt to solve this with existing processes, but couldn't verify it yet because our backend doesn't support all the required details. It also has the downside that it is hard for the backend to optimize: we apply a function over the whole time dimension, while we only need information about the next label:

    from openeo.processes import array_apply, array_element, array_labels, date_difference

    def weight_by_interval(labeled_array):
        # The labels along 't' are the observation dates.
        dates = array_labels(labeled_array)

        def weighting(x, index, label):
            # Weight each value by the number of days until the next observation
            # (the last label has no successor, which still needs handling).
            days = date_difference(label, array_element(dates, index + 1), unit="day")
            return x * days

        return array_apply(labeled_array, weighting)

    weighted_dmp = dmp_cube.apply_dimension(dimension='t', process=weight_by_interval)
    result = weighted_dmp.reduce_dimension(dimension='t', reducer='sum')
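For intuition, this is what the weighted sum computes for a single pixel's time series, sketched in plain NumPy (numbers made up; the last observation is dropped because it has no next label):

    import numpy as np

    days = np.array([0, 3, 9])             # observation dates, as days since start
    values = np.array([1.0, 2.0, 4.0])     # one pixel's time series

    weights = np.diff(days)                # days until the next observation: [3, 6]
    total = np.sum(values[:-1] * weights)  # 1*3 + 2*6 = 15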
clausmichele (Member) commented

Interesting use case. In a similar scenario, I first retrieved the temporal labels and then computed the date differences client-side, since I didn't know how to do everything with openEO processes.
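A minimal sketch of that client-side step, assuming the temporal labels were fetched beforehand (names and dates are illustrative):

    import numpy as np
    import pandas as pd

    # temporal labels previously retrieved from the backend
    labels = pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-10"]).to_numpy()

    # day differences between consecutive observations, computed client-side
    day_diffs = np.diff(labels).astype("timedelta64[D]").astype(int)  # [3, 6]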

m-mohr (Member) commented Aug 24, 2023

You could compute the differences only once and pass them via the context into reduce_dimension, right?
(Probably not valid Python client code, but it should visualize the idea well enough.)

def weighting(x, index, label, context):
    # x is a date label; the weight is the interval to the next label
    return date_difference(x, context[index + 1])

dates = dmp_cube.dimension_labels('t')
weights = array_apply(dates, weighting, context=dates)

def reducer(data, context):
    # multiply each value with its precomputed weight, then sum
    return sum(array_combine(data, context, 'multiply'))

weighted_dmp.reduce_dimension(dimension='t', reducer=reducer, context=weights)

Would this be faster/better?
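In plain-NumPy terms, the difference to the first attempt is that the weight vector would be derived once from the shared date labels and reused for every pixel; a hypothetical sketch:

    import numpy as np

    days = np.array([0, 3, 9])            # shared observation dates, in days
    weights = np.diff(days)               # computed once: [3, 6]

    cube = np.array([[1.0, 2.0, 4.0],     # two pixel time series
                     [2.0, 0.5, 1.0]])
    totals = cube[:, :-1] @ weights       # per-pixel weighted sums: [15.0, 9.0]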

Additional questions:

  • Should array_apply get a new callback parameter that provides the full array? Then you don't need to pass it through the context.

  • array_combine is a new process which I thought could be useful, but which can be emulated via array_apply. It takes two arrays and merges them using a reducer that accepts two values, such as multiply or add.

    array_combine is basically:

    def combine(x, index, label, context):
        # pair up the two arrays by index and multiply element-wise
        return multiply(x, context[index])

    combined = array_apply(array1, combine, context=array2)
    

    Alternatively, it could also accept an array of arrays and then work with array functions, i.e. sum instead of multiply (a plain-Python sketch of the proposed semantics follows below).
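In plain Python, the proposed array_combine semantics would be roughly the following (hypothetical, since the process does not exist yet):

    def array_combine(array1, array2, reducer):
        # element-wise pairwise reduction of two equally sized arrays
        return [reducer(a, b) for a, b in zip(array1, array2)]

    array_combine([1, 2, 3], [10, 20, 30], lambda a, b: a * b)  # [10, 40, 90]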
