Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run_udf: Unclear why two data types are defined #515

Open
m-mohr opened this issue Jul 8, 2024 · 3 comments · May be fixed by #519
Open

run_udf: Unclear why two data types are defined #515

m-mohr opened this issue Jul 8, 2024 · 3 comments · May be fixed by #519
Milestone

Comments

@m-mohr
Copy link
Member

m-mohr commented Jul 8, 2024

Process ID: run_udf

Describe the issue:
Run UDF has the following definition for data:

        {
            "name": "data",
            "description": "The data to be passed to the UDF.",
            "schema": [
                {
                    "title": "Array",
                    "type": "array",
                    "minItems": 1,
                    "items": {
                        "description": "Any data type."
                    }
                },
                {
                    "title": "Single Value",
                    "description": "A single value of any data type."
                }
            ]
        }

Why does it explicitly define an array type? The way the second schema is defined, both apply for an array. So it's equivalent to:

        {
            "name": "data",
            "description": "The data to be passed to the UDF.",
            "schema": {
                "title": "Single Value",
                "description": "A single value of any data type."
            }
        }

Can someone remember why this is the way it is? @soxofaan maybe?

Proposed solution:
Simplify the schema?

@m-mohr
Copy link
Member Author

m-mohr commented Jul 8, 2024

Git blame shows that this was changed 2021 (6ec1848) from two schemas (raster-cube and array) to two schemas (array and any). I believe this was done to encourage usage in datacube operations such as reduce_dimension and apply_dimension. It's not made overly clear in the description of the parameter. Either we need to describe the two schemas better or we should remove it and explain better what users are expected to do.

@soxofaan
Copy link
Member

soxofaan commented Jul 8, 2024

changed 2021 (6ec1848) from two schemas (raster-cube and array) to two schemas (array and any).

you mean from three schemas (raster-cube, array and any) to two schemas (array and any)

regardless, I have no idea why it's like this or if there are important reasons not to simplify as proposed.
The only reason I can think of is to make things a bit more self-documenting, but I guess it's then just better to improve descriptions/dcos than keeping confusing schema constructs

@m-mohr
Copy link
Member Author

m-mohr commented Jul 26, 2024

Yes, I agree... I guess it was meant as a way to somehow recommend that UDFs are used in callbacks.
PR: #519

@m-mohr m-mohr linked a pull request Jul 26, 2024 that will close this issue
@m-mohr m-mohr added this to the 2.0.0 milestone Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants