Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement table valued functions #4582

Closed
kesmit13 opened this issue Sep 28, 2022 · 7 comments
Closed

feat: implement table valued functions #4582

kesmit13 opened this issue Sep 28, 2022 · 7 comments
Labels
feature Features or general enhancements

Comments

@kesmit13
Copy link
Contributor

I've been looking in the documentation and source for a way to do user-defined table valued functions, but I haven't seen any evidence that they are possible. I see in ibis/expr/rules.py that there is a TABULAR enum element defined and commented out. I was trying to come up with a work-around by injecting various things into the output_shape, output_type, and output_dtype, but I haven't come across any way to cheat one in. I was just wondering if this was something on the roadmap?

@cpcloud
Copy link
Member

cpcloud commented Sep 28, 2022

Thanks for opening this!

This is something that other folks have expressed interest in but we haven't put any concerted effort into making it happen.

Can you give a few examples of functions (other than UNNEST or sampling!) that you're interested in expressing? It sounds like you also want a way to bake in an existing one, is that right? Are you also interested in defining the data in the table in the UDF as well, e.g., generating a random set of rows (with a fixed schema) using a generator, or just mapping in already-defined UDTFs into ibis?

My first thought for the case where the function already exists in the database is that once we refactor the SQL compilers to use dispatching, where all operations--not just Value subclasses--are implemented using dispatching (Table operations are kind of hacked in right now and compiled differently than others) then this will be supported out of the box and be done like this:

  1. define a new subclass of TableNode (e.g., MyTableFunction)

  2. define an API to construct the node and expr,
    e.g., def my_table_function(...) -> TableExpr

  3. define a compilation rule for a specific backend
    e.g.,

    @ibis.bigquery.add_operation(MyTableFunction)
    def compile_my_table_function(t, op):
        return f"MY_TABLE_FUNCTION(...)"  # sql specific to the backend and function

@cpcloud cpcloud added the feature Features or general enhancements label Sep 28, 2022
@kesmit13
Copy link
Contributor Author

The major use-case we have is supporting customers UDTFs. At the moment, we are trying attempting to implement correlation matrices for heat maps and descriptive statistics (like pandas.describe). Generating data would also be very useful, but we would probably just implement the data generation in a UDTF and call it.

@cpcloud
Copy link
Member

cpcloud commented Sep 30, 2022

Ok, awesome. So it sounds like then the ability to compile an existing UDTF would get you what you want.

I think we're in a place where we can start the refactor for the SQL backends, but it'll be a bit before we can land that. I can't promise anything at the moment, but I've been playing with using sqlglot to rewrite the ClickHouse backend here (it's a bit out of date, I haven't had a chance to rebase in a while). For your UDTFs, we'd map Backend.add_operation translate_rel for the compilation.

@cpcloud cpcloud changed the title Implement table valued functions feat: implement table valued functions Oct 8, 2022
@NickCrews
Copy link
Contributor

Echoing that I just ran into a need to do this. The use case is similar to a coordinate reprojection, eg (x, y) -> (x', y') (except of course I can't figure out how to do it in ibis, it only seems feasible to do numerically with scipy)

@cpcloud
Copy link
Member

cpcloud commented Sep 22, 2023

@NickCrews Can you give a bit more detail about what you're doing and if you had to express it Python or SQL how you'd write (or ideally write it?)

@NickCrews
Copy link
Contributor

Actually, a UDF didn't quite make sense because although I was operating a large number of values, the number of unique values was actually small. So I just pre-computed that mapping and then could just .substitute(). So my use case isn't relevant, please ignore.

@cpcloud
Copy link
Member

cpcloud commented Jul 31, 2024

This can be done currently with a UDF that returns a struct type, and then unpack/unnesting it. Closing as not planned for now.

@cpcloud cpcloud closed this as not planned Won't fix, can't repro, duplicate, stale Jul 31, 2024
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Archived in project
Development

No branches or pull requests

3 participants