Python UDF Support #3777
@discord9 Now I'd prefer to keep the … We may still need a design for the whole Python UDF feature that accounts for this distribution decision. Also, the current script table solution can suffer from #2510.
We have also encountered this issue in databendlabs/databend#15494 via pyo3. Dynamic library linking is unacceptable in a distribution release. Maybe we can compile the Python code into WASM? https://wasmer.io/posts/py2wasm-a-python-to-wasm-compiler
@sundy-li Thanks for participating in this thread. I'm afraid that the WASM solution would be no better than the RustPython solution: both can ship basic Python support without linkage issues. However, the major use cases of Python UDFs are integrating with the broad scientific computing (SciPy), data analysis (NumPy, pandas), and ML/AI ecosystems. All of them require a full CPython environment, including support for its C extensions. Last weekend, I drafted a proposal that, at least in GreptimeDB, we can implement Python UDFs with:
```sql
CREATE FUNCTION udf_name(arg0 [opt_ty0], ...)
RETURNS (ret0 [opt_ty0], ...) AS
$$
...
$$
LANGUAGE python3;
```
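To make the proposed syntax concrete, here is a minimal sketch of how a server might turn such a statement into a callable, in the spirit of PL/Python: the `$$ … $$` body becomes the body of a function whose parameters are the declared argument names. The regex and the helpers `parse_create_function` and `compile_udf` are hypothetical illustrations, not GreptimeDB code; declared types are captured but not enforced, and the pattern only handles the simple shape shown above.

```python
import re
import textwrap
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pattern covering only the simple shape of the proposal:
# no quoted identifiers, default values, or nested parentheses.
_CREATE_FN = re.compile(
    r"CREATE\s+FUNCTION\s+(?P<name>\w+)\s*\((?P<args>[^)]*)\)\s*"
    r"RETURNS\s*\((?P<rets>[^)]*)\)\s*AS\s*\$\$(?P<body>.*?)\$\$\s*"
    r"LANGUAGE\s+(?P<lang>\w+)",
    re.IGNORECASE | re.DOTALL,
)

Param = Tuple[str, Optional[str]]  # (name, optional type)


@dataclass
class UdfDef:
    name: str
    args: List[Param]
    rets: List[Param]
    body: str
    language: str


def _split_params(text: str) -> List[Param]:
    """Turn 'a int, b' into [('a', 'int'), ('b', None)]."""
    params = []
    for item in text.split(","):
        parts = item.split()
        if parts:
            params.append((parts[0], parts[1] if len(parts) > 1 else None))
    return params


def parse_create_function(sql: str) -> UdfDef:
    m = _CREATE_FN.search(sql)
    if m is None:
        raise ValueError("not a CREATE FUNCTION ... LANGUAGE statement")
    return UdfDef(
        name=m["name"],
        args=_split_params(m["args"]),
        rets=_split_params(m["rets"]),
        body=textwrap.dedent(m["body"]).strip("\n"),
        language=m["lang"].lower(),
    )


def compile_udf(udf: UdfDef):
    """Wrap the $$ body in a def whose parameters are the declared argument
    names. No sandboxing or type checking is done here."""
    if udf.language != "python3":
        raise ValueError(f"unsupported language: {udf.language}")
    params = ", ".join(name for name, _ in udf.args)
    source = f"def {udf.name}({params}):\n" + textwrap.indent(udf.body, "    ")
    namespace: dict = {}
    exec(compile(source, f"<udf {udf.name}>", "exec"), namespace)
    return namespace[udf.name]


# Usage: a single-argument UDF written in the proposed syntax.
stmt = """
CREATE FUNCTION add_one(x int)
RETURNS (y int) AS
$$
return x + 1
$$
LANGUAGE python3;
"""
add_one = compile_udf(parse_create_function(stmt))
print(add_one(41))  # prints 42
```

Keeping the argument list and return columns in the SQL declaration removes the need for decorator conventions. A real implementation would also have to map column types to Python or NumPy types and restrict what `exec` may do, which this sketch leaves out.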
We will still have a feature … Upon failures, a new server will load the … In this way, we don't need the "script engine" or the whole set of HTTP endpoints at all, and we fully follow the SQL standard. Thus, we can avoid many of the confusions and misalignments we found previously (#2434 #2532).

Open questions

Following PG's CREATE FUNCTION docs, functions are registered per schema and can be restricted with the permission model. But in our first version, we can use a globally shared scripts table, and later break it down per schema (or add a schema column to describe each function's owner/scope).
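As a rough illustration of the recovery path and the open question above, here is a minimal sketch of a scripts table that persists each `CREATE FUNCTION` statement and lets a replacement server rebuild its in-memory registry. It assumes an in-memory SQLite database as a stand-in for GreptimeDB's storage; the table layout, the `schema_name` column (empty string meaning "global"), and the `FunctionRegistry` class are all hypothetical.

```python
import sqlite3

# Hypothetical layout of the globally shared scripts table. An empty
# schema_name means "global", leaving room to scope functions per schema later.
SCRIPTS_DDL = """
CREATE TABLE IF NOT EXISTS scripts (
    schema_name TEXT NOT NULL DEFAULT '',
    name        TEXT NOT NULL,
    source      TEXT NOT NULL,
    PRIMARY KEY (schema_name, name)
)
"""


class FunctionRegistry:
    """Per-server cache of UDF definitions backed by the shared scripts table."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(SCRIPTS_DDL)
        self.loaded = {}  # (schema_name, name) -> CREATE FUNCTION source

    def create_function(self, name: str, source: str, schema_name: str = "") -> None:
        # Persist first so that any other server can recover the definition.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO scripts (schema_name, name, source) VALUES (?, ?, ?)",
                (schema_name, name, source),
            )
        self.loaded[(schema_name, name)] = source

    def reload(self) -> int:
        """Rebuild the cache from the table, e.g. when a new server starts up."""
        rows = self.conn.execute("SELECT schema_name, name, source FROM scripts")
        self.loaded = {(schema, name): source for schema, name, source in rows}
        return len(self.loaded)


# Usage: server A registers a UDF; a replacement server B recovers it.
conn = sqlite3.connect(":memory:")  # stands in for the shared storage
server_a = FunctionRegistry(conn)
server_a.create_function(
    "add_one",
    "CREATE FUNCTION add_one(x int) RETURNS (y int) AS $$ return x + 1 $$ LANGUAGE python3;",
)
server_b = FunctionRegistry(conn)
print(server_b.reload())  # prints 1
```

Storing the raw statement keeps the table independent of any particular compilation step; a server would parse and compile each source when it reloads. Using a composite key with a default schema lets the first version stay global while a later version adds per-schema scoping without changing the table's shape.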
WASM is a very bad idea. I have tried something like this before in a similar situation (Gateway UDF/Custom Plugin). Python has official WASM support, but it is still experimental. Also, if you choose WASM, you lose most C extension support by default. In my mind, the challenges of Python UDFs are the following:
In my past experience, many gateway developers, including me, chose RPC as the solution.
Yes, we have external functions that work this way. It works, but it's inefficient because the argument columns must be passed over the RPC network.
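A minimal sketch of where that cost comes from: every argument column is serialized, shipped to the remote process, and the result column is shipped back, so the bytes on the wire grow with rows times argument columns. The wire format (a little-endian `(n_cols, n_rows)` header followed by float64 values) and all helper names are assumptions for illustration, and the `transport` callable stands in for the real RPC hop.

```python
import struct
from typing import Callable, List, Sequence, Tuple

_HEADER = struct.Struct("<II")  # (n_cols, n_rows), little-endian uint32


def encode_columns(columns: Sequence[Sequence[float]]) -> bytes:
    """Serialize equal-length float64 columns: header, then each column's values."""
    n_rows = len(columns[0]) if columns else 0
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    payload = bytearray(_HEADER.pack(len(columns), n_rows))
    for col in columns:
        payload += struct.pack(f"<{n_rows}d", *col)
    return bytes(payload)


def decode_columns(payload: bytes) -> List[List[float]]:
    n_cols, n_rows = _HEADER.unpack_from(payload, 0)
    offset = _HEADER.size
    columns = []
    for _ in range(n_cols):
        columns.append(list(struct.unpack_from(f"<{n_rows}d", payload, offset)))
        offset += 8 * n_rows
    return columns


def serve_udf(request: bytes, udf: Callable[..., float]) -> bytes:
    """Remote side: decode argument columns, apply the UDF row by row, reply."""
    args = decode_columns(request)
    result = [udf(*row) for row in zip(*args)]
    return encode_columns([result])


def call_external_function(
    columns: Sequence[Sequence[float]], transport: Callable[[bytes], bytes]
) -> Tuple[List[float], int]:
    """Database side: ship argument columns out, get the result column back.
    Also returns the total bytes moved, which scales with rows x columns."""
    request = encode_columns(columns)
    response = transport(request)
    (result,) = decode_columns(response)
    return result, len(request) + len(response)


# Usage: the lambda stands in for the RPC network hop.
result, wire_bytes = call_external_function(
    [[1.0, 2.0], [10.0, 20.0]], lambda req: serve_udf(req, lambda x, y: x + y)
)
print(result, wire_bytes)  # prints [11.0, 22.0] 64
```

Even with a compact columnar encoding like this one, a query over a million rows with three float arguments moves about 24 MB out and 8 MB back per call, which an in-process UDF avoids entirely.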
It seems Snowpark is a sidecar container.
What problem does the new feature solve?
This supersedes:
We'd like to revisit Python UDF support. Currently, it suffers from the following challenges:
`sql="..."` in decorator args for inputs.

Ideally, we should build a solution like PL/Python in Postgres that describes the args and return types, as well as the embedded script, instead of depending on a series of ad hoc conventions.

What does the feature do?
There are several tasks we can do to improve the situation:
To support multiple Python versions with the PyO3 backend, here are several related threads: