Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade rustpython #3063

Closed
killme2008 opened this issue Jan 2, 2024 · 11 comments
Closed

Upgrade rustpython #3063

killme2008 opened this issue Jan 2, 2024 · 11 comments

Comments

@killme2008
Copy link
Contributor

What type of enhancement is this?

Tech debt reduction

What does the enhancement do?

Our rustpython version looks too old, maybe we need to upgrade it.

Implementation challenges

No response

@tisonkun
Copy link
Collaborator

tisonkun commented Apr 3, 2024

@killme2008 @discord9 I remember that in an offline discussion, it's thrown that we might finally use PyO3 only.

Is there any concrete reason and plan? IIRC we initially add RustPython to decouple the dependency to system python interpreter. But now we think it's a fair requirement?

@killme2008
Copy link
Contributor Author

killme2008 commented Apr 8, 2024

@killme2008 @discord9 I remember that in an offline discussion, it's thrown that we might finally use PyO3 only.

Is there any concrete reason and plan? IIRC we initially add RustPython to decouple the dependency to system python interpreter. But now we think it's a fair requirement?

The main reason is that RustPython is far from production currently, and Python and PyO3 are developing rapidly.
Using PyO3 and native Python, the user can use all the community resources such as open-source projects and documents, etc. and don't have to worry about any compatibility issues.

@tisonkun
Copy link
Collaborator

@killme2008 can we first dropping the code related to RustPython and recommend to use the PyO3 backend version?

For PyO3 backend version, the major challenge is supported range of Python version. @messense do you have some inputs in this topic? We're a binary rather than a lib, so we may not release multiple artifacts to cover the py versions like OpenDAL. Instead, we're trying to use abi3 to expand the supported versions range. But we still lack knowledge in how to move forward in this direction.

@messense
Copy link

Embedding python interpreter is an underexplored area in pyo3, pyo3-ffi does not support dynamic loading yet: PyO3/pyo3#2668.

An alternative could be to build your own .so which implements the Python plugin engine (and uses PyO3's existing link model), and then your binary could dynamically load that .so if a Python plugin is requested.

Will this work for greptimedb?

@tisonkun
Copy link
Collaborator

@messense Thanks for your information. Let me think of it ...

Cross-ref: https://pyo3.rs/v0.21.0/building-and-distribution

@tisonkun
Copy link
Collaborator

@messense we may not use embedded python interpreter, but to ensure that the binary can work with multiple python versions well. We currently configure it as:

pyo3 = { version = "0.19", optional = true, features = ["abi3", "abi3-py37"] }

But yet to test against multiple Python version.

@tisonkun
Copy link
Collaborator

@killme2008 can we first dropping the code related to RustPython and recommend to use the PyO3 backend version?

Found that we currently depend on RustPython's AST parsing function to extra metadata from decorators:

@coprocessor(args=[], returns=["value"], backend="pyo3")
def test_numpy() -> vector[i64]:
    try:
        import numpy as np
        import pyarrow as pa
    except ImportError as e:
        # Python didn't have numpy or pyarrow
        print("Warning: no pyarrow or numpy found in current python", e)
        return vector([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
    from greptime import vector
    v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
    v = pa.array(v)
    v = vector.from_pyarrow(v)
    v = vector.from_numpy(v.numpy())
    v = v.to_pyarrow()
    v = v.to_numpy()
    v = vector.from_numpy(v)
    return v

@discord9
Copy link
Contributor

discord9 commented Apr 16, 2024

@killme2008 can we first dropping the code related to RustPython and recommend to use the PyO3 backend version?

Found that we currently depend on RustPython's AST parsing function to extra metadata from decorators:

@coprocessor(args=[], returns=["value"], backend="pyo3")
def test_numpy() -> vector[i64]:
    try:
        import numpy as np
        import pyarrow as pa
    except ImportError as e:
        # Python didn't have numpy or pyarrow
        print("Warning: no pyarrow or numpy found in current python", e)
        return vector([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
    from greptime import vector
    v = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
    v = pa.array(v)
    v = vector.from_pyarrow(v)
    v = vector.from_numpy(v.numpy())
    v = v.to_pyarrow()
    v = v.to_numpy()
    v = vector.from_numpy(v)
    return v

RustPython's parser is now a standalone crate, which is battle-tested in Real Python Usecase like Python Lint Puff so it's intended to be kept, while the RustPython Runtime can be dropped, might consider first disable RustPython backend in code, and clean up code later then.
P.S. I mean ruff sorry

@tisonkun
Copy link
Collaborator

@discord9

No. ruff seems to write their own crate.

The API of RustPython has changed a lot, ref - RustPython/Parser#119.

I generally doubt if such source edit approach the way to go. Maybe a Python UDF like PL/Python or a Python worker that can connect to the SQL server a good alternative.

@discord9
Copy link
Contributor

@discord9

No. ruff seems to write their own crate.

The API of RustPython has changed a lot, ref - RustPython/Parser#119.

I generally doubt if such source edit approach the way to go. Maybe a Python UDF like PL/Python or a Python worker that can connect to the SQL server a good alternative.

https://github.com/astral-sh/ruff/blob/f779babc5f725ccfe9c4e08b26d1abe54166619d/CONTRIBUTING.md?plain=1#L641
This one though it's also a fork, and you are very likely correct, the only advantages now by having a embedded interpreter is just saving serde and network cost, which I didn't benchmark, but could very well be insignificant since pyhton itself is already slow enough

@tisonkun
Copy link
Collaborator

Supersedes by #3777.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants