Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding Python annotation support #352

Closed
IvanIsCoding opened this issue Jun 6, 2021 · 10 comments
Closed

Consider adding Python annotation support #352

IvanIsCoding opened this issue Jun 6, 2021 · 10 comments
Labels
enhancement New feature or request

Comments

@IvanIsCoding
Copy link
Collaborator

IvanIsCoding commented Jun 6, 2021

What is the expected enhancement?

Users are able to execute mypy to run a static type check in code that uses retworkx.

Python annotations are becoming more popular, and we should consider supporting them. Supporting annotations can help make retworkx more popular in codebases that have adopted mypy. Also, many popular libraries that do not support annotations yet have tickets claiming for them.

Suggested Implementation

Create .pyi stubs containing the type annotations for the existing code. See typeshed for some inspiration on how this is done. Another useful inspiration is orjson, they are another Python library that uses Rust/PyO3 and they also use the .pyi stub approach.

Challenges

  • Duplication: if we use .pyi stubs, we will have to duplicate all the functions signatures in the stubs files; ideally PyO3 would generate all the stubs but Feature request: Generate stub files along during compile PyO3/pyo3#510 is far from being closed
  • Packaging: making sure the stubs are installed and third-party code recognizes it when running mypy
  • Testing: how do we test our stubs are correct and up to date?

Non-Goals

Use mypy internally within retworkx. The emphasis is: we want users to be able to run mypy when checking code that uses retworkx.

@mtreinish mtreinish added enhancement New feature or request good first issue Good for newcomers labels Jun 7, 2021
@IvanIsCoding
Copy link
Collaborator Author

IvanIsCoding commented Aug 1, 2021

I had an idea that can help us avoid duplication and make type annotations manageable. Here is what I thought:

Idea

The idea is to make .pyi stub files the source of text_signature in our Rust . The stub files look like:

class PyGraph:
    
    def has_edge(self, node_a: int, node_b: int, /) -> bool: ...

In Rust, we would load the text signature using a constant to reuse the definition originally in .pyi:

#[pyo3(text_signature = annotation::PYGRAPH_HAS_EDGE)]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {

Possibly we could also use an annotation! procedural macro to make it slightly better:

#[pyo3(text_signature = annotation!("PyGraph.has_edge"))]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {

That way, we do not need two worry about conflicting text signatures and annotations. All annotations would "magically" come from .pyi files, once we update the stub file they propagate to the Rust code.

Implementation

A Possible implementation for the idea is:

1. Create a Python script that parses the .pyi stub and generates text signatures

Leverage Python's ast module to parse .pyi files and output what PyO3 expects.

The text signatures that PyO3 expects are slightly different than what is on the .pyi file. PyO3 expects text_signature = "(self, node_a, node_b, /)" which is stripped out of the argument types and return types.

The ast module has the tools to do that. I think we can parse def has_edge(self, node_a: int, node_b: int, /) -> bool:, remove the items we do not want and then unparse back to a string we can consume.

2. Put the generated signatures as constants in a .rs

Next, the Python script needs to dump the signatures into a file Rust can consume. That would be an autogenerated .rs with all the signatures as constants. We'd export the constants under crate::annotation and reuse them in other parts of the code.

A detail for this step is that we'd need to add a CI check to verify that the autogenerated file is up to date. I think it is better to have the autogenerated file on Git because that way we do not need to add the Python script to the build process.

3. Create the macro annotation! macro to load the signatures

This is where some additional magic could happen. It could parse the tokens and convert things like PyGraph.has_edge into crate::annotation::PYGRAPH_HAS_EDGE. The former is closer to the Python definition, so it could bring some value.

@georgios-ts
Copy link
Collaborator

Sorry for coming late into this, especially since you have done a lot of work in #401.

We can leverage the procedural attribute macros of Rust [1] together with the parsing capabilities of the syn crate to automatically generate the .pyi stubs. This means that during compilation we will generate the stubs and we can do it conditionally by utilizing cfg_attr [2].
The idea is as follows:

  1. Define our own procedural macro where we parse the input as an ItemFn [3] and retrieve the name of the function, its arguments (name + type) and output type.
     extern crate proc_macro;
     use proc_macro::TokenStream;
     use syn;
    
     #[proc_macro_attribute]
     pub fn pyhints(attr: TokenStream, item: TokenStream) -> TokenStream {
         let input: syn::ItemFn = syn::parse(item.clone()).unwrap();
         let sig = input.sig;
         ....
    
         item
     }
  2. Define the mapping from Rust types to Python types.
  3. Some functions take in a PyObject but they really require a Callable. This information can be passed in the macro as an attribute.
  4. Dump all these in .pyi files with the correct syntax.

In our rust code we will write something like:

#[pyfunction]
#[cfg_attr(feature = "mypy", pyhints)]
pub fn algo(graph: PyGraph, node: usize) -> usize { ... }

The downside is that this approach does not work for the functions defined in __init__.py and we will have to do them manually. It might also be hard to do this for our custom iterators too.

@georgios-ts
Copy link
Collaborator

Coming back with a more detailed approach (works for functions but not for classes) that is slightly different from what I had in mind in the previous comment:

  1. Define a custom trait like

     trait ToPyHint {
          fn convert() -> String;
     }

    that gives the description (just a string) of the equivalent python type for rust types that we use in retworkx as arguments in pyfunctions.

  2. Implement a proc macro that receives a function definition:

    • retrieves the name of the function, its arguments (name + type) and output type.
    • emits another function marked as a #[test]. Inside the body of this new function, we'll call ToPyHint::convert() for the types that appear in the original func signature, format everything according to the stub files syntax and print in in stdout.
  3. Run the tests with cargo, collect the output and dump the lines of interest in a new .pyi file.

I have an initial working implementation https://github.com/georgios-ts/stubgen (it's not fully ready to use it in retworkx).

@IvanIsCoding
Copy link
Collaborator Author

Coming back with a more detailed approach (works for functions but not for classes) that is slightly different from what I had in mind in the previous comment:

  1. Define a custom trait like

     trait ToPyHint {
          fn convert() -> String;
     }

    that gives the description (just a string) of the equivalent python type for rust types that we use in retworkx as arguments in pyfunctions.

  2. Implement a proc macro that receives a function definition:

    • retrieves the name of the function, its arguments (name + type) and output type.
    • emits another function marked as a #[test]. Inside the body of this new function, we'll call ToPyHint::convert() for the types that appear in the original func signature, format everything according to the stub files syntax and print in in stdout.
  3. Run the tests with cargo, collect the output and dump the lines of interest in a new .pyi file.

I have an initial working implementation https://github.com/georgios-ts/stubgen (it's not fully ready to use it in retworkx).

This is a remarkable progress! I think if you post it on the PyO3 thread they would be very interested. You have the best idea/proof-of-concept code since the debate started.

I did not have time to post earlier, but the main challenge is with PyAny. PyO3 erases information when we use PyAny in Rust, and recovering it is not easy. I will take some examples from the PR, I have not figured out yet how to generate this from the Rust code:

class PyDiGraph(Generic[S, T]): 
    def compose(node_map_func: Optional[Callable[[S], int]]): ...

We can try replacing it with Any, but doing type checking wouldn't be as useful. I wonder if there is a meet-in-the-middle solution between parsing the Rust code and checking it against a reference .pyi file.

@georgios-ts
Copy link
Collaborator

Yeah, that's true, using PyAny / PyObject is convenient but we lose type information. Apart from manually editing .pyi files, one solution might be to define thin wrappers around a PyObject that correspond to a known type to us that we can't really tell to pyo3 and use these in our pyfunctions.

For example, as a weight function in the shortest path algorithms, we could define

struct EdgeWeightFunc(PyObject);

impl ToPyHint for EdgeWeightFunc {
      fn convert() -> String {
        String::from("Callable[[EWeight], float]")
      }
}

with the convention that:

impl ToPyHint for PyDiGraph {
      fn convert() -> String {
        String::from("PyDiGraph[NWeight, EWeight]")
      }
}

and in .pyi file we define:

NWeight = TypeVar('NWeight')
EWeight = TypeVar('EWeight')

Admittedly, it's getting a bit ugly.

@adriangb
Copy link

Just to play devil's advocate: would it be that bad to manually maintain .pyi files? Writing them the first time can be painful, but it can be gradual. And if you did run mypy on your codebase, you can test that the annotations are correct, so they aren't manual in the sense that you have to make sure the codebase doesn't drift away from them, even if you do still have to fix them by hand.

@IvanIsCoding
Copy link
Collaborator Author

Just to play devil's advocate: would it be that bad to manually maintain .pyi files? Writing them the first time can be painful, but it can be gradual. And if you did run mypy on your codebase, you can test that the annotations are correct, so they aren't manual in the sense that you have to make sure the codebase doesn't drift away from them, even if you do still have to fix them by hand.

It is not bad to maintain .pyi files by hand, at least for the frequent contributors and end-users. #401 even has a file that indicates the annotations are partial, so if someone finds a mismatch or a missing annotation they can just write a local file with the correct annotation and mypy will use that instead.

Maintaining .pyi files by hand is particularly bad for first-time contributors though. They already need to write code and create test cases, which is a lot. If we ask them to modify .pyi files too, that would be an extra task they have to do. Also, they are the most likely to forget to update those.

So if we had autogenerated annotations, it would make it easier to contribute & also keep the .pyis up to date

@adriangb
Copy link

adriangb commented Dec 2, 2021

they are the most likely to forget to update those

Would running MyPy on your tests help? That way if they forget they get an explicit error saying type annotations are missing for xyz method.

@IvanIsCoding
Copy link
Collaborator Author

they are the most likely to forget to update those

Would running MyPy on your tests help? That way if they forget they get an explicit error saying type annotations are missing for xyz method.

Running mypy on tests is not as helpful as it seems, it generally doesn't cover all cases. We're planning on going with specific tests for annotations see https://github.com/Qiskit/retworkx/blob/546f6ee9e8d1faaec7879829f8de0f85ca2b5b68/stubs-tests/pygraph_test.py for an example

@IvanIsCoding
Copy link
Collaborator Author

This effort was concluded by #1061, now that we are enforcing that new changes will need to add type annotations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants