Static Type Annotations for SymPy #4

1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.vscode
136 changes: 136 additions & 0 deletions SymPEP-XXXX.md
@@ -0,0 +1,136 @@
# SymPEP X — Adopting Static Typing in SymPy

**Author:** Sangyub Lee

**Status:** Draft

**Type:** Standards Track

**Created:** 2023-08-25

**Resolution:** [Link to Discussion]

## Abstract

This SymPEP proposes the adoption of static typing in the SymPy codebase. Static typing can enhance the development experience, improve code quality, and facilitate better collaboration among contributors.

## Motivation and Scope

The current SymPy codebase is predominantly dynamically typed, which can lead to runtime errors and make the codebase harder to understand. Adopting static typing will provide better tooling support, catch type-related errors before the code runs, and improve the overall robustness of the library. This SymPEP aims to define the scope and guidelines for introducing static typing to SymPy.

## Usage and Impact

With static typing, users of SymPy will benefit from improved code suggestions, better error messages, and increased confidence in the correctness of their code. For instance, when working with symbolic expressions, the static type system can help catch potential issues in function calls, attribute accesses, and type mismatches. This will lead to a more intuitive and reliable programming experience for SymPy users.

For example, functions can be statically typed as follows:

```python
from typing import List
from sympy import Expr, Symbol

def differentiate(expr: Expr, var: Symbol) -> Expr:
    """
    Differentiate a SymPy expression with respect to a variable.
    """
    return expr.diff(var)

def simplify_expressions(expressions: List[Expr]) -> List[Expr]:
    """
    Simplify a list of SymPy expressions.
    """
    return [expr.simplify() for expr in expressions]
```

This is preferable to the dynamically typed versions:

```python
def differentiate(expr, var):
    """
    Differentiate a SymPy expression with respect to a variable.
    """
    return expr.diff(var)

def simplify_expressions(expressions):
    """
    Simplify a list of SymPy expressions.
    """
    return [expr.simplify() for expr in expressions]
```

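The dynamically typed versions are not friendly to users, because nothing stops them from passing a list of integers or strings to `simplify_expressions`. With the statically typed versions, a type checker can catch this error before the code runs. Without annotations, the same guarantee requires explicit runtime checks:
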
```python
from sympy import Expr, Symbol

def differentiate(expr, var):
    """
    Differentiate a SymPy expression with respect to a variable.
    """
    if isinstance(expr, Expr) and isinstance(var, Symbol):
        return expr.diff(var)
    else:
        raise TypeError("expr must be a SymPy expression and var must be a SymPy symbol")

def simplify_expressions(expressions):
    """
    Simplify a list of SymPy expressions.
    """
    # Reject both non-list arguments and lists containing non-Expr items.
    if isinstance(expressions, list) and all(isinstance(expr, Expr) for expr in expressions):
        return [expr.simplify() for expr in expressions]
    else:
        raise TypeError("expressions must be a list of SymPy expressions")
```

The runtime-checked versions are more verbose and less efficient than the statically typed ones because of the added `isinstance` checks.

## Backwards Compatibility

Introducing static typing to SymPy may break backwards compatibility for code that relies on dynamically typed behavior. Users who heavily depend on runtime type inference might need to update their code to match the new type annotations.

## Detailed Description

This SymPEP proposes the gradual introduction of static type annotations using tools like Python's `typing` module and third-party type checkers such as `mypy` or `pyright`. The process will involve identifying critical modules, functions, and classes to begin the static typing integration. We will define guidelines for annotating function signatures, class attributes, and return types. The goal is to maintain compatibility with existing dynamically typed code while allowing for a smooth transition.

Authors of new classes, functions, or modules in SymPy should be encouraged to write their code with static typing, unless they encounter a situation where achieving this is difficult or not possible. In such cases, they are expected to include a comment explaining the reasons preventing static typing.

Introducing new dynamically typed code initially can lead to the accumulation of technical debt. It's recognized that migrating existing dynamically typed code later is significantly more challenging than initially writing code with static typing. Therefore, introducing new dynamically typed code should be approached cautiously.

It's important to note that tools like mypy and pyright are capable of inferring types, facilitating the incorporation of static typing without a steep learning curve. By adding a few annotations, code can be made cleaner and the transition to static typing across the entire codebase can be eased.
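As a minimal sketch of what "a few annotations" can mean in practice, the hypothetical helper below (its name and purpose are assumptions for illustration, not SymPy code) annotates only the signature. Checkers such as mypy or pyright are expected to infer the types of the local variables from there.

```python
from typing import List, Tuple


def sum_of_exponents(monomials: List[Tuple[int, ...]]) -> int:
    """Add up every exponent in a list of exponent tuples."""
    total = 0  # no annotation needed: a checker infers int from the literal
    for monom in monomials:  # inferred from the parameter annotation
        total += sum(monom)
    return total


# Only the signature is annotated; the body reads like untyped Python.
result = sum_of_exponents([(1, 2), (3,)])
```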

For functions, classes, and modules internally used by SymPy that currently feature unnecessary dynamic type checks, a shift towards static typing should be promoted. This transition can help eliminate these unnecessary checks and subsequently enhance the overall performance of the SymPy codebase.
Member

@moorepants moorepants Aug 26, 2023

Who deems these checks unnecessary? If we want a function/method to be duck-typed, I believe we should be allowed to do so. Duck-typing has been the design paradigm preached by the Python community since I can remember and it is a feature, not a bug.

I agree that we should not drop dynamic type checks because of adding static typing. When not using an editor supporting this (you have no idea how many students edit their code in whatever the system offers), there should still be sensible error messages, not the code crashing later with a seemingly unrelated error message.

If nothing else, this will probably reduce the number of issues opened that boil down to the user doing something wrong.

Contributor

The checks are necessary in user-facing functions but should not be necessary in much of the internal code if a static checker can verify correctness.

Contributor

Note also that if the type hints are used consistently on user facing functions then an editor like vscode will already warn the user about passing the wrong type into a function before the code even runs.

Member Author

Python allows inspecting type annotations at runtime:

def square(x: int):
    return x * x

square.__annotations__
# {'x': int}

so we can implement something like a `@validate` decorator:

# Function for users that raises on isinstance(x, int)
@validate
def square(x: int):
    return x * x

# Function for internal use
def square(x: int):
    return x * x

So even if we want runtime validation, it is better done as above than by repeating `isinstance` checks manually, which often clutters the code with validation logic.

I totally agree. Just wanted to point out that not all people use editors with type hinting (and/or do not understand how to benefit from them).

Contributor

> not all people use editors with type hinting

Agreed. I make very heavy use of ipython for example which does not support this.

Note though that the sentence in the PEP that we are all commenting on here explicitly says (emphasis added):

> For functions, classes, and modules INTERNALLY used by SymPy that currently feature unnecessary dynamic type checks, a shift towards static typing should be promoted.

The example here is a good one:
https://github.com/sympy/sympy/blob/b0dcb5af49e7289680d9789d292197675e40490d/sympy/polys/polyclasses.py#L167-L171
In gh-25651 I found bugs by enabling that check. The check is too expensive to be used at runtime though so it was commented out before merge. Almost everything that it checks is something that could be verified by a static type checker but doing that check at runtime means recursively calling isinstance down through a potentially large data structure. The class in question is purely internal and not something that any ordinary SymPy user would ever see directly.

Note also that just attempting to verify that the codebase does indeed only use those types in this particular function is something that consumed development time. The whole PR gh-25651 would be completely unnecessary if the code in question just used type hints, but as it is, if I want to verify what the types are then I have to add runtime checks and ensure that the entire CI test suite completes successfully (and then fix the bugs that show up).
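To make the cost concrete, here is a rough sketch of the two approaches. It is not the code in `polyclasses.py`: the function names are hypothetical, and plain `int` coefficients are assumed for illustration.

```python
from typing import List


def check_nested(rep: object, depth: int) -> bool:
    """Runtime check: is rep a list nested `depth` levels deep with int leaves?

    Every element at every level is visited, so the cost grows with the
    size of the whole structure on each call.
    """
    if depth == 0:
        return isinstance(rep, int)
    return isinstance(rep, list) and all(
        check_nested(item, depth - 1) for item in rep
    )


def first_coefficient(rep: List[List[int]]) -> int:
    """Annotated version: no runtime check, callers are verified statically."""
    return rep[0][0]


dense = [[1, 2], [3]]
well_formed = check_nested(dense, 2)        # walks every element
malformed = check_nested([[1, "x"]], 2)     # a wrong leaf is only found by walking
coefficient = first_coefficient(dense)      # no per-call checking cost
```

With the annotation in place, a checker such as mypy or pyright can flag callers that pass the wrong shape without any code running.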


Similarly, functions, classes, and modules within SymPy that intentionally avoided runtime type checks for performance reasons should consider embracing static typing. Static type checks run before execution, so they add no runtime overhead while still catching type-related bugs.
Member

This is a potentially compelling statement. If there were some real world metrics that show how much a large codebase like SymPy could be sped up by removing all runtime checks and moving to a linting-based type checking procedure, then that would help convince us of any need to do such a radical change.

Member

@moorepants moorepants Aug 26, 2023

As I understand though, the speed bottlenecks in SymPy are not due to large numbers of runtime checks but due to inefficient symbolic algorithms.

Member Author

There is room to discuss how checking `list[list[int]]` at runtime could be avoided in polynomial systems and replaced by static typing.

https://github.com/sympy/sympy/blob/b0dcb5af49e7289680d9789d292197675e40490d/sympy/polys/polyclasses.py#L167-L171

Contributor

As @sylee957 notes, it is not just about speed bottlenecks, because in many cases the checking is simply not done in order to avoid the slowdowns that it would cause. The cost of not checking is then bugs, debugging, etc. rather than any immediately measurable slowness. Static type checking can reduce certain types of bugs without causing runtime slowdown.


However, in the case of core and public functions and classes, such as ``sympify``, ``simplify``, or ``lambdify``, which have historically relied on dynamic typing for an extended period, the introduction of static typing should be approached judiciously. It should be implemented for these core components only if it doesn't disrupt backward compatibility.

It's worth noting that SymPy has devoted considerable effort over time to address type-related issues of Python objects within SymPy. This was especially relevant during periods when the Python type system was less mature or lacked type hinting capabilities.

## Implementation

1. Identify key modules for static typing.
2. Start by adding type annotations to function signatures and class attributes.
3. Use `mypy` or `pyright` to perform static type checking and address any issues.
4. Gradually propagate static typing to dependent modules and functions.
5. Update documentation to reflect the new static typing conventions.
6. Use `mypy` or `pyright` with strict mode.

## Alternatives

An alternative approach would be to maintain the status quo of dynamic typing. However, this could lead to ongoing challenges in maintaining code quality and preventing runtime errors, especially as the SymPy codebase continues to evolve.

## Discussion

- [Look into using type hints](https://github.com/sympy/sympy/issues/17945)

## References

- [Python Typing Documentation](https://docs.python.org/3/library/typing.html)
- [PEP-484](https://peps.python.org/pep-0484/)
- [MyPy Documentation](https://mypy.readthedocs.io/en/stable/)
- [Pyright Documentation](https://microsoft.github.io/pyright/)

## Copyright

This document has been placed in the public domain.