cleaned and final update of type hints
SimonMolinsky committed Mar 25, 2023
1 parent 1c32867 commit 4a21c54
Showing 2 changed files with 101 additions and 59 deletions.
3 changes: 0 additions & 3 deletions .gitignore
@@ -5,8 +5,5 @@ tmp/
.DS_Store
.nox
__pycache__
<<<<<<< Updated upstream
*notes-from-review.md
=======
*.idea*
>>>>>>> Stashed changes
@@ -211,15 +211,15 @@ def add_me(aNum, aNum2):

## Beyond docstrings: type hints

We can use docstrings to describe data types that we pass into functions as parameters or
into classes as attributes. We do it with package users in mind.
We use docstrings to describe data types that we pass into functions as parameters or
into classes as attributes. *We do it with our users in mind.*

What with us – developers? We can think of ourselves and the new contributors
**What about us – developers?** We can think of ourselves and the new contributors,
and start using *type hinting* to make our journey safer!

There are solid reasons to use type hints:

- Development and debugging is faster,
- Development and debugging are faster,
- We clearly see data flow and its transformations,
- We can use tools like `mypy` or integrated tools of Python IDEs for static type checking and code debugging.

@@ -230,22 +230,22 @@ The icing on the cake is that the code in our package will be aligned with the b
But there are reasons to *skip* type hinting:

- Type hints may make code unreadable, especially when a parameter’s input takes multiple data types and we list them all,
- It doesn’t make sense to write type hints for simple scripts and functions that perform obvious operations.
- Writing type hints for simple scripts and functions that perform obvious operations doesn't make sense.

Fortunately for us, type hinting is not all black and white.
We can gradually describe the parameters and outputs of some functions but leave others as they are.
Type hinting can be an introductory task for new contributors in seasoned packages,
that way their learning curve about data flow and dependencies between API endpoints will be smoother.
Type hinting can be a task for new contributors to get them used to the package structure.
That way, their learning curve about data flow and dependencies between API endpoints will be smoother.

## Type hints in practice

Type hinting was introduced with Python 3.5 and is described in [PEP 484](https://peps.python.org/pep-0484/).
**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with type hinting?
It is not. Type hints are optional and static and they will work like that in the future where Python is Python.
The power of type hints lies somewhere between docstrings and unit tests, and with it we can avoid many bugs
**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with this feature?
It is not. Type hints are optional and static, and they will stay that way for as long as Python is Python.
The power of type hints lies somewhere between docstrings and unit tests, and with it, we can avoid many bugs
throughout development.
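
To see what "optional and static" means in practice, here is a minimal sketch. The function `add_numbers` is hypothetical and not part of any package discussed here. Python runs the mistyped call without complaint, whereas a static checker such as `mypy` would report the incompatible argument types before the code runs.

```python
def add_numbers(a: int, b: int) -> int:
    """Add two numbers; the hints document intent but are not enforced at runtime."""
    return a + b


# Correct usage: the arguments match the annotations.
print(add_numbers(1, 2))

# Mistyped usage: Python still runs it and concatenates the two strings.
# A static type checker would flag this call, because `str` is not `int`.
print(add_numbers("1", "2"))
```

The hints never change how the function behaves. They only give tools and readers more information.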

We've seen type hints in the simple example earlier, and we will change it slightly:
We've seen type hints in the simple example earlier. Let's come back to it and change it slightly:


```python
@@ -254,23 +254,28 @@ from typing import Dict, List

def extent_to_json(ext_obj: List) -> Dict:
    """Convert bounds to a shapely geojson like spatial object."""
    pass
    ...

```

Here we focus on the new syntax. First, we have described the parameter `ext_obj` as the `List` class. How do we do it? By adding a colon after parameter and the name of a class that is passed into a function. It’s not over and we see that the function definition after closing parenthesis is expanded. If we want to inform type checker what type function returns, then we create the arrow sign `->` that points to a returned type and after it we have function’s colon. Our function returns Python dictonray (`Dict`).
Here we focus on the new syntax. First, we described the parameter `ext_obj` as the `List` class. How do we do it?
We add a colon after the parameter (variable), followed by the name of the class that is passed into the function.
It’s not over. Do you see that the function definition is extended after the closing parenthesis?
If we want to inform the type checker what the function returns, we add the arrow sign `->` that points to the returned type,
and after it, we put the function’s colon. Our function returns a Python dictionary (`Dict`).

```{note}
We have exported classes `List` and `Dict` from `typing` module but we may use
We have imported the classes `List` and `Dict` from the `typing` module, but we may use
the built-in `list` or `dict` types instead. We will achieve the same result.
The capitalized names are required when our package uses Python versions that are lower than
Python 3.9. Python 3.7 will be deprecated in June 2023, Python 3.8 in October 2024.
Thus, if your package supports the whole ecosystem, it should use `typing` module syntax.
Python 3.9. Python 3.7 reaches its end of life in June 2023, and Python 3.8 in October 2024.
Thus, if your package supports the whole ecosystem, it should use the `typing` module syntax.
```
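
As a small sketch of the two spellings, the functions below carry the same information for a type checker. Both function names are hypothetical, and the built-in form assumes Python 3.9 or newer.

```python
from typing import Dict, List


def mean_by_key_typing(values: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each list of numbers; annotated with `typing` generics."""
    return {key: sum(nums) / len(nums) for key, nums in values.items()}


def mean_by_key_builtin(values: dict[str, list[float]]) -> dict[str, float]:
    """Same logic; annotated with built-in generics (Python 3.9+)."""
    return {key: sum(nums) / len(nums) for key, nums in values.items()}


scores = {"a": [1.0, 3.0], "b": [2.0]}
print(mean_by_key_typing(scores) == mean_by_key_builtin(scores))
```

Only the annotations differ. At runtime the two functions behave identically.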

### Type hints: basic example

The best way to learn is by example. We will use the [pystiche](https://github.com/pystiche/pystiche/tree/main) package.
To avoid confusion, we start from a mathematical operation with basic data types:
To avoid confusion, we start with a mathematical operation:

```python
import torch
@@ -283,7 +288,7 @@ def _norm(x: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:

The function has three parameters:

- `x` that is required and its type is `torch.Tensor`,
- `x` that is required, and its type is `torch.Tensor`,
- `dim`, optional `int` with a default value equal to `1`,
- `eps`, optional `float` with a default value equal to `1e-8`.

@@ -295,7 +300,7 @@ As we see, we can use basic data types to mark simple variables. The basic set o
- `bool`
- `complex`.
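
A minimal sketch that uses only built-in types. The function `scale` and its parameters are hypothetical, chosen to mirror the "one required parameter plus defaults" shape of `_norm` without depending on `torch`.

```python
def scale(value: float, factor: int = 2, clip: bool = False) -> float:
    """Multiply `value` by `factor`; optionally clip the result to [0.0, 1.0]."""
    result = value * factor
    if clip:
        result = max(0.0, min(1.0, result))
    return result


print(scale(0.25))             # only the required parameter is passed
print(scale(0.75, clip=True))  # a default is overridden by keyword
```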

Most frequently we will use those types within a simple functions that are *close to data*.
We will most frequently use those types within simple functions that are *close to data*.
However, sometimes our variable will be a data structure that isn't built-in within Python itself
but comes from other packages:

@@ -304,14 +309,14 @@ but comes from other packages:
- `DataFrame` from `pandas`,
- `Session` from `requests`.

To perform type checking we must import those classes, then we can set those as a parameter's type.
To perform type checking, we must import those classes. Then we can set those as a parameter's type.
The same is true if we want to use classes from within our package (but we should avoid **circular imports**,
the topic that we will uncover later).
the topic we will uncover later).

### Type hints: complex data types

We can use type hints to describe other objects available in Python.
The little sample of those objects are:
A small sample of those objects:

- `List` (= `list`)
- `Dict` (= `dict`)
@@ -330,13 +335,13 @@ def _extract_prev(self, idx: int, idcs: List[int]) -> Optional[str]:

```

The function has two parameters. Parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without
The function has two parameters. The parameter `idcs` is a list of integers. We may write it as `List[int]`, or as a bare `List` without
the square brackets and the data type of the list's elements.

The `_extract_prev` function returns `Optional` type. It is a special type that is used to describe inputs and output
The `_extract_prev` function returns the `Optional` type. It is a special type that describes inputs and outputs
that can be `None`. There are more interesting types that we can use in our code:

- `Union` – we can use it to describe a variable that can be of multiple types, the common example could be:
- `Union` – we can use it to describe a variable that can be one of multiple types. An example could be:

```python
from typing import List, Union
@@ -349,8 +354,8 @@ def process_data(data: Union[np.ndarray, pd.DataFrame, List]) -> np.ndarray:

```

What's the problem with the example above? With more data types that can be passed into parameter `data`, the function definition
becomes unreadable. We have two solutions for this issue. The first one is to use `Any` type that is a wildcard type:
What's the problem with the example above? The function definition becomes unreadable as more data types can be passed into the parameter `data`.
We have two solutions for this issue. The first one is to use the `Any` type, which is a wildcard equivalent to not annotating the parameter at all:

```python
from typing import Any
@@ -361,15 +366,15 @@ def process_data(data: Any) -> np.ndarray:

```

The second solution is to think what is a high level representation of passed data types. The examples are:
The second solution is to think about a high-level representation of the passed data types. Examples are:

- `Sequence` – we can use it to describe a variable that is a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`.
- `Iterable` – we can use it to describe a variable that is iterable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`.
- `Sequence` – we can use it to describe a variable as a sequence of elements. Sequences are `list`, `tuple`, `range` and `str`.
- `Iterable` – we can use it to describe an iterable variable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`.
- `Mapping` – we can use it to describe a variable that is a mapping. Mappings are `dict` and `defaultdict`.
- `Hashable` – we can use it to describe a variable that is hashable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`.
- `Collection` - we can use it to describe a variable that is a collection. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`.
- `Hashable` – we can use it to describe a hashable variable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`.
- `Collection` – we can use it to describe a collection variable. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`.

Thus, the function could look like:
Thus, the function could look like this:

```python
from typing import Iterable
@@ -380,11 +385,11 @@ def process_data(data: Iterable) -> np.ndarray:

```
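
The sketch below shows how one abstract hint covers many concrete types. The function `count_items` is hypothetical. The abstract classes are imported from `collections.abc`, which the capitalized `typing` names alias, so the same classes also work with `isinstance()` at runtime.

```python
from collections.abc import Iterable, Mapping, Sequence


def count_items(data: Iterable) -> int:
    """Count the elements of any iterable, including ones without `len()`."""
    return sum(1 for _ in data)


print(count_items([1, 2, 3]))          # a list
print(count_items({"a": 1, "b": 2}))   # a dict iterates over its keys
print(count_items(x for x in "abcd"))  # a generator has no len()

# Runtime checks against the abstract classes.
print(isinstance(range(3), Sequence))  # range is a sequence
print(isinstance({1, 2}, Sequence))    # a set is iterable but not a sequence
print(isinstance({"a": 1}, Mapping))   # dict is a mapping
```

Choosing the narrowest abstract type that the function really needs keeps the signature short and still tells the reader what operations the function relies on.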

### Type hints: special typing objects
### Type hints: unique objects and interesting cases

The `typing` module provides us with more objects that we can use to describe our variables.
Interesting object is `Callable` that we can use to describe a variable that is a function. Usually,
when we write decorators or wrappers, we use `Callable` type. The example in the context of `pystiche` package:
An interesting object is `Callable`, which we can use to describe a variable that is a function. Usually,
when we write decorators or wrappers, we use the `Callable` type. An example in the context of the `pystiche` package:

```python
from typing import Callable
@@ -393,14 +398,13 @@ from typing import Callable
def _deprecate(fn: Callable) -> Callable:
    ...


```

The `Callable`can be used as a single word or as a word with square brackets that has two parameters: `Callable[[arg1, arg2], return_type]`.
The first parameter is a list of arguments, the second one is a return type.
The `Callable` type can be used on its own or with square brackets that take two parameters: `Callable[[arg1, arg2], return_type]`.
The first parameter is a list of argument types, and the second is the data type of the function's output.
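
A minimal sketch of the bracketed form follows. The decorator `doubled` and the function `add` are hypothetical and not taken from `pystiche`. The hint `Callable[[int, int], int]` says that the decorator accepts, and returns, a function that takes two integers and returns an integer.

```python
import functools
from typing import Callable


def doubled(fn: Callable[[int, int], int]) -> Callable[[int, int], int]:
    """Decorator: return a function that doubles the result of `fn`."""

    @functools.wraps(fn)  # keep the name and docstring of the wrapped function
    def wrapper(a: int, b: int) -> int:
        return 2 * fn(a, b)

    return wrapper


@doubled
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


print(add(2, 3))
```

With the bare `Callable`, a type checker accepts any function. The bracketed form lets it verify that the decorated function has a compatible signature.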

There is one more important case around type hints. Sometimes we want to describe a variable that comes from within
our package. Usually we can do it without any problems:
There is an important case around type hints. Sometimes we want to describe a variable that comes from within
our package. Usually, we can do it without problems:

```python
from my_package import my_data_class
@@ -411,10 +415,10 @@ def my_function(data: my_data_class) -> None:

```

and it will work fine. But we may encounter *circual imports* that are a problem. What is a *circular import*?
It is a case when we want to import module B into module A but module A is already imported into module B.
It seems like we are importing the same module twice into itself. The issue is rare when we program without type
hinting. However, with type hints it could be tedious.
And it will work fine. But we may encounter *circular imports* that need to be fixed. What is a *circular import*?
It is a case when we want to import module B into module A, but module A is already imported into module B.
In effect, the two modules try to import each other. The issue is rare when we program without type
hinting. However, with type hints, it can become tedious.

Thus, if you encounter this error:

@@ -431,12 +435,13 @@ def my_function(data: my_data_class) -> None:
ImportError: cannot import name 'my_data_class' from partially initialized module 'my_package' (most likely due to a circular import) (/home/user/my_package/__init__.py)
```

Then you should use `typing.TYPE_CHECKING` clause to avoid circular imports. The example:
Then you should use the `typing.TYPE_CHECKING` clause to avoid circular imports. The example:

```python
from __future__ import annotations
from typing import TYPE_CHECKING


if TYPE_CHECKING:
    from my_package import my_data_class

@@ -446,18 +451,58 @@ def my_function(data: my_data_class) -> None:

```

Unfortunately, the solution is dirty because we have to
use `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work! Type hinting
is not only roses and butterflies!
Unfortunately, the solution is *dirty* because we have to
use the `if TYPE_CHECKING` clause and the `from __future__ import annotations` import to make it work. It makes our
script messier! Type hinting is not all roses and butterflies!
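
The mechanism can be tried in a single file. In the sketch below, `decimal.Decimal` from the standard library stands in for `my_data_class`, and the function `double` is hypothetical. Quoting the annotations is used here in place of the `from __future__ import annotations` import; for this purpose it has the same effect, because a quoted hint is not evaluated when the function is defined.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped when the program runs.
    from decimal import Decimal


def double(value: "Decimal") -> "Decimal":
    """Return twice the value; the quoted hints are never evaluated at runtime."""
    return value * 2


print(TYPE_CHECKING)           # the constant is False at runtime
print(double.__annotations__)  # the hints are stored as plain strings
```

Because the import under `if TYPE_CHECKING` never executes at runtime, it cannot take part in an import cycle.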

A nice feature of type hinting is that we can declare a variable's type within a function:

```python
from typing import Dict
import numpy as np


def validate_model_input(data: np.ndarray) -> Dict:
    """
    Function checks if dataset has enough records to perform modeling.

    Parameters
    ----------
    data : np.ndarray
        Input data.

    Returns
    -------
    : Dict
        Dictionary with `data`, `info` and `status` to decide if pipeline can proceed with modeling.
    """

    output: Dict = {}  # type hinting; an empty dict matches the hint, `None` would not

    # Probably we don't have the lines below yet

    # if data.shape[0] > 50:
    #     output = {"data": data, "info": "Dataset is big enough for statistical tests.", "status": True}
    # else:
    #     output = {"data": data, "info": "Dataset is too small for statistical tests.", "status": False}

    return output

```

We will use this feature rarely. The most probable scenario is when we start defining a function and its output, but
we don't know how we will process data. In this context, we can still run type checking to be sure that the
function behaves as we expect within the newly designed pipeline.

### Type hinting: final remarks and tools
(Another scenario: we will be forced to add type hints to silence dynamic type checkers from some IDEs ;) ).
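
Another common case, sketched with a hypothetical `count_words` helper: an empty container may give a type checker too little to infer from, so the annotation on the local variable states which keys and values it will hold.

```python
from typing import Dict, Iterable


def count_words(words: Iterable[str]) -> Dict[str, int]:
    """Count the occurrences of each word."""
    counts: Dict[str, int] = {}  # the annotation declares the key and value types
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["a", "b", "a"]))
```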

There are few tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/).
It's a good idea to add it to your Continuous Integration (CI) pipeline.
Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`, most of them are based on `mypy` logic.

At this point, we have a good understanding of type hints and how to use them in our code. There is one last thing to
remember. **Type hints are not required in all our functions and we can introduce those gradually, it won't damage our code**.
It is very convenient way of using this extraordinary feature!
### Type hinting: final remarks

There are tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/).
Adding it to your Continuous Integration (CI) pipeline is a good idea.
Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`; they follow the same typing rules, although not all of them are built on `mypy`.

The last thing to remember is that **type hints are optional in all our functions, and we can introduce them gradually,
without breaking our code or the output generated by CI type-checking tools**.
It is a very convenient way of using this extraordinary feature!
