diff --git a/.gitignore b/.gitignore
index 54529bf6..49fb9bc9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,8 +5,5 @@ tmp/
 .DS_Store
 .nox
 __pycache__
-<<<<<<< Updated upstream
 *notes-from-review.md
-=======
 *.idea*
->>>>>>> Stashed changes
diff --git a/documentation/write-user-documentation/document-your-code-api-docstrings.md b/documentation/write-user-documentation/document-your-code-api-docstrings.md
index fb6bc7e0..3607f3e2 100644
--- a/documentation/write-user-documentation/document-your-code-api-docstrings.md
+++ b/documentation/write-user-documentation/document-your-code-api-docstrings.md
@@ -211,15 +211,15 @@ def add_me(aNum, aNum2):
 
 ## Beyond docstrings: type hints
 
-We can use docstrings to describe data types that we pass into functions as parameters or
-into classes as attributes. We do it with package users in mind.
+We use docstrings to describe the data types that we pass into functions as parameters or
+into classes as attributes. *We do it with our users in mind.*
 
-What with us – developers? We can think of ourselves and the new contributors
+**What about us, the developers?** We can think of ourselves and of new contributors
 and start using *type hinting* to make our journey safer!
 
 There are solid reasons why to use type hints:
 
-- Development and debugging is faster,
+- Development and debugging are faster,
 - We clearly see data flow and its transformations,
 - We can use tools like `mypy` or integrated tools of Python IDEs for static type checking and code debugging.
 
@@ -230,22 +230,22 @@ The icing on the cake is that the code in our package will be aligned with the b
 But there are reasons to *skip* type hinting:
 
 - Type hints may make code unreadable, especially when a parameter’s input takes multiple data types and we list them all,
-- It doesn’t make sense to write type hints for simple scripts and functions that perform obvious operations.
+- Writing type hints for simple scripts and functions that perform obvious operations doesn't make sense.
 
 Fortunately for us, type hinting is not all black and white. We can gradually describe the parameters and outputs of some functions but leave others as they are.
 
-Type hinting can be an introductory task for new contributors in seasoned packages,
-that way their learning curve about data flow and dependencies between API endpoints will be smoother.
+Type hinting can be a good first task for new contributors because it gets them used to the package structure.
+That way, their learning curve about data flow and dependencies between API endpoints will be smoother.
 
 ## Type hints in practice
 
 Type hinting was introduced with Python 3.5 and is described in [PEP 484](https://peps.python.org/pep-0484/).
-**PEP 484** defines the scope of type hinting. Is Python drifting towards compiled languages with type hinting?
-It is not. Type hints are optional and static and they will work like that in the future where Python is Python.
-The power of type hints lies somewhere between docstrings and unit tests, and with it we can avoid many bugs
+**PEP 484** defines the scope of type hinting. Is Python drifting towards statically typed languages with this feature?
+It is not. Type hints are optional and are not enforced at runtime, and PEP 484 states that Python will remain a dynamically typed language.
+The power of type hints lies somewhere between docstrings and unit tests, and with them, we can avoid many bugs
 throughout development.
 
-We've seen type hints in the simple example earlier, and we will change it slightly:
+We've seen type hints in the simple example earlier. Let's come back to it and change it slightly:
 
 ```python
 
@@ -254,23 +254,28 @@ from typing import Dict, List
 
 def extent_to_json(ext_obj: List) -> Dict:
     """Convert bounds to a shapely geojson like spatial object."""
-    pass
+    ...
+
 ```
 
-Here we focus on the new syntax. First, we have described the parameter `ext_obj` as the `List` class. How do we do it? By adding a colon after parameter and the name of a class that is passed into a function. It’s not over and we see that the function definition after closing parenthesis is expanded. If we want to inform type checker what type function returns, then we create the arrow sign `->` that points to a returned type and after it we have function’s colon. Our function returns Python dictonray (`Dict`).
+Here we focus on the new syntax. First, we annotated the parameter `ext_obj` with the `List` class. How do we do it?
+We add a colon after the parameter name, followed by the class of the object that is passed into the function.
+That's not all. Do you see that the function definition is extended after the closing parenthesis?
+To tell the type checker what the function returns, we add the arrow `->` pointing to the returned type,
+and only then the colon that ends the function definition line. Our function returns a Python dictionary (`Dict`).
 
 ```{note}
-We have exported classes `List` and `Dict` from `typing` module but we may use
+We have imported the classes `List` and `Dict` from the `typing` module, but we may use
 `list` or `dict` keywords instead. We will achieve the same result.
 Capitalized keywords are required when our package uses Python versions that are lower than
-Python 3.9. Python 3.7 will be deprecated in June 2023, Python 3.8 in October 2024.
-Thus, if your package supports the whole ecosystem, it should use `typing` module syntax.
+Python 3.9. Python 3.7 reaches its end of life in June 2023, and Python 3.8 in October 2024.
+Thus, if your package still supports Python 3.8 or older, it should use the `typing` module syntax.
 ```
 
 ### Type hints: basic example
 
 The best way to learn is by example. We will use the [pystiche](https://github.com/pystiche/pystiche/tree/main) package.
-To avoid confusion, we start from a mathematical operation with basic data types:
+To avoid confusion, we start with a mathematical operation:
 
 ```python
 import torch
@@ -283,7 +288,7 @@ def _norm(x: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:
 
 The function has three parameters:
 
-- `x` that is required and its type is `torch.Tensor`,
+- `x`, required, of type `torch.Tensor`,
 - `dim`, optional `int` with a default value equal to `1`,
 - `eps`, optional `float` with a default value equal to `1e-8`.
 
@@ -295,7 +300,7 @@ As we see, we can use basic data types to mark simple variables. The basic set o
 - `bool`
 - `complex`.
 
-Most frequently we will use those types within a simple functions that are *close to data*.
+We will most frequently use those types within simple functions that are *close to data*.
 However, sometimes our variable will be a data structure that isn't built-in within Python itself
 but comes from other packages:
 
@@ -304,14 +309,14 @@ but comes from other packages:
 - `DataFrame` from `pandas`,
 - `Session` from `requests`.
 
-To perform type checking we must import those classes, then we can set those as a parameter's type.
+To perform type checking, we must import those classes. Then we can use them as a parameter's type.
 The same is true if we want to use classes from within our package (but we should avoid **circular imports**,
-the topic that we will uncover later).
+the topic we will cover later).
 
 ### Type hints: complex data types
 
 We can use type hints to describe other objects available in Python.
-The little sample of those objects are:
+Here is a small sample of those objects:
 
 - `List` (= `list`)
 - `Dict` (= `dict`)
@@ -330,13 +335,13 @@ def _extract_prev(self, idx: int, idcs: List[int]) -> Optional[str]:
 ```
 
-The function has two parameters. Parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without
+The function has two parameters. The parameter `idcs` is a list of integers. We may write it as `List[int]` or `List` without
 square brackets and data type that is within a list.
 
-The `_extract_prev` function returns `Optional` type. It is a special type that is used to describe inputs and output
+The `_extract_prev` function returns the `Optional` type. It is a special type that describes inputs and outputs
 that can be `None`.
 
 There are more interesting types that we can use in our code:
 
-- `Union` – we can use it to describe a variable that can be of multiple types, the common example could be:
+- `Union` – we can use it to describe a variable that may take one of several types, for example:
 
 ```python
 from typing import List, Union
@@ -349,8 +354,8 @@ def process_data(data: Union[np.ndarray, pd.DataFrame, List]) -> np.ndarray:
 ```
 
-What's the problem with the example above? With more data types that can be passed into parameter `data`, the function definition
-becomes unreadable. We have two solutions for this issue. The first one is to use `Any` type that is a wildcard type:
+What's the problem with the example above? The function definition becomes unreadable as more data types can be passed into the parameter `data`.
+We have two solutions for this issue. The first one is to use the `Any` type, a wildcard that a type checker treats much like a missing annotation:
 
 ```python
 from typing import Any
@@ -361,15 +366,15 @@ def process_data(data: Any) -> np.ndarray:
 ```
 
-The second solution is to think what is a high level representation of passed data types. The examples are:
+The second solution is to think about a high-level representation of the passed data types. Examples are:
 
-- `Sequence` – we can use it to describe a variable that is a sequence of elements. Sequential are `list`, `tuple`, `range` and `str`.
-- `Iterable` – we can use it to describe a variable that is iterable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`.
+- `Sequence` – we can use it to describe a variable that is an ordered sequence of elements. Sequences are `list`, `tuple`, `range` and `str`.
+- `Iterable` – we can use it to describe an iterable variable. Iterables are `list`, `tuple`, `range`, `str`, `dict` and `set`.
 - `Mapping` – we can use it to describe a variable that is a mapping. Mappings are `dict` and `defaultdict`.
-- `Hashable` – we can use it to describe a variable that is hashable. Hashables are `int`, `float`, `str`, `tuple` and `frozenset`.
-- `Collection` - we can use it to describe a variable that is a collection. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`.
+- `Hashable` – we can use it to describe a hashable variable. Hashables are `int`, `float`, `str`, `tuple` (when all its items are hashable) and `frozenset`.
+- `Collection` – we can use it to describe a sized, iterable container. Collections are `list`, `tuple`, `range`, `str`, `dict`, `set` and `frozenset`.
 
-Thus, the function could look like:
+Thus, the function could look like this:
 
 ```python
 from typing import Iterable
@@ -380,11 +385,11 @@ def process_data(data: Iterable) -> np.ndarray:
 ```
 
-### Type hints: special typing objects
+### Type hints: special objects and interesting cases
 
 The `typing` module provides us with more objects that we can use to describe our variables.
-Interesting object is `Callable` that we can use to describe a variable that is a function. Usually,
-when we write decorators or wrappers, we use `Callable` type. The example in the context of `pystiche` package:
+An interesting object is `Callable`, which we can use to describe a variable that is a function. Usually,
+when we write decorators or wrappers, we use the `Callable` type. Here is an example in the context of the `pystiche` package:
 
 ```python
 from typing import Callable
@@ -393,14 +398,13 @@ from typing import Callable
 
 def _deprecate(fn: Callable) -> Callable:
     ...
-
 ```
 
-The `Callable`can be used as a single word or as a word with square brackets that has two parameters: `Callable[[arg1, arg2], return_type]`.
-The first parameter is a list of arguments, the second one is a return type.
+The `Callable` can be used on its own or with square brackets holding two parameters: `Callable[[arg1, arg2], return_type]`.
+The first parameter is a list of argument types, and the second is the type of the function's output.
 
-There is one more important case around type hints. Sometimes we want to describe a variable that comes from within
-our package. Usually we can do it without any problems:
+There is another important case around type hints. Sometimes we want to annotate a variable with a class that comes from within
+our package. Usually, we can do it without problems:
 
 ```python
 from my_package import my_data_class
@@ -411,10 +415,10 @@ def my_function(data: my_data_class) -> None:
 ```
 
-and it will work fine. But we may encounter *circual imports* that are a problem. What is a *circular import*?
-It is a case when we want to import module B into module A but module A is already imported into module B.
-It seems like we are importing the same module twice into itself. The issue is rare when we program without type
-hinting. However, with type hints it could be tedious.
+This works fine. But we may encounter *circular imports*, which have to be fixed. What is a *circular import*?
+It is a case when module A imports module B, but module B also imports module A (directly or through other modules).
+Python starts executing A, pauses to execute B, and B then finds A only partially initialized. The issue is rare when we program without type
+hinting. However, with type hints, it happens more often, because we import classes only to annotate parameters.
 
 Thus, if you encounter this error:
 
@@ -431,12 +435,13 @@ def my_function(data: my_data_class) -> None:
 ImportError: cannot import name 'my_data_class' from partially initialized module 'my_package' (most likely due to a circular import) (/home/user/my_package/__init__.py)
 ```
 
-Then you should use `typing.TYPE_CHECKING` clause to avoid circular imports. The example:
+Then you should move the import into an `if typing.TYPE_CHECKING:` block to avoid circular imports. For example:
 
 ```python
 from __future__ import annotations
 from typing import TYPE_CHECKING
+
 if TYPE_CHECKING:
     from my_package import my_data_class
 
@@ -446,18 +451,58 @@ def my_function(data: my_data_class) -> None:
 ```
 
-Unfortunately, the solution is dirty because we have to
-use `if TYPE_CHECKING` clause and `from __future__ import annotations` import to make it work! Type hinting
-is not only roses and butterflies!
+Unfortunately, the solution is *dirty* because we have to
+use the `if TYPE_CHECKING` clause and the `from __future__ import annotations` import (or quote the annotation as a string) to make it work. It makes our
+script messier! Type hinting is not all roses and butterflies!
+
+Another nice feature of type hinting is that we can annotate a variable inside a function:
+
+```python
+from typing import Dict, Optional
+import numpy as np
+
+
+def validate_model_input(data: np.ndarray) -> Optional[Dict]:
+    """
+    Check whether the dataset has enough records to perform modeling.
+
+    Parameters
+    ----------
+    data : np.ndarray
+        Input data.
+
+    Returns
+    -------
+    : Optional[Dict]
+        Dictionary with `data`, `info` and `status` keys used to decide if the pipeline can proceed, or `None` until the checks exist.
+    """
+
+    output: Optional[Dict] = None  # variable annotation
+
+    # We probably haven't written the lines below yet
+
+    # if data.shape[0] > 50:
+    #     output = {"data": data, "info": "Dataset is big enough for statistical tests.", "status": True}
+    # else:
+    #     output = {"data": data, "info": "Dataset is too small for statistical tests.", "status": False}
+
+    return output
+
+```
+
+We will rarely use this feature. The most probable scenario is when we have defined a function and its output, but
+we don't know yet how we will process the data. Even then, we can run type checking to be sure that the
+function behaves as we expect within the newly designed pipeline.
 
-### Type hinting: final remarks and tools
+(Another scenario: we may be forced to add type hints just to silence the type checkers built into some IDEs ;) ).
 
-There are few tools designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/).
-It's a good idea to add it to your Continuous Integration (CI) pipeline.
-Other tools are integrated with popular IDEs like `PyCharm` or `VSCode`, most of them are based on `mypy` logic.
-At this point, we have a good understanding of type hints and how to use them in our code. There is one last thing to
-remember. **Type hints are not required in all our functions and we can introduce those gradually, it won't damage our code**.
-It is very convenient way of using this extraordinary feature!
+### Type hinting: final remarks
+
+Several tools are designed for static type checking. The most popular one is [`mypy`](https://mypy.readthedocs.io/en/stable/).
+Adding it to your Continuous Integration (CI) pipeline is a good idea.
+Other tools are integrated with popular IDEs: `PyCharm` ships its own type checker, and `VSCode` uses Pyright through the Pylance extension or runs `mypy` through an extension.
+The last thing to remember is that **type hints are optional in all our functions, and we can introduce them gradually
+without breaking our code or the output of CI type checking tools**.
+It is a very convenient way of using this extraordinary feature!
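To try the annotations discussed in this diff in one place, here is a minimal, self-contained sketch. It combines `Optional`, `Iterable`, `Callable[[arg], return_type]` and a variable annotation inside a function. All function names (`previous_item`, `total`, `apply_twice`, `validate_model_input`) are hypothetical illustrations, not code from `pystiche` or from the documentation page. The 50-record threshold mirrors the commented-out lines of the `validate_model_input` example above. Here, that function takes a plain record count instead of a NumPy array, so the sketch has no third-party dependencies.

```python
from typing import Callable, Dict, Iterable, List, Optional, get_type_hints


def previous_item(idx: int, items: List[int]) -> Optional[int]:
    """Hypothetical helper: return the item before position `idx`, or None at the start."""
    if idx <= 0:
        return None  # allowed, because the return type is Optional[int]
    return items[idx - 1]


def total(values: Iterable[float]) -> float:
    """Sum any iterable of numbers: a list, a tuple, a range or a generator."""
    return sum(values, 0.0)


def apply_twice(fn: Callable[[float], float], x: float) -> float:
    """`fn` takes one float and returns a float, hence Callable[[float], float]."""
    return fn(fn(x))


def validate_model_input(n_records: int, min_records: int = 50) -> Dict[str, object]:
    """Hypothetical check: is the dataset large enough for statistical tests?"""
    output: Optional[Dict[str, object]] = None  # annotated before the logic exists
    if n_records > min_records:
        output = {"info": "Dataset is big enough for statistical tests.", "status": True}
    else:
        output = {"info": "Dataset is too small for statistical tests.", "status": False}
    return output  # a type checker narrows `output` to a dict after both branches


if __name__ == "__main__":
    print(previous_item(0, [1, 2, 3]))          # None
    print(previous_item(2, [1, 2, 3]))          # 2
    print(total(range(4)))                      # 6.0
    print(apply_twice(lambda v: v * 2, 3.0))    # 12.0
    print(validate_model_input(100)["status"])  # True
    # Runs fine: the interpreter does not enforce annotations at runtime.
    # A static checker such as mypy should flag this list of strings.
    print(previous_item(1, ["a", "b"]))         # a
    # Annotations are stored on the function object and can be inspected.
    print(get_type_hints(total))
```

Running this file with plain `python` prints every value without complaint, including the `["a", "b"]` call, because annotations are only metadata at runtime. Running `mypy` on the same file should report that call as an incompatible argument type, which is the "optional and static" behavior described in PEP 484.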