stubgen: unify C extension and pure python stub generators with object oriented design (#15770)

This MR is a major overhaul to `stubgen`. It has been tested extensively
in the process of creating stubs for multiple large and varied libraries
(detailed below).

## User story

The impetus for this change is as follows: as a maintainer of third-party
stubs, I do _not_ want to use `stubgen` as a starting point for
hand-editing stub files. I want a framework to regenerate stubs against
upstream changes to a library.

## Summary of Changes

- Introduces an object-oriented design for C extension stub generation,
including a common base class that is shared between inspection-based
and parsing-based stub generation.
- Generally unifies and harmonizes the behavior between inspection and
parsing approaches. For example, function formatting, import tracking,
signature generators, and attribute filtering are now handled with the
same code.
- Adds support for `--include-private` and `--export-less` to
c-extensions (inspection-based generation).
- Adds support for forcing inspection-based stub generation (the
approach used for C extensions) on pure python code using a new
`--inspect-mode` flag. Useful for packages that employ dynamic function
or class factories. Also makes it possible to generate stubs for
pyc-only modules (yes, this is a real use case).
- Adds an alias `--no-analysis` for `--parse-only` to clarify the
purpose of this option.
- Removes filtering of the `__version__` attribute from modules: I've
encountered a number of cases in real-world code that utilize this
attribute.
- Adds a number of tests for inspection mode. Even though these run on
pure python code, they increase coverage of the C extension code since it
shares much of the same code base.
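
The shared-base-class idea from the first two bullets can be illustrated with a small sketch. Everything here is hypothetical (the class names `BaseStubSketch`, `ParsingSketch`, and `InspectionSketch` are invented for illustration and are not the classes added by this commit); it only shows how filtering and formatting can live in one place while each strategy discovers members its own way.

```python
from __future__ import annotations

import ast
import inspect
import types
from abc import ABC, abstractmethod


class BaseStubSketch(ABC):
    """Hypothetical shared base: filtering and formatting live here."""

    def __init__(self, include_private: bool = False) -> None:
        self.include_private = include_private

    @abstractmethod
    def function_names(self) -> list[str]:
        """Each strategy discovers function names in its own way."""

    def is_private(self, name: str) -> bool:
        # In this sketch, dunder names are not treated as private.
        is_dunder = name.startswith("__") and name.endswith("__")
        return name.startswith("_") and not is_dunder

    def generate(self) -> str:
        names = [
            n for n in self.function_names()
            if self.include_private or not self.is_private(n)
        ]
        # One formatting routine shared by both strategies.
        return "\n".join(f"def {n}(*args, **kwargs): ..." for n in sorted(names))


class ParsingSketch(BaseStubSketch):
    """Discovers functions by parsing source text."""

    def __init__(self, source: str, include_private: bool = False) -> None:
        super().__init__(include_private)
        self.source = source

    def function_names(self) -> list[str]:
        tree = ast.parse(self.source)
        return [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]


class InspectionSketch(BaseStubSketch):
    """Discovers functions by inspecting a live module object."""

    def __init__(self, module: types.ModuleType, include_private: bool = False) -> None:
        super().__init__(include_private)
        self.module = module

    def function_names(self) -> list[str]:
        return [k for k, v in vars(self.module).items() if inspect.isfunction(v)]


SOURCE = "def public(): pass\ndef _hidden(): pass\n"
module = types.ModuleType("demo")
exec(SOURCE, module.__dict__)

parsed = ParsingSketch(SOURCE).generate()
inspected = InspectionSketch(module).generate()
print(parsed == inspected)  # True: both go through the same filter and formatter
```

Because the private-name filter sits in the base class, an option such as `--include-private` only has to be implemented once to apply to both strategies.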

Below I've compiled some basic information about each stub library that
I've created using my changes, and a link to the specialized code for
procedurally generating the stubs.

| Library | code type | other notes |
| --- | --- | --- |
| [USD](https://github.com/LumaPictures/cg-stubs/blob/master/usd/stubgen_usd.py) | boost-python | integrates types from doxygen |
| [katana](https://github.com/LumaPictures/cg-stubs/blob/master/katana/stubgen_katana.py) | pyc and C extensions | uses epydoc docstrings. has pyi-only packages |
| [mari](https://github.com/LumaPictures/cg-stubs/blob/master/mari/stubgen_mari.py) | pure python and C extensions | uses epydoc docstrings |
| [opencolorio](https://github.com/LumaPictures/cg-stubs/blob/master/ocio/stubgen_ocio.py) | pybind11 | |
| [pyside2](https://github.com/LumaPictures/cg-stubs/blob/master/pyside/stubgen_pyside.py) | shiboken | |
| substance_painter | pure python | basic / non-custom. reads types from annotations |
| pymel | pure python | integrates types parsed from custom docs |

I know that this is a pretty big PR, and I know it's a lot to go
through, but I've spent a huge amount of time on it and I believe this
makes mypy's stubgen tool the absolute best available. If it helps, I
also have 13 merged mypy PRs under my belt and I'll be around to fix any
issues if they come up.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <[email protected]>
3 people committed Oct 15, 2023
1 parent ff9deb3 commit e435594
Showing 12 changed files with 2,125 additions and 1,442 deletions.
14 changes: 12 additions & 2 deletions docs/source/stubgen.rst
@@ -127,12 +127,22 @@ alter the default behavior:
     unwanted side effects, such as the running of tests. Stubgen tries to skip test
     modules even without this option, but this does not always work.

 .. option:: --parse-only
+.. option:: --no-analysis

     Don't perform semantic analysis of source files. This may generate
     worse stubs -- in particular, some module, class, and function aliases may
     be represented as variables with the ``Any`` type. This is generally only
-    useful if semantic analysis causes a critical mypy error.
+    useful if semantic analysis causes a critical mypy error. Does not apply to
+    C extension modules. Incompatible with :option:`--inspect-mode`.
+
+.. option:: --inspect-mode
+
+    Import and inspect modules instead of parsing source code. This is the default
+    behavior for C modules and pyc-only packages. The flag is useful to force
+    inspection for pure Python modules that make use of dynamically generated
+    members that would otherwise be omitted when using the default behavior of
+    code parsing. Implies :option:`--no-analysis` as analysis requires source
+    code.

 .. option:: --doc-dir PATH
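
To see why the `--inspect-mode` text above says dynamically generated members are omitted by parsing, consider the following sketch. It does not call stubgen; it only contrasts what `ast` can see in source text with what is present on the imported module. The module source and the names `_make_getter`, `get_name`, and `get_size` are made up for illustration.

```python
import ast
import inspect
import types

# A hypothetical module that creates functions at import time.
SOURCE = '''
def _make_getter(field):
    def getter(record):
        return record[field]
    return getter

for _field in ("name", "size"):
    globals()["get_" + _field] = _make_getter(_field)
'''

# Parsing: only functions written out as top-level `def` statements are visible.
parsed_names = sorted(
    node.name for node in ast.parse(SOURCE).body if isinstance(node, ast.FunctionDef)
)

# Inspection: execute the module, then look at what actually exists on it.
module = types.ModuleType("dynamic_demo")
exec(SOURCE, module.__dict__)
inspected_names = sorted(
    name for name, obj in vars(module).items() if inspect.isfunction(obj)
)

print(parsed_names)     # ['_make_getter']
print(inspected_names)  # ['_make_getter', 'get_name', 'get_size']
```

The trade-off is the one the option text implies: inspection has to import (execute) the module, and it works without source code, so analysis of the source is not available in that mode.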

4 changes: 4 additions & 0 deletions mypy/moduleinspect.py
@@ -39,6 +39,10 @@ def is_c_module(module: ModuleType) -> bool:
     return os.path.splitext(module.__dict__["__file__"])[-1] in [".so", ".pyd", ".dll"]


+def is_pyc_only(file: str | None) -> bool:
+    return bool(file and file.endswith(".pyc") and not os.path.exists(file[:-1]))
+
+
 class InspectError(Exception):
     pass
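
The new `is_pyc_only` helper treats a file as pyc-only when its path ends in `.pyc` and the path with the trailing `c` removed (the would-be `.py` file next to it) does not exist. The following standalone check copies the function body from the hunk above and exercises it against temporary files; the file name `mod` is arbitrary and how stubgen obtains the path is not shown here.

```python
from __future__ import annotations

import os
import tempfile


def is_pyc_only(file: str | None) -> bool:
    # Copied from the diff: "x.pyc"[:-1] == "x.py", the sibling source path.
    return bool(file and file.endswith(".pyc") and not os.path.exists(file[:-1]))


with tempfile.TemporaryDirectory() as tmp:
    pyc_path = os.path.join(tmp, "mod.pyc")
    open(pyc_path, "wb").close()

    # Only mod.pyc exists, so the module counts as pyc-only.
    only_pyc = is_pyc_only(pyc_path)

    # Once mod.py appears beside it, it no longer does.
    open(os.path.join(tmp, "mod.py"), "w").close()
    with_source = is_pyc_only(pyc_path)

# A missing path (None) is never pyc-only.
no_file = is_pyc_only(None)

print(only_pyc, with_source, no_file)  # True False False
```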

100 changes: 90 additions & 10 deletions mypy/stubdoc.py
@@ -8,11 +8,14 @@

 import contextlib
 import io
+import keyword
 import re
 import tokenize
 from typing import Any, Final, MutableMapping, MutableSequence, NamedTuple, Sequence, Tuple
 from typing_extensions import TypeAlias as _TypeAlias

+import mypy.util
+
 # Type alias for signatures strings in format ('func_name', '(arg, opt_arg=False)').
 Sig: _TypeAlias = Tuple[str, str]

@@ -35,12 +38,16 @@ class ArgSig:

     def __init__(self, name: str, type: str | None = None, default: bool = False):
         self.name = name
-        if type and not is_valid_type(type):
-            raise ValueError("Invalid type: " + type)
         self.type = type
         # Does this argument have a default value?
         self.default = default

+    def is_star_arg(self) -> bool:
+        return self.name.startswith("*") and not self.name.startswith("**")
+
+    def is_star_kwarg(self) -> bool:
+        return self.name.startswith("**")
+
     def __repr__(self) -> str:
         return "ArgSig(name={}, type={}, default={})".format(
             repr(self.name), repr(self.type), repr(self.default)
@@ -59,7 +66,80 @@ def __eq__(self, other: Any) -> bool:
 class FunctionSig(NamedTuple):
     name: str
     args: list[ArgSig]
-    ret_type: str
+    ret_type: str | None
+
+    def is_special_method(self) -> bool:
+        return bool(
+            self.name.startswith("__")
+            and self.name.endswith("__")
+            and self.args
+            and self.args[0].name in ("self", "cls")
+        )
+
+    def has_catchall_args(self) -> bool:
+        """Return if this signature has catchall args: (*args, **kwargs)"""
+        if self.args and self.args[0].name in ("self", "cls"):
+            args = self.args[1:]
+        else:
+            args = self.args
+        return (
+            len(args) == 2
+            and all(a.type in (None, "object", "Any", "typing.Any") for a in args)
+            and args[0].is_star_arg()
+            and args[1].is_star_kwarg()
+        )
+
+    def is_catchall_signature(self) -> bool:
+        """Return if this signature is the catchall identity: (*args, **kwargs) -> Any"""
+        return self.has_catchall_args() and self.ret_type in (None, "Any", "typing.Any")
+
+    def format_sig(
+        self,
+        indent: str = "",
+        is_async: bool = False,
+        any_val: str | None = None,
+        docstring: str | None = None,
+    ) -> str:
+        args: list[str] = []
+        for arg in self.args:
+            arg_def = arg.name
+
+            if arg_def in keyword.kwlist:
+                arg_def = "_" + arg_def
+
+            if (
+                arg.type is None
+                and any_val is not None
+                and arg.name not in ("self", "cls")
+                and not arg.name.startswith("*")
+            ):
+                arg_type: str | None = any_val
+            else:
+                arg_type = arg.type
+            if arg_type:
+                arg_def += ": " + arg_type
+                if arg.default:
+                    arg_def += " = ..."
+
+            elif arg.default:
+                arg_def += "=..."
+
+            args.append(arg_def)
+
+        retfield = ""
+        ret_type = self.ret_type if self.ret_type else any_val
+        if ret_type is not None:
+            retfield = " -> " + ret_type
+
+        prefix = "async " if is_async else ""
+        sig = "{indent}{prefix}def {name}({args}){ret}:".format(
+            indent=indent, prefix=prefix, name=self.name, args=", ".join(args), ret=retfield
+        )
+        if docstring:
+            suffix = f"\n{indent}    {mypy.util.quote_docstring(docstring)}"
+        else:
+            suffix = " ..."
+        return f"{sig}{suffix}"


 # States of the docstring parser.
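
The argument-formatting rules in `format_sig` above are easier to follow with concrete inputs. The sketch below is a simplified, standalone re-statement of those rules, not the mypy code itself: `Arg`, `format_arg`, and `format_sig_sketch` are hypothetical stand-ins, and the `indent`, `is_async`, and `docstring` handling is left out.

```python
import keyword
from typing import NamedTuple, Optional


class Arg(NamedTuple):
    """Hypothetical stand-in for ArgSig."""
    name: str
    type: Optional[str] = None
    default: bool = False


def format_arg(arg: Arg, any_val: Optional[str] = None) -> str:
    # Python keywords cannot be parameter names, so they get a "_" prefix.
    text = "_" + arg.name if arg.name in keyword.kwlist else arg.name
    arg_type = arg.type
    # Untyped args fall back to any_val, except self/cls and star args.
    if (
        arg_type is None
        and any_val is not None
        and arg.name not in ("self", "cls")
        and not arg.name.startswith("*")
    ):
        arg_type = any_val
    if arg_type:
        text += ": " + arg_type
        if arg.default:
            text += " = ..."  # annotated default: spaces around "="
    elif arg.default:
        text += "=..."  # unannotated default: no spaces
    return text


def format_sig_sketch(name, args, ret_type=None, any_val=None) -> str:
    ret = ret_type if ret_type else any_val
    retfield = "" if ret is None else " -> " + ret
    joined = ", ".join(format_arg(a, any_val) for a in args)
    return f"def {name}({joined}){retfield}: ..."


args = [
    Arg("self"),
    Arg("from"),
    Arg("count", "int", True),
    Arg("flag", None, True),
    Arg("*args"),
]
plain = format_sig_sketch("f", args)
typed = format_sig_sketch("f", args, any_val="Any")
print(plain)  # def f(self, _from, count: int = ..., flag=..., *args): ...
print(typed)  # def f(self, _from: Any, count: int = ..., flag: Any = ..., *args) -> Any: ...
```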
@@ -176,17 +256,17 @@ def add_token(self, token: tokenize.TokenInfo) -> None:

             # arg_name is empty when there are no args. e.g. func()
             if self.arg_name:
-                try:
+                if self.arg_type and not is_valid_type(self.arg_type):
+                    # wrong type, use Any
+                    self.args.append(
+                        ArgSig(name=self.arg_name, type=None, default=bool(self.arg_default))
+                    )
+                else:
                     self.args.append(
                         ArgSig(
                             name=self.arg_name, type=self.arg_type, default=bool(self.arg_default)
                         )
                     )
-                except ValueError:
-                    # wrong type, use Any
-                    self.args.append(
-                        ArgSig(name=self.arg_name, type=None, default=bool(self.arg_default))
-                    )
                 self.arg_name = ""
                 self.arg_type = None
                 self.arg_default = None
@@ -240,7 +320,7 @@ def args_kwargs(signature: FunctionSig) -> bool:


 def infer_sig_from_docstring(docstr: str | None, name: str) -> list[FunctionSig] | None:
-    """Convert function signature to list of TypedFunctionSig
+    """Convert function signature to list of FunctionSig
     Look for function signatures of function in docstring. Signature is a string of
     the format <function_name>(<signature>) -> <return type> or perhaps without