Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding stubs for ruamel.yaml #12584

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Adding stubs for ruamel.yaml #12584

wants to merge 16 commits into from

Conversation

stevenlele
Copy link
Contributor

@stevenlele stevenlele commented Aug 24, 2024

ruamel.yaml is a fork of PyYAML which features a new API and can perform round-trip conversion preserving the original format and comments. It also claims to have fixed various issues when the development of PyYAML was inactive.

Typeshed testing depends on this package via pre-commit-hooks.

Repository SourceForge GitHub mirror
ruamel.yaml https://sourceforge.net/p/ruamel-yaml/ https://github.com/commx/ruamel-yaml
ruamel.yaml.clib https://sourceforge.net/p/ruamel-yaml-clib/ https://github.com/ruamel/yaml.clib
Test data https://sourceforge.net/p/ruamel-yaml-data/

Although the package ships a py.typed file, almost every type hint is Any, which makes coding experience terrible. This falls into the situation mentioned in #11955, which might need some discussion.

The typing information in the package seems to originate from https://sourceforge.net/p/ruamel-yaml/tickets/42/ (from https://github.com/common-workflow-language/schema_salad) in 2016 when things were still in Python 2. Then it was never properly maintained and became a pile of Anys.

This package doesn't follow the best practice of prefixing private properties and methods with underscore(s), which makes writing stubs a lot more complex due to the large number of "public" properties and methods.

It's also worth noting that the documentation isn't well-structured, nor does it cover necessary information like every format option, which makes it even harder to use this library. Adding proper typing information might provide some help.

Since there are already stubs for PyYAML, many modules of ruamel.yaml can be added by referring to the corresponding ones in PyYAML. They turned out to be not very helpful.

A special case in this library is the YAML class, which you need to pass a typ argument to select its parsing mode. Depending on the typ, the methods of YAML behave differently. For example, YAML(typ='rt').seq() returns a CommentedSeq with round-trip-specific methods, while YAML(typ='safe') returns a plain list. To make coding easier, I added pseudo-subclasses of YAML (like _RoundTripYAML) and converted the __init__() method into multiple @overload def __new__() methods which returns different variants based on the typ parameter.

There's also an undocumented rtsc (round-trip split comments) type in the library, but the parsers seem to be very experimental and less-maintained, so I intentionally omitted it in the main module. I found the only usage at https://github.com/SoulMelody/LibreSVIP/blob/main/libresvip/utils/yamlutils/__init__.py#L15 which doesn't seem to be intentional or necessary. Edit: The parser doesn't seem to be working at all on a YAML document with comments.

Just for reference, I found the one and only ruamel.yaml plug-in code example at https://github.com/dstl/Stone-Soup/blob/main/stonesoup/serialise.py which might help understanding how that plug_in parameter works.

MonkeyType provides great help.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@stevenlele stevenlele marked this pull request as ready for review September 15, 2024 21:50
@stevenlele stevenlele changed the title [WIP] Adding stubs for ruamel.yaml Adding stubs for ruamel.yaml Sep 15, 2024
@stevenlele
Copy link
Contributor Author

Currently stubtest fails to run:

error: not checking stubs due to mypy build errors:
/tmp/tmps3q1g0pn/lib/python3.11/site-packages/ruamel/yaml/main.py:47: error: Cannot assign to a type  [misc]
/tmp/tmps3q1g0pn/lib/python3.11/site-packages/ruamel/yaml/main.py:47: error: Incompatible types in assignment (expression has type "None", variable has type "type[CParser]")  [assignment]
/tmp/tmps3q1g0pn/lib/python3.11/site-packages/ruamel/yaml/main.py:47: error: Incompatible types in assignment (expression has type "None", variable has type "type[CEmitter]")  [assignment]
/tmp/tmps3q1g0pn/lib/python3.11/site-packages/ruamel/yaml/main.py:223: error: Unexpected keyword argument "loader"  [call-arg]
stubs/ruamel.yaml/_ruamel_yaml.pyi:22: note: Called function defined here
/tmp/tmps3q1g0pn/lib/python3.11/site-packages/ruamel/yaml/main.py:223: error: Unexpected keyword argument "loader"  [call-arg]
stubs/ruamel.yaml/_ruamel_yaml.pyi:22: note: Called function defined here

For the latter one, self.Parser is type[Parser | CParser] and mypy doesn't seem to be willing to narrow the type to type[Parser].

Not sure how to fix those.

Stubsabot dry run also fails to run:

Marking ruamel.yaml as obsolete since '0.15.99'
    asyncio.run(main())
  File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.5/x64/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/runner/work/typeshed/typeshed/scripts/stubsabot.py", line 835, in main
    await suggest_typeshed_obsolete(update, session, action_level=args.action_level)
  File "/home/runner/work/typeshed/typeshed/scripts/stubsabot.py", line 735, in suggest_typeshed_obsolete
    with open(obsolete.stub_path / "METADATA.toml", "rb") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'stubs/ruamel.yaml/METADATA.toml'

Help needed!

This comment has been minimized.

@stevenlele
Copy link
Contributor Author

stevenlele commented Sep 15, 2024

mitmproxy/tools/console/keymap.py:239: error: Item "StreamMark" of "Mark | StreamMark | None" has no attribute "get_snippet"  [union-attr]
mitmproxy/tools/console/keymap.py:239: error: Item "None" of "Mark | StreamMark | None" has no attribute "get_snippet"  [union-attr]
mitmproxy/tools/console/keymap.py:242: error: Item "None" of "Mark | StreamMark | None" has no attribute "line"  [union-attr]

Remarks on Mark | StreamMark | None:

PyYAML is using the Any trick in nodes to relax the explicit None check:

# Any Unions: Avoid forcing the user to check for None when they know what Node was instantiated with
# Using generics may be overkill without support for default Generics
# Permissive Unions could also be useful here.
class Node:
tag: str
value: Any
start_mark: Mark | Any
end_mark: Mark | Any
def __init__(self, tag: str, value, start_mark: Mark | None, end_mark: Mark | None) -> None: ...

Sadly, the same trick can't be fully effective here. As you see, ruamel.yaml added a FileMark type for IO streams, which has no get_snippet method, so replacing None with Any won't help with the error on line 239 anyway. The line attribute check can be relaxed though. Let me know what you think.

class StreamMark:
name: str | None
index: int
line: int
column: int
def __init__(self, name: str | None, index: int, line: int, column: int) -> None: ...
def __eq__(self, other: Self, /) -> bool: ... # type: ignore[override]
def __ne__(self, other: Self, /) -> bool: ... # type: ignore[override]
class FileMark(StreamMark): ...
class StringMark(StreamMark):
buffer: str
pointer: int
def __init__(self, name: str | None, index: int, line: int, column: int, buffer: str, pointer: int) -> None: ...
def get_snippet(self, indent: int = 4, max_length: int = 75) -> str: ...

This comment has been minimized.

Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

mitmproxy (https://github.com/mitmproxy/mitmproxy)
+ mitmproxy/optmanager.py:482: error: Need type annotation for "s"  [var-annotated]
+ mitmproxy/tools/console/keymap.py:239: error: Item "StreamMark" of "Mark | StreamMark | None" has no attribute "get_snippet"  [union-attr]
+ mitmproxy/tools/console/keymap.py:239: error: Item "None" of "Mark | StreamMark | None" has no attribute "get_snippet"  [union-attr]
+ mitmproxy/tools/console/keymap.py:242: error: Item "None" of "Mark | StreamMark | None" has no attribute "line"  [union-attr]

paasta (https://github.com/yelp/paasta)
+ paasta_tools/cli/cmds/validate.py:339: error: "Type[BaseConstructor]" has no attribute "flatten_mapping"  [attr-defined]

Copy link
Collaborator

@srittau srittau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! While I didn't check the types in depth, this looks very good to me. A few remarks below.

Also: We now require a comment explaining the use of Anys (or an appropriate type alias). This can either list the allowed types, how the value is used, or any other reason why Any is required.

Comment on lines +14 to +16
version_info: tuple[int, int, int]
__version__: str
__with_libyaml__: bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The should most likely be final:

Suggested change
version_info: tuple[int, int, int]
__version__: str
__with_libyaml__: bool
version_info: Final[tuple[int, int, int]]
__version__: Final[str]
__with_libyaml__: Final[bool]

(Needs import from typing.)

stubs/ruamel.yaml/ruamel/yaml/main.pyi Show resolved Hide resolved
Comment on lines +12 to +14
# One of codecs.{utf_16_le_decode, utf_16_be_decode, utf_8_decode}
class _BufferDecoder(Protocol):
def __call__(data: ReadableBuffer, errors: str | None = None, final: bool = False, /) -> tuple[str, int]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# One of codecs.{utf_16_le_decode, utf_16_be_decode, utf_8_decode}
class _BufferDecoder(Protocol):
def __call__(data: ReadableBuffer, errors: str | None = None, final: bool = False, /) -> tuple[str, int]: ...
# One of codecs.{utf_16_le_decode, utf_16_be_decode, utf_8_decode}
@type_check_only
class _BufferDecoder(Protocol):
def __call__(data: ReadableBuffer, errors: str | None = None, final: bool = False, /) -> tuple[str, int]: ...

(Needs import from typing.)

Comment on lines +36 to +37
class _RepresenterFunction(Protocol[_Representer, _T_contra]):
def __call__(self, dumper: _Representer, data: _T_contra, /) -> Node: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class _RepresenterFunction(Protocol[_Representer, _T_contra]):
def __call__(self, dumper: _Representer, data: _T_contra, /) -> Node: ...
@type_check_only
class _RepresenterFunction(Protocol[_Representer, _T_contra]):
def __call__(self, dumper: _Representer, data: _T_contra, /) -> Node: ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants