Skip to content

Commit 6764439

Browse files
committed
main: add explicit handling of overlapping collection arguments
Consider a pytest invocation like `pytest tests/ tests/test_it.py`. What should happen? Currently what happens is that only `tests/test_it.py` is run, which is obviously wrong. This regressed in the big package collection rework (PR #11646). The reason it regressed is the way pytest collection works. See #12083 for (some) details. I have made an attempt to fix the problem directly in the collection loop, but failed. The main challenge is the node caching, i.e. when should a collector node be reused when it is needed for several collection arguments. I believe it is possible to make it work, but it's hard. In order to not leave this embarrassing bug lingering for any longer, this instead takes an easier approach, which is to massage the collection argument list itself such that issues with overlapping nodes don't come up during collection at all. This *adds* complexity instead of simplifying things, but I hope it should be good enough in practice for now, and maybe we can revisit in the future. This change introduces behavioral changes, mainly: - `pytest a/b a/` is equivalent to `pytest a`; if there is an `a/a` then `a/b` will *not* be ordered before `a/a`. So the ability to order a subset before a superset is lost. - `pytest x.py x.py` does *not* run the file twice; previously we took an explicit request like this to mean that it should. The `--keep-duplicates` option remains as a sort of "expert mode" that retains its current behavior; though it is still subtly broken in that *collector nodes* are also duplicated (not just the items). A fix for that requires the harder change. Fix #12083.
1 parent 1efe0db commit 6764439

File tree

7 files changed

+783
-40
lines changed

7 files changed

+783
-40
lines changed

changelog/12083.breaking.rst

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
Fixed a bug where an invocation such as `pytest a/ a/b` would cause only tests from `a/b` to run, and not other tests under `a/`.
2+
3+
The fix entails a few breaking changes to how such overlapping arguments and duplicates are handled:
4+
5+
1. `pytest a/b a/` or `pytest a/ a/b` are equivalent to `pytest a`; if an argument overlaps another arguments, only the prefix remains.
6+
7+
2. `pytest x.py x.py` is equivalent to `pytest x.py`; previously such an invocation was taken as an explicit request to run the tests from the file twice.
8+
9+
If you rely on these behaviors, consider using :ref:`--keep-duplicates <duplicate-paths>`, which retains its existing behavior (including the bug).

doc/en/example/pythoncollection.rst

Lines changed: 2 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,8 @@ You can run all of the tests within ``tests/`` *except* for ``tests/foobar/test_
5555
by invoking ``pytest`` with ``--deselect tests/foobar/test_foobar_01.py::test_a``.
5656
``pytest`` allows multiple ``--deselect`` options.
5757

58+
.. _duplicate-paths:
59+
5860
Keeping duplicate paths specified from command line
5961
----------------------------------------------------
6062

@@ -82,18 +84,6 @@ Example:
8284
collected 2 items
8385
...
8486
85-
As the collector just works on directories, if you specify twice a single test file, ``pytest`` will
86-
still collect it twice, no matter if the ``--keep-duplicates`` is not specified.
87-
Example:
88-
89-
.. code-block:: pytest
90-
91-
pytest test_a.py test_a.py
92-
93-
...
94-
collected 2 items
95-
...
96-
9787
9888
Changing directory recursion
9989
-----------------------------------------------------

src/_pytest/main.py

Lines changed: 70 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -781,14 +781,25 @@ def perform_collect(
781781
try:
782782
initialpaths: list[Path] = []
783783
initialpaths_with_parents: list[Path] = []
784-
for arg in args:
785-
collection_argument = resolve_collection_argument(
784+
785+
collection_args = [
786+
resolve_collection_argument(
786787
self.config.invocation_params.dir,
787788
arg,
789+
i,
788790
as_pypath=self.config.option.pyargs,
789791
consider_namespace_packages=consider_namespace_packages,
790792
)
791-
self._initial_parts.append(collection_argument)
793+
for i, arg in enumerate(args)
794+
]
795+
796+
if not self.config.getoption("keepduplicates"):
797+
# Normalize the collection arguments -- remove duplicates and overlaps.
798+
self._initial_parts = normalize_collection_arguments(collection_args)
799+
else:
800+
self._initial_parts = collection_args
801+
802+
for collection_argument in self._initial_parts:
792803
initialpaths.append(collection_argument.path)
793804
initialpaths_with_parents.append(collection_argument.path)
794805
initialpaths_with_parents.extend(collection_argument.path.parents)
@@ -976,12 +987,9 @@ def genitems(self, node: nodes.Item | nodes.Collector) -> Iterator[nodes.Item]:
976987
yield node
977988
else:
978989
assert isinstance(node, nodes.Collector)
979-
keepduplicates = self.config.getoption("keepduplicates")
980990
# For backward compat, dedup only applies to files.
981-
handle_dupes = not (keepduplicates and isinstance(node, nodes.File))
991+
handle_dupes = not isinstance(node, nodes.File)
982992
rep, duplicate = self._collect_one_node(node, handle_dupes)
983-
if duplicate and not keepduplicates:
984-
return
985993
if rep.passed:
986994
for subnode in rep.result:
987995
yield from self.genitems(subnode)
@@ -1033,11 +1041,13 @@ class CollectionArgument:
10331041
parts: Sequence[str]
10341042
parametrization: str | None
10351043
module_name: str | None
1044+
original_index: int
10361045

10371046

10381047
def resolve_collection_argument(
10391048
invocation_path: Path,
10401049
arg: str,
1050+
arg_index: int,
10411051
*,
10421052
as_pypath: bool = False,
10431053
consider_namespace_packages: bool = False,
@@ -1109,4 +1119,57 @@ def resolve_collection_argument(
11091119
parts=parts,
11101120
parametrization=parametrization,
11111121
module_name=module_name,
1122+
original_index=arg_index,
1123+
)
1124+
1125+
1126+
def is_collection_argument_subsumed_by(
1127+
arg: CollectionArgument, by: CollectionArgument
1128+
) -> bool:
1129+
"""Check if `arg` is subsumed (contained) by `by`."""
1130+
# First check path subsumption.
1131+
if by.path != arg.path:
1132+
# `by` subsumes `arg` if `by` is a parent directory of `arg` and has no
1133+
# parts (collects everything in that directory).
1134+
if not by.parts:
1135+
return arg.path.is_relative_to(by.path)
1136+
return False
1137+
# Paths are equal, check parts.
1138+
# For example: ("TestClass",) is a prefix of ("TestClass", "test_method").
1139+
if len(by.parts) > len(arg.parts) or arg.parts[: len(by.parts)] != by.parts:
1140+
return False
1141+
# Paths and parts are equal, check parametrization.
1142+
# A `by` without parametrization (None) matches everything, e.g.
1143+
# `pytest x.py::test_it` matches `x.py::test_it[0]`. Otherwise must be
1144+
# exactly equal.
1145+
if by.parametrization is not None and by.parametrization != arg.parametrization:
1146+
return False
1147+
return True
1148+
1149+
1150+
def normalize_collection_arguments(
1151+
collection_args: Sequence[CollectionArgument],
1152+
) -> list[CollectionArgument]:
1153+
"""Normalize collection arguments to eliminate overlapping paths and parts.
1154+
1155+
Detects when collection arguments overlap in either paths or parts and only
1156+
keeps the shorter prefix, or the earliest argument if duplicate, preserving
1157+
order. The result is prefix-free.
1158+
"""
1159+
# A quadratic algorithm is not acceptable since large inputs are possible.
1160+
# So this uses an O(n*log(n)) algorithm which takes advantage of the
1161+
# property that after sorting, a collection argument will immediately
1162+
# precede collection arguments it subsumes. An O(n) algorithm is not worth
1163+
# it.
1164+
collection_args_sorted = sorted(
1165+
collection_args,
1166+
key=lambda arg: (arg.path, arg.parts, arg.parametrization or ""),
11121167
)
1168+
normalized: list[CollectionArgument] = []
1169+
last_kept = None
1170+
for arg in collection_args_sorted:
1171+
if last_kept is None or not is_collection_argument_subsumed_by(arg, last_kept):
1172+
normalized.append(arg)
1173+
last_kept = arg
1174+
normalized.sort(key=lambda arg: arg.original_index)
1175+
return normalized

0 commit comments

Comments
 (0)