main: add explicit handling of overlapping collection arguments #13704
Conversation
src/_pytest/main.py
Outdated
keeps the shorter prefix, or the earliest argument if duplicate, preserving
order. The result is prefix-free.
"""
# TODO: Fix quadratic runtime.
Needs to be fixed before merging, but wanted to post the simple understandable version before complicating with a trie or whatever.
This is a breaking change, but I think we're about due for a pytest 9, so seems OK.
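For readers following along, a minimal sketch of the prefix-free normalization the docstring describes -- a simplified stand-in, not the actual implementation, with arguments modeled as plain tuples of path/node parts:

```python
# Simplified model: a collection argument is a tuple of path/node parts,
# e.g. ("tests", "test_it.py", "TestIt"). Hypothetical, for illustration only.
def normalize(args: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    result: list[tuple[str, ...]] = []
    for arg in args:
        # Skip `arg` if an already-kept argument is a prefix of it (or equal);
        # for exact duplicates this keeps the earliest occurrence.
        if any(arg[: len(kept)] == kept for kept in result):
            continue
        # Drop any previously kept arguments that `arg` is a prefix of,
        # so the shorter prefix wins.
        result = [kept for kept in result if kept[: len(arg)] != arg]
        result.append(arg)
    return result

assert normalize([("a", "b"), ("a",)]) == [("a",)]
assert normalize([("x.py",), ("x.py",)]) == [("x.py",)]
```

The inner scans over `result` are the quadratic behavior the TODO flags; this is the "simple understandable version" before a trie or similar.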
Thanks for tackling this @bluetech!
Everything looks good to me, especially the comprehensive tests.
The quadratic runtime is important, as you mention; I just wanted to highlight a use case that is common for us at work:
When we get a CI failure, we output a GH actions summary showing a command line that can be copy/pasted to reproduce the failed tests directly.
For example:
pytest test1.py::test_foo test1.py::TestBar::test_bar ...
This list can be quite long, because it contains all the node ids that failed in the job, so it is important that this use case does not cause the collection time to balloon out of control.
Force-pushed from 9554f1d to 3fef8d6
I tried a bit to get it non-quadratic. I got something, but it's quite a bit more complex and only gets faster at very high counts due to the higher fixed costs. Before this I optimized the simple quadratic solution a bit (removed unneeded …).

Benchmark:
```python
#!/usr/bin/env python
"""Benchmark for normalize_collection_arguments function to verify sub-quadratic performance."""

from __future__ import annotations

import math
from pathlib import Path
import random
import time

from _pytest.main import CollectionArgument
from _pytest.main import normalize_collection_arguments


def generate_test_data(n: int) -> list[CollectionArgument]:
    """Generate test data with various overlapping patterns."""
    args = []
    # Create a mix of different argument types
    for i in range(n):
        choice = random.randint(0, 4)
        if choice == 0:
            # Simple file paths
            path = Path(f"/test/path/file_{i % 100}.py")
            parts = ()
        elif choice == 1:
            # Paths with class specifications
            path = Path(f"/test/path/file_{i % 50}.py")
            parts = (f"TestClass{i % 10}",)
        elif choice == 2:
            # Paths with method specifications
            path = Path(f"/test/path/file_{i % 50}.py")
            parts = (f"TestClass{i % 10}", f"test_method_{i % 5}")
        elif choice == 3:
            # Directory paths
            path = Path(f"/test/path/dir_{i % 20}")
            parts = ()
        else:
            # Nested directory paths (creates parent-child relationships)
            depth = random.randint(1, 5)
            path = Path("/test/path")
            for d in range(depth):
                path = path / f"subdir_{i % 10}_{d}"
            path = path / f"file_{i}.py"
            parts = ()
        args.append(CollectionArgument(path=path, parts=parts, module_name=None))
    # Add some duplicates and overlapping arguments
    for i in range(n // 10):
        if args:
            # Add exact duplicates
            args.append(args[random.randint(0, len(args) - 1)])
            # Add parent directory that subsumes children
            if random.random() < 0.3:
                child = args[random.randint(0, len(args) - 1)]
                if child.path.parent != child.path:
                    args.append(
                        CollectionArgument(
                            path=child.path.parent, parts=(), module_name=None
                        )
                    )
    random.shuffle(args)
    return args[:n]


def benchmark_function(sizes: list[int], num_runs: int = 3) -> dict:
    """Run the benchmark for different input sizes."""
    random.seed(0)
    results = {}
    for n in sizes:
        print(f"Benchmarking n={n}...", end=" ")
        times = []
        for run in range(num_runs):
            # Generate test data
            test_data = generate_test_data(n)
            # Measure runtime
            start = time.perf_counter()
            result = normalize_collection_arguments(test_data)
            end = time.perf_counter()
            times.append(end - start)
        avg_time = sum(times) / len(times)
        results[n] = avg_time
        print(f"avg time: {avg_time:.6f}s")
    return results


def plot_ascii(results: dict, width: int = 70, height: int = 20):
    """Create an ASCII plot of the results."""
    if not results:
        return
    sizes = sorted(results.keys())
    times = [results[s] for s in sizes]
    # Normalize times for plotting
    max_time = max(times)
    min_time = min(times) if min(times) > 0 else 0.000001
    max_size = max(sizes)
    min_size = min(sizes)
    print("\n" + "=" * width)
    print("Runtime vs Input Size (n)")
    print("=" * width)
    # Create the plot grid
    grid = [[" " for _ in range(width)] for _ in range(height)]
    # Add Y-axis labels
    for i in range(height):
        y_val = max_time - (i * max_time / (height - 1))
        label = f"{y_val:.4f}s"
        for j, char in enumerate(label):
            if j < width:
                grid[i][j] = char
    # Plot the data points
    label_offset = 10  # Space for Y-axis labels
    plot_width = width - label_offset - 2
    for size, t in results.items():
        if max_size > min_size:
            x = label_offset + int(
                (size - min_size) / (max_size - min_size) * plot_width
            )
        else:
            x = label_offset + plot_width // 2
        if max_time > 0:
            y = int((1 - t / max_time) * (height - 1))
        else:
            y = height // 2
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] = "*"
    # Add axes
    for i in range(height):
        if label_offset < width:
            grid[i][label_offset - 1] = "|"
    for j in range(label_offset, width):
        grid[height - 1][j] = "-"
    # Print the grid
    for row in grid:
        print("".join(row))
    # Add X-axis labels
    print(" " * label_offset + "└" + "─" * (width - label_offset - 1))
    # Print size labels
    x_labels = []
    for i, size in enumerate(sizes):
        if i % max(1, len(sizes) // 5) == 0 or i == len(sizes) - 1:
            x_labels.append((size, sizes.index(size)))
    label_line = " " * (label_offset + 1)
    for size, idx in x_labels:
        if max_size > min_size:
            x_pos = int((size - min_size) / (max_size - min_size) * plot_width)
        else:
            x_pos = plot_width // 2
        label = str(size)
        if x_pos + len(label) < plot_width:
            padding = x_pos - len(label_line) + label_offset + 1
            if padding > 0:
                label_line += " " * padding + label
    print(label_line)
    print(" " * (width // 2 - 10) + "Input size (n)")
    # Analyze complexity
    print("\n" + "=" * width)
    print("Complexity Analysis:")
    print("=" * width)
    if len(sizes) >= 3:
        # Calculate growth rate between different size pairs
        ratios = []
        for i in range(1, len(sizes)):
            if results[sizes[i - 1]] > 0:
                size_ratio = sizes[i] / sizes[i - 1]
                time_ratio = results[sizes[i]] / results[sizes[i - 1]]
                ratios.append((size_ratio, time_ratio))
        # Estimate complexity
        avg_size_ratio = sum(r[0] for r in ratios) / len(ratios)
        avg_time_ratio = sum(r[1] for r in ratios) / len(ratios)
        # For O(n), time ratio ≈ size ratio
        # For O(n log n), time ratio ≈ size ratio * log(size ratio)
        # For O(n²), time ratio ≈ size ratio²
        expected_linear = avg_size_ratio
        expected_nlogn = avg_size_ratio * (1 + math.log2(avg_size_ratio))
        expected_quadratic = avg_size_ratio * avg_size_ratio
        # Find closest match
        diff_linear = abs(avg_time_ratio - expected_linear)
        diff_nlogn = abs(avg_time_ratio - expected_nlogn)
        diff_quadratic = abs(avg_time_ratio - expected_quadratic)
        min_diff = min(diff_linear, diff_nlogn, diff_quadratic)
        if min_diff == diff_linear:
            complexity = "O(n) - Linear"
        elif min_diff == diff_nlogn:
            complexity = "O(n log n) - Linearithmic"
        else:
            complexity = "O(n²) - Quadratic"
        print(f"Estimated complexity: {complexity}")
        print(f"Average size increase: {avg_size_ratio:.2f}x")
        print(f"Average time increase: {avg_time_ratio:.2f}x")
        print(f"Expected for O(n): {expected_linear:.2f}x")
        print(f"Expected for O(n log n): {expected_nlogn:.2f}x")
        print(f"Expected for O(n²): {expected_quadratic:.2f}x")
        if avg_time_ratio >= expected_quadratic * 0.8:
            print("\nWARNING: Appears to be quadratic or worse")
        else:
            print("\nSUCCESS: Seems sub-quadratic")


def main():
    """Run the benchmark."""
    print("Benchmarking normalize_collection_arguments")
    print("-" * 50)
    sizes = [
        25, 50, 100, 200, 400, 800, 1_600, 3_200, 6_400,
        12_800, 25_600, 51_200, 102_400,
    ]
    print(f"Testing with sizes: {sizes}")
    print("Each size will be run 3 times and averaged")
    print("-" * 50)
    results = benchmark_function(sizes, num_runs=3)
    print("\nResults Summary:")
    print("-" * 50)
    for size, t in sorted(results.items()):
        print(f"n={size:6d}: {t:.6f}s")
    plot_ascii(results)


if __name__ == "__main__":
    main()
```

These are my results (AMD HX 370, Python 3.13, Linux):
Not so good -- about 4.5s for 100k args, and getting worse with more -- but maybe OK? Possibly other parts of the pipeline break at this scale anyway. We do have the `--keep-duplicates` workaround, which skips the normalization entirely, if the given input is well formed.
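For example, reusing the node IDs from the earlier comment (paths hypothetical), a caller that already guarantees a well-formed, duplicate-free argument list can opt out of normalization like this:
pytest --keep-duplicates test1.py::test_foo test1.py::TestBar::test_bar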
Do you have an estimate of how many arguments it can reach at most?
Oh, not that many, usually fewer than 100. I was just mentioning it because I'm not sure how common that use case is.
Force-pushed from 3fef8d6 to 67fca28
I realize the benchmark is no good -- it generates way too many overlaps. A benchmark of the common case, with no duplicates or overlaps, shows the quadratic behavior, and it becomes slow a lot sooner. So it needs another approach.
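For illustration, a sketch of such a common-case input (hypothetical `generate_disjoint` helper, using the same `CollectionArgument` constructor as the benchmark above): n distinct, non-overlapping file arguments, so normalization can never prune anything and every argument is compared against every kept one.

```python
# Hypothetical worst-case input for the quadratic check: n distinct,
# non-overlapping file arguments -- nothing is a duplicate or a prefix
# of anything else, so no pruning ever happens.
from pathlib import Path

from _pytest.main import CollectionArgument


def generate_disjoint(n: int) -> list[CollectionArgument]:
    return [
        CollectionArgument(path=Path(f"/test/file_{i}.py"), parts=(), module_name=None)
        for i in range(n)
    ]
```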
This is for the benefit of the next commit. That commit wants to check whether a CollectionArgument is subsumed by another. According to pytest semantics, `test_it.py::TestIt::test_it[a]` is subsumed by `test_it.py::TestIt::test_it`. However, the `parts` are `["TestIt", "test_it[a]"]` and `["TestIt", "test_it"]`, which means a simple list-prefix check cannot be used. By splitting the parametrization part `"[a]"` into its own attribute, it can be handled cleanly.

I also think this is a reasonable change regardless. We'd probably want something like this when the "collection structure contains parametrization" TODO is tackled.
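To make that concrete, a hedged sketch of the prefix check the split enables -- `Arg`, `subsumes`, and the `param` attribute name here are illustrative stand-ins, not the actual `CollectionArgument` API:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Arg:
    path: Path
    parts: tuple[str, ...]     # e.g. ("TestIt", "test_it")
    param: str | None = None   # e.g. "[a]", split off the last part


def subsumes(a: Arg, b: Arg) -> bool:
    """Whether `a` subsumes `b` (is equal to it or a parent of it)."""
    if a.path != b.path:
        # A bare parent directory subsumes everything beneath it.
        return a.path in b.path.parents and not a.parts and a.param is None
    if a.parts != b.parts[: len(a.parts)]:
        return False
    # With "[a]" split out, a plain tuple-prefix check suffices:
    # parts ("TestIt", "test_it") subsume ("TestIt", "test_it") + "[a]".
    if a.param is None:
        return True
    return a.parts == b.parts and a.param == b.param


# test_it.py::TestIt::test_it subsumes test_it.py::TestIt::test_it[a]:
assert subsumes(
    Arg(Path("test_it.py"), ("TestIt", "test_it")),
    Arg(Path("test_it.py"), ("TestIt", "test_it"), param="[a]"),
)
```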
Consider a pytest invocation like `pytest tests/ tests/test_it.py`. What should happen? Currently what happens is that only `tests/test_it.py` is run, which is obviously wrong. This regressed in the big package collection rework (PR pytest-dev#11646). The reason it regressed is the way pytest collection works. See pytest-dev#12083 for (some) details.

I have made an attempt to fix the problem directly in the collection loop, but failed. The main challenge is the node caching, i.e. when a collector node should be reused because it is needed for several collection arguments. I believe it is possible to make it work, but it's hard.

In order not to leave this embarrassing bug lingering any longer, this instead takes an easier approach, which is to massage the collection argument list itself such that issues with overlapping nodes don't come up during collection at all. This *adds* complexity instead of simplifying things, but I hope it should be good enough in practice for now, and maybe we can revisit it in the future.

This change introduces behavioral changes, mainly:

- `pytest a/b a/` is equivalent to `pytest a`; if there is an `a/a` then `a/b` will *not* be ordered before `a/a`. So the ability to order a subset before a superset is lost.
- `pytest x.py x.py` does *not* run the file twice; previously we took an explicit request like this to mean that it should.

The `--keep-duplicates` option remains as a sort of "expert mode" that retains its current behavior; though it is still subtly broken in that *collector nodes* are also duplicated (not just the items). A fix for that requires the harder change.

Fix pytest-dev#12083.
Force-pushed from 67fca28 to 6764439
I updated the PR. Unless there is something I missed, I think it's ready.