Date-dependent P2P test names become stale over time (6 openlibrary instances) #85

@sspriyadharshini31

Summary

Six internetarchive/openlibrary instances have P2P (pass-to-pass) test names that contain year-dependent parameters generated by @pytest.mark.parametrize. These names were recorded when the dataset was created in 2025. When the evaluation runs in 2026 or later, pytest generates different names for the same tests, which causes false P2P failures.

Affected instances

| Instance | F2P | P2P (stale) | P2P (fixed) |
| --- | --- | --- | --- |
| openlibrary-1351c59f | 1/1 | 16/18 | 18/18 |
| openlibrary-fdbc0d8f | 1/1 | 37/39 | 39/39 |
| openlibrary-b112069e | 3/3 | 62/64 | 64/64 |
| openlibrary-1894cb48 | 1/1 | 62/64 | 64/64 |
| openlibrary-43f9e7e0 | 1/1 | 60/62 | 62/62 |
| openlibrary-08ac40d0 | 6/7 | 119/121 | 121/121 (2 fewer false failures) |

Root cause

The test test_future_publication_dates_are_deleted in openlibrary/catalog/add_book/tests/test_add_book.py generates parametrized test IDs using the current year:

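from datetime import datetime

import pytest

# Assumed for illustration: the issue does not show how the repo derives current_year.
current_year = datetime.now().year

@pytest.mark.parametrize("date,expected", [
    ("2000-11-11", True),
    (str(current_year), True),        # dynamic: the test ID changes every year
    (str(current_year + 1), False),   # dynamic: the test ID changes every year
    ("9999-01-01", False),
])
def test_future_publication_dates_are_deleted(date, expected):
    ...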

When the dataset was created (2025):

  • test_future_publication_dates_are_deleted[2025-True]
  • test_future_publication_dates_are_deleted[2026-False]

When the evaluation runs (2026):

  • test_future_publication_dates_are_deleted[2026-True]
  • test_future_publication_dates_are_deleted[2027-False]

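The harness matches test names by exact string, so in a 2026 run it cannot find [2025-True] or [2026-False]. It marks both as NOT RUN and counts them as FAIL, which accounts for the two missing P2P tests per instance in the table above.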
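The mismatch can be reproduced without running pytest. This is a minimal sketch, assuming the harness checks each recorded name against the collected names by plain string equality. The helpers `param_ids` and `missing_from_run` are hypothetical and not part of the harness. The ID format mirrors the names listed above.

```python
def param_ids(current_year: int) -> list[str]:
    """Build the four test IDs that the parametrization above would yield."""
    params = [
        ("2000-11-11", True),
        (str(current_year), True),
        (str(current_year + 1), False),
        ("9999-01-01", False),
    ]
    name = "test_future_publication_dates_are_deleted"
    # For simple str and bool parameters, the ID is the values joined by "-".
    return [f"{name}[{date}-{expected}]" for date, expected in params]


def missing_from_run(recorded: list[str], collected: list[str]) -> list[str]:
    """Return recorded names that an exact string match cannot find."""
    collected_set = set(collected)
    return [n for n in recorded if n not in collected_set]


recorded_2025 = param_ids(2025)   # names stored in the dataset
collected_2026 = param_ids(2026)  # names produced at evaluation time
stale = missing_from_run(recorded_2025, collected_2026)
print(stale)
```

Only the two year-derived IDs end up in `stale`. The fixed-date IDs are identical in both years and still match.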

Verification

We confirmed this by running the golden patch against the dataset:

| | Stale dataset (2025 names) | Updated dataset (2026 names) |
| --- | --- | --- |
| Golden patch | FAIL (0/5) | PASS (5/5) |

The golden patch itself fails when the dataset has stale year-dependent test names. All tests actually pass — the failure is purely a name mismatch.

Suggested fix

Update the pass_to_pass field in the dataset for the 6 affected instances:

  • [2025-True] → [2026-True]
  • [2026-False] → [2027-False]

Or consider excluding year-parameterized tests from P2P lists, since they will go stale every year.
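Both options can be sketched as small helpers over a list of test names. This assumes pass_to_pass can be loaded as a list of strings. The issue does not show how the dataset stores the field, and the names `YEAR_ID`, `shift_year_ids`, and `drop_year_ids` are hypothetical.

```python
import re

# Matches IDs whose only parameters are a bare four-digit year and a bool,
# e.g. "...[2025-True]"; full dates such as "[2000-11-11-True]" do not match.
YEAR_ID = re.compile(r"\[(\d{4})-(True|False)\]$")


def shift_year_ids(names, recorded_year, run_year):
    """Option 1: rewrite year-dependent IDs from recorded_year to run_year."""
    delta = run_year - recorded_year
    shifted = []
    for name in names:
        m = YEAR_ID.search(name)
        # Only touch the two dynamic years; leave every other name unchanged.
        if m and int(m.group(1)) in (recorded_year, recorded_year + 1):
            new_id = f"[{int(m.group(1)) + delta}-{m.group(2)}]"
            name = name[: m.start()] + new_id
        shifted.append(name)
    return shifted


def drop_year_ids(names):
    """Option 2: exclude year-parameterized names from the list entirely."""
    return [n for n in names if not YEAR_ID.search(n)]


p2p = [
    "test_future_publication_dates_are_deleted[2000-11-11-True]",
    "test_future_publication_dates_are_deleted[2025-True]",
    "test_future_publication_dates_are_deleted[2026-False]",
    "test_future_publication_dates_are_deleted[9999-01-01-False]",
]
shifted = shift_year_ids(p2p, recorded_year=2025, run_year=2026)
filtered = drop_year_ids(p2p)
print(shifted)
print(filtered)
```

Shifting keeps the P2P count unchanged but has to be repeated every year. Dropping is a one-time change, but it removes two tests from each affected list. The pattern is deliberately narrow, so it may need adjusting if other tests use a similar ID shape.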

Impact

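5 of the 6 instances are currently false negatives: the golden patch produces correct results but is marked as failed because of stale test names in the dataset. The sixth, openlibrary-08ac40d0, shows 6/7 on F2P in the table above, so for that instance the update only removes the two false P2P failures.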
