Date-dependent P2P test names become stale over time (6 openlibrary instances) #85

## Summary

6 `internetarchive/openlibrary` instances have P2P (pass-to-pass) test names that contain year-dependent parameters generated by `@pytest.mark.parametrize`. The names were recorded when the dataset was created in 2025. When the evaluation runs in 2026 or later, pytest generates different names for the same tests, which causes false P2P failures.

## Affected instances

| Instance | F2P passed | P2P passed (stale names) | P2P passed (updated names) |
|---|---|---|---|
| `openlibrary-1351c59f` | 1/1 | 16/18 | 18/18 |
| `openlibrary-fdbc0d8f` | 1/1 | 37/39 | 39/39 |
| `openlibrary-b112069e` | 3/3 | 62/64 | 64/64 |
| `openlibrary-1894cb48` | 1/1 | 62/64 | 64/64 |
| `openlibrary-43f9e7e0` | 1/1 | 60/62 | 62/62 |
| `openlibrary-08ac40d0` | 6/7 | 119/121 | 121/121 (2 fewer false failures) |

## Root cause

The test `test_future_publication_dates_are_deleted` in `openlibrary/catalog/add_book/tests/test_add_book.py` generates parametrized test IDs using the current year:

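```python
import datetime

import pytest

# Assumed definition: evaluated when the module is imported during
# collection, so the generated test IDs depend on the year pytest runs in.
current_year = datetime.date.today().year


@pytest.mark.parametrize("date,expected", [
    ("2000-11-11", True),
    (str(current_year), True),        # dynamic: ID "2025-True" in 2025
    (str(current_year + 1), False),   # dynamic: ID "2026-False" in 2025
    ("9999-01-01", False),
])
def test_future_publication_dates_are_deleted(date, expected):
    ...  # body elided
```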

**When the dataset was created (2025):**
- `test_future_publication_dates_are_deleted[2025-True]`
- `test_future_publication_dates_are_deleted[2026-False]`

**When the evaluation runs (2026):**
- `test_future_publication_dates_are_deleted[2026-True]`
- `test_future_publication_dates_are_deleted[2027-False]`
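To make the shift concrete, here is a small sketch that approximates how pytest builds the IDs for the two year-dependent cases. It assumes pytest's default behaviour of rendering a `str` parameter verbatim and a `bool` via `str()`, with the pieces joined by `-`. `year_dependent_ids` is a hypothetical helper, not part of openlibrary or pytest.

```python
def year_dependent_ids(current_year: int) -> list[str]:
    """Approximate pytest's default IDs for the two year-dependent cases.

    Assumption: a str parameter is rendered verbatim, a bool via str(),
    and the parameter IDs are joined with "-". The static cases
    ("2000-11-11", "9999-01-01") are omitted because they never change.
    """
    test = "test_future_publication_dates_are_deleted"
    params = [(str(current_year), True), (str(current_year + 1), False)]
    return [f"{test}[{date}-{expected}]" for date, expected in params]


recorded = year_dependent_ids(2025)  # names stored in the dataset
observed = year_dependent_ids(2026)  # names pytest generates in 2026
print(recorded)
print(observed)
print(set(recorded) & set(observed))  # → set()
```

The two sets never overlap: even though `2026` appears in both years, it is paired with `False` in the recorded names and with `True` in the new ones.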

The harness matches test names as exact strings, so in a 2026 run it cannot find `[2025-True]` or `[2026-False]`. Both are reported as NOT RUN, and the instance is marked FAIL.
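A minimal sketch of that exact-match check, assuming the P2P list is compared verbatim against the set of passing pytest node IDs. The harness code itself is not shown in this issue, so `p2p_failures` and the `path::name[id]` format used for the dataset entries are illustrative assumptions.

```python
def p2p_failures(expected_p2p: list[str], passed: set[str]) -> list[str]:
    """Hypothetical exact-match check: any expected P2P name that does not
    appear verbatim among the passing tests is treated as NOT RUN."""
    return [name for name in expected_p2p if name not in passed]


prefix = "openlibrary/catalog/add_book/tests/test_add_book.py::"
test = "test_future_publication_dates_are_deleted"

# Names recorded in 2025 vs. names actually produced by a 2026 run.
expected_p2p = [f"{prefix}{test}[2025-True]", f"{prefix}{test}[2026-False]"]
passed_2026 = {f"{prefix}{test}[2026-True]", f"{prefix}{test}[2027-False]"}

print(p2p_failures(expected_p2p, passed_2026))
```

Both recorded names come back as failures even though both tests passed, which matches the 2-test gap in every P2P column of the table above.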

## Verification

We confirmed this by running the **golden patch** against the dataset:

| | Stale dataset (2025 names) | Updated dataset (2026 names) |
|---|---|---|
| **Golden patch** | FAIL (0/5 instances) | PASS (5/5 instances) |

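The golden patch itself fails when the dataset contains stale year-dependent test names, even though every test actually passes; the failure is purely a name mismatch. The sixth instance, `openlibrary-08ac40d0`, is not counted here because one of its F2P tests (6/7) fails regardless of the name fix; updating the names still removes its 2 false P2P failures.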

## Suggested fix

Update the `pass_to_pass` field in the dataset for the 6 affected instances:
- `[2025-True]` → `[2026-True]`
- `[2026-False]` → `[2027-False]`
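The rename can be scripted. This is a sketch that assumes `pass_to_pass` has already been loaded as a Python list of strings (if it is stored as a JSON string, decode it with `json.loads` first). `bump_year_ids` is a hypothetical helper, not an existing tool in the dataset or harness.

```python
TEST = "test_future_publication_dates_are_deleted"


def bump_year_ids(names: list[str], old_year: int, new_year: int) -> list[str]:
    """Rewrite the two year-dependent IDs of TEST from old_year to new_year.

    Only exact suffix matches are rewritten, so other tests and the static
    date cases (e.g. "[2000-11-11-True]") are left untouched.
    """
    mapping = {
        f"[{old_year}-True]": f"[{new_year}-True]",
        f"[{old_year + 1}-False]": f"[{new_year + 1}-False]",
    }
    updated = []
    for name in names:
        if TEST in name:
            for old, new in mapping.items():
                if name.endswith(old):
                    name = name[: -len(old)] + new
                    break  # each name is rewritten at most once
        updated.append(name)
    return updated


path = "openlibrary/catalog/add_book/tests/test_add_book.py"
p2p = [
    f"{path}::{TEST}[2025-True]",
    f"{path}::{TEST}[2026-False]",
    f"{path}::{TEST}[2000-11-11-True]",
]
print(bump_year_ids(p2p, 2025, 2026))
```

This only fixes the names for one year; running it again next year with `(2026, 2027)` would be needed, which is the motivation for the alternative below.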

Or consider excluding year-parameterized tests from P2P lists, since they will go stale every year.
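A sketch of that exclusion, assuming the only year-dependent IDs are the bare-year cases of `test_future_publication_dates_are_deleted` shown above. The regex and `drop_year_dependent` are illustrative, not part of any existing tooling.

```python
import re

# Matches only the year-dependent IDs, e.g. "[2025-True]" or "[2026-False]".
# The static cases ("[2000-11-11-True]", "[9999-01-01-False]") do not match,
# because the four digits must be followed directly by "-True]" or "-False]".
YEAR_DEPENDENT = re.compile(
    r"test_future_publication_dates_are_deleted\[\d{4}-(?:True|False)\]$"
)


def drop_year_dependent(names: list[str]) -> list[str]:
    """Remove year-parametrized IDs from a P2P list."""
    return [name for name in names if not YEAR_DEPENDENT.search(name)]


prefix = "openlibrary/catalog/add_book/tests/test_add_book.py::"
test = "test_future_publication_dates_are_deleted"
p2p = [
    f"{prefix}{test}[2000-11-11-True]",
    f"{prefix}{test}[2025-True]",
    f"{prefix}{test}[2026-False]",
    f"{prefix}{test}[9999-01-01-False]",
]
print(drop_year_dependent(p2p))
```

The trade-off is that the current-year and next-year boundary cases are no longer checked by P2P, while the static cases still are.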

## Impact

5 of the 6 instances are currently false negatives: the golden patch produces correct results but is marked as failed because of stale test names in the dataset.
