This is an opinion on how to integrate testing into the standards lifecycle. It is well aligned with the WHATWG working mode and W3C testing how-to but has no official status.
The primary purpose of standards is to achieve interoperable implementations, and the desired outcome is full interoperability that web developers can depend on. By writing and sharing tests as soon as implementation begins and making testing an integrated part of the standards process, we can reduce the overall cost/time of evolving the web platform and achieve good interop by default.
(See "Finding a path to interop" for definitions of interoperability and full interoperability.)
In the earliest stages, only an explainer will exist; there will be no standard, no tests, and no implementation.
If the idea gets traction, at some point a first implementation will begin. This could be before or after the standard has begun to take shape, but for any non-trivial idea it will be well before the standard is fully formed.
All tests for web-exposed behavior that are written for the first implementation should be shared. They will fall into one of two categories:
- The standard already covers this behavior. Write regular tests.
- The standard doesn't, or no standard exists yet. Then, write tentative tests. Make it clear why the test is tentative, for example with a link to an open spec issue.
Tentative tests should be revisited and converted to regular tests as the standard catches up. Because they were originally not reviewed against any spec text, converted tests must be reviewed as if they were new, not as a simple renaming.
By the time the first implementation ships there should ideally be no tentative tests.
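One way to track progress toward that goal is to list the tests that are still marked tentative. The sketch below assumes the web-platform-tests naming convention, where a test is tentative if its filename carries a `.tentative` flag or it lives in a directory named `tentative`. The helper names and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_tentative(test_path: str) -> bool:
    """Return True if the path looks like a tentative test.

    Assumption: a test is tentative when `tentative` appears as a
    dot-separated flag in the filename (between the base name and the
    extension) or when any parent directory is named `tentative`.
    """
    path = PurePosixPath(test_path)
    if "tentative" in path.parts[:-1]:
        return True
    # "foo.tentative.https.html" -> flags ["tentative", "https"]
    flags = path.name.split(".")[1:-1]
    return "tentative" in flags


def remaining_tentative(test_paths):
    """Return the tentative tests among the given paths, sorted."""
    return sorted(p for p in test_paths if is_tentative(p))


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    paths = [
        "feature/basic.html",
        "feature/edge-case.tentative.html",
        "feature/tentative/new-idea.any.js",
    ]
    for path in remaining_tentative(paths):
        print(path)
```

A check like this could run shortly before shipping, with each remaining entry either converted to a regular test after review or left tentative with a link to the open spec issue.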
When the first implementation ships, the standard and tests ought to be in good shape, so asking "What do implementations do?" and "What do the tests reveal?" for spec changes (see WHATWG working mode) becomes relevant. Around this time it would make sense to adopt a policy for testing normative spec changes.
A second implementer will almost certainly discover spec bugs, test bugs and missing test coverage. This will result in new failing tests for the first implementer, which should be very welcome. (Without shared tests, interop problems could otherwise go unnoticed until much later.)
Subsequent implementers will find a standard and test suite that are in increasingly good shape, and the existing test suite should lower the time and cost of implementing.
For a mature standard, the whole process can repeat for proposed changes. New ideas start as issues, tests are modified or added, and implementer experience helps shape the spec and tests into their final form.
As the standard, test suite and implementations co-evolve, an increasing number of tests will pass everywhere. For those tests, however trivial, full interop has been achieved. Once achieved, tooling should make it impossible to accidentally depart from full interop. Web developers can depend on the feature to work the same way in any browser.
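As a rough illustration of what such tooling might check, the sketch below computes which tests pass in every implementation and flags any test that drops out of that set between two runs. The data shape (a mapping from engine name to the set of passing tests) and the function names are assumptions made for illustration, not the format of any real dashboard.

```python
def passing_everywhere(results):
    """Return the set of tests that pass in every engine.

    `results` maps an engine name to the set of test names passing there.
    """
    if not results:
        return set()
    return set.intersection(*(set(passing) for passing in results.values()))


def interop_regressions(before, after):
    """Return tests that passed everywhere in `before` but not in `after`."""
    return sorted(passing_everywhere(before) - passing_everywhere(after))


if __name__ == "__main__":
    # Hypothetical engines and test names.
    before = {"engine-a": {"t1", "t2"}, "engine-b": {"t1", "t2", "t3"}}
    after = {"engine-a": {"t1"}, "engine-b": {"t1", "t2", "t3"}}
    print(interop_regressions(before, after))
```

In this example `t2` passed in both engines in the first run and fails in one of them in the second, so it is the only test reported. A check of this kind could block a change from landing until the regression is either fixed or explicitly acknowledged.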
- HTML: the messageerror event had spec text and tests written side-by-side and merged together, before any implementation.
- Fetch: aborting spec change and tests also written together, and test bug fixed during implementation.
- Trusted Types: explainer and tentative tests written before spec text.
Note: The order of spec text, tests and implementation isn't always the same, and it's not clear that there's a single best order. Any working mode that feels productive and results in quality all around should be embraced.
Risks of tentative tests in web-platform-tests:
- If treated like regular tests, they would influence incentives. (See wpt.fyi issues #83 and #99.)
- When converting tentative tests to regular tests, it is tempting to do very light review.
- Without a triage process, they might be left for a very long time. (TODO)
Risks of engine-specific tests:
- Shared tests wouldn't be the initial default, making it more tempting to use vendor-specific testing APIs.
- Upstreaming a large number of engine-specific tests is some amount of work, and risks being delayed.
- If a second implementer becomes interested early, they might be blocked on the first implementer sharing their tests.
It's not an open-and-shut case, but sharing tests as early as possible seems best.
- The web-platform-tests documentation on writing and reviewing tests
- The web-platform-tests dashboard (wpt.fyi)
- The web-platform-tests PR results (posts comments to GitHub PRs)
- Chromium's documentation on web-platform-tests, writing layout tests and the layout tests tips