Set policy for LLM-generated tests #202

jgraham · 2024-07-08T07:17:09Z

CC @web-platform-tests/wpt-core-team

I was recently asked about the policy for using LLMs to generate tests that are submitted to wpt. Currently we don't have any explicit policy on this, but I think we need one, so it's clear to test authors what's permissible, and to downstream consumers whether wpt is in line with any other policies they might have with regard to LLMs.

To be clear, I expect that any policy here would not affect review requirements i.e. we'd still require human review for all tests to ensure they're correct and easy to follow.

Ms2ger · 2024-07-10T09:27:19Z

Given their tendency to create output that looks plausible but isn't actually correct, I'd lean towards "don't allow" or at least increase review requirements on them. We've been relatively permissive in accepting tests based on the fact that a human has already reasoned their way through the spec while writing; if that's no longer a valid assumption, we should take that into account.

tabatkins · 2024-07-10T16:30:24Z

Yes, I'm overall on the "disallow officially" stance. If they are LLM-generated, but are good/correct enough we can't tell, then fine, but in my experience people who use LLMs like this are not particularly concerned about the quality in the first place.

jgraham · 2024-07-10T19:34:32Z

We also need to consider cases where the test is partially generated but human-modified (e.g. with copilot-like tools).

The practical arguments I've heard in favour of allowing LLM-[partially-]generated tests are basically:

- People are going to do it anyway, and it will be difficult to police.
- For people who do follow the policy, being forbidden from using productivity tooling for wpt will make them more likely to write non-wpt tests.

Speaking personally, I'm definitely worried about the potential for lower-quality tests being written using these technologies, but as a practical matter I don't know how much we can enforce, so I'd only want to ban them if there's some strong non-technical reason (e.g. vendors who would be unable to use wpt). However, I'd be interested in ideas for marking such tests (assuming good faith contributors) so that we could verify whether they cause more problems in the longer term compared to purely human-generated tests.

WeizhongX · 2024-08-06T16:20:07Z

Maybe we need to clarify the license issues associated with LLM-generated code? One of the reasons LLMs are banned at some corporations is that the license of LLM-generated code can be unclear, and thus the code is not safe to use.

What is the policy regarding the license now for WPTs contributed by a human?

jgraham · 2024-08-06T16:35:38Z

I think broadly it's the submitter's responsibility to ensure that they're able to contribute it under the terms of the 3-Clause BSD License.

zcorpan · 2024-08-08T14:19:30Z

I have thought about using an LLM to generate a testing plan (giving it https://web-platform-tests.org/writing-tests/making-a-testing-plan.html + the relevant part of the spec to test), and just tested it now with ChatGPT 4o for https://quirks.spec.whatwg.org/#the-html-element-fills-the-viewport-quirk and https://quirks.spec.whatwg.org/#the-body-element-fills-the-html-element-quirk

After clarifying that I want a list of 100 tests with short descriptions, it came up with various reasonable cases to test. I then asked it to write the first test. It made a few mistakes (it included a standards mode doctype, unnecessary margin calculations, and unnecessary boilerplate such as setup()), but after fixing those the test is basically correct and useful: it passes in Chrome and Safari, and fails in Firefox due to bug 1259924.

As a reviewer I would probably like to know when LLMs were used and whether the contributor put in some effort to clean up and verify the correctness.

Prompts and output

Please read this to learn how to generate a testing plan for a specification. [attachment: making-a-testing-plan.md]
Now make a testing plan for the attached spec. [attachment: quirks-html-body.bs]
5 tests is not sufficient to test all interesting cases. I expect something on the order of 100 tests. Please only generate a list where each bullet point briefly explains a single test. Example: "Test float: left on <body>, verify the border box size in the block flow direction"

Certainly! Here is an extensive list of test cases to thoroughly cover the specification for quirks mode behavior of the <html> and <body> elements:

Tests for <html> Element

Vertical Writing Mode with width: auto

Test width: auto on <html>, verify the border box size in vertical writing mode.

Test margin-left and margin-right on <html>, verify the border box size in vertical writing mode.

Test margin-left: 10px; margin-right: 20px on <html>, verify the border box size in vertical writing mode.

Test border: 5px solid on <html>, verify the border box size in vertical writing mode.

Test padding: 10px on <html>, verify the border box size in vertical writing mode.

Test width: auto; writing-mode: vertical-rl on <html>, verify the border box size.

Horizontal Writing Mode with height: auto

Test height: auto on <html>, verify the border box size in horizontal writing mode.

Test margin-top and margin-bottom on <html>, verify the border box size in horizontal writing mode.

Test margin-top: 10px; margin-bottom: 20px on <html>, verify the border box size in horizontal writing mode.

Test border: 5px solid on <html>, verify the border box size in horizontal writing mode.

Test padding: 10px on <html>, verify the border box size in horizontal writing mode.

Test height: auto; writing-mode: horizontal-tb on <html>, verify the border box size.

Tests for <body> Element

Vertical Writing Mode with width: auto

Test width: auto on <body>, verify the border box size in vertical writing mode.

Test margin-left and margin-right on <body>, verify the border box size in vertical writing mode.

Test margin-left: 10px; margin-right: 20px on <body>, verify the border box size in vertical writing mode.

Test border: 5px solid on <body>, verify the border box size in vertical writing mode.

Test padding: 10px on <body>, verify the border box size in vertical writing mode.

Test position: static on <body>, verify the border box size in vertical writing mode.

Test position: relative on <body>, verify the border box size in vertical writing mode.

Test position: absolute on <body>, verify the border box size in vertical writing mode.

Test position: fixed on <body>, verify the border box size in vertical writing mode.

Test float: none on <body>, verify the border box size in vertical writing mode.

Test float: left on <body>, verify the border box size in vertical writing mode.

Test float: right on <body>, verify the border box size in vertical writing mode.

Test inline-level <body>, verify the border box size in vertical writing mode.

Test spanning <body> (multi-column), verify the border box size in vertical writing mode.

Horizontal Writing Mode with height: auto

Test height: auto on <body>, verify the border box size in horizontal writing mode.

Test margin-top and margin-bottom on <body>, verify the border box size in horizontal writing mode.

Test margin-top: 10px; margin-bottom: 20px on <body>, verify the border box size in horizontal writing mode.

Test border: 5px solid on <body>, verify the border box size in horizontal writing mode.

Test padding: 10px on <body>, verify the border box size in horizontal writing mode.

Test position: static on <body>, verify the border box size in horizontal writing mode.

Test position: relative on <body>, verify the border box size in horizontal writing mode.

Test position: absolute on <body>, verify the border box size in horizontal writing mode.

Test position: fixed on <body>, verify the border box size in horizontal writing mode.

Test float: none on <body>, verify the border box size in horizontal writing mode.

Test float: left on <body>, verify the border box size in horizontal writing mode.

Test float: right on <body>, verify the border box size in horizontal writing mode.

Test inline-level <body>, verify the border box size in horizontal writing mode.

Test spanning <body> (multi-column), verify the border box size in horizontal writing mode.

Interaction Between <html> and <body>

Different Writing Modes

Test <html> horizontal and <body> vertical, verify the border box size for both.

Test <html> vertical and <body> horizontal, verify the border box size for both.

Same Writing Modes with Different Properties

Test <html> and <body> both vertical, different margins.

Test <html> and <body> both vertical, different paddings.

Test <html> and <body> both vertical, different borders.

Test <html> and <body> both horizontal, different margins.

Test <html> and <body> both horizontal, different paddings.

Test <html> and <body> both horizontal, different borders.

Additional Scenarios for Edge Cases

Complex Combinations

Test <html> with width: auto, <body> with height: auto.

Test <html> with height: auto, <body> with width: auto.

Test <html> with padding: 20px, <body> with margin: 30px.

Test <html> with border: 5px solid, <body> with padding: 15px.

Test <html> with margin: 10px, <body> with border: 10px solid.

Nested Elements and Layouts

Test nested <div> elements inside <body> affecting the border box size.

Test nested <table> elements inside <body> affecting the border box size.

Test nested <iframe> elements inside <body> affecting the border box size.

Test nested <section> elements inside <body> affecting the border box size.

Test nested <article> elements inside <body> affecting the border box size.

Special CSS Combinations

Test <html> with display: flex, <body> with flex-direction: column.

Test <html> with display: grid, <body> with grid-template-rows.

Test <html> with display: block, <body> with display: inline.

Test <html> with display: inline-block, <body> with display: block.

Margin and Padding Variations

Test <body> with varying margin-top: 10px to 50px.

Test <body> with varying margin-bottom: 10px to 50px.

Test <body> with varying margin-left: 10px to 50px.

Test <body> with varying margin-right: 10px to 50px.

Test <body> with varying padding-top: 10px to 50px.

Test <body> with varying padding-bottom: 10px to 50px.

Test <body> with varying padding-left: 10px to 50px.

Test <body> with varying padding-right: 10px to 50px.

Tests for Interaction with Other Elements

Interaction with <header> and <footer>

Test <body> with a <header> element affecting the border box size.

Test <body> with a <footer> element affecting the border box size.

Interaction with Forms and Inputs

Test <body> with a <form> element affecting the border box size.

Test <body> with multiple <input> elements affecting the border box size.

Interaction with Multimedia Elements

Test <body> with an <img> element affecting the border box size.

Test <body> with a <video> element affecting the border box size.

Test <body> with an <audio> element affecting the border box size.

Special Cases and Edge Conditions

Handling of Overflow

Test <body> with overflow: auto, verify the border box size.

Test <body> with overflow: hidden, verify the border box size.

Test <body> with overflow: scroll, verify the border box size.

Extreme Values

Test <body> with extremely large margin values.

Test <body> with extremely small padding values.

JavaScript Modifications

Test modifying <body> styles via JavaScript, verify the border box size.

Test modifying <html> styles via JavaScript, verify the border box size.

Responsive and Adaptive Design

Test <body> with media queries affecting width and height.

Test <body> with viewport changes, verify border box size adjustments.

Testing in Different Browsers

Test <body> behavior in Chrome, verify quirks mode behavior.

Test <body> behavior in Firefox, verify quirks mode behavior.

Test <body> behavior in Safari, verify quirks mode behavior.

Test <body> behavior in Edge, verify quirks mode behavior.

Integration with Modern CSS Features

Test <body> with CSS Variables affecting size properties.

Test <body> with CSS Custom Properties.

Test <body> with CSS Grid and Flexbox properties.

Legacy and Compatibility Modes

Test <body> with <!DOCTYPE html> and no doctype, verify differences.

Test <body> with <!DOCTYPE html SYSTEM "about:legacy-compat">.

Additional Custom Scenarios

Test custom styles on <body> with user-defined classes.

Test combining multiple CSS properties on <body> simultaneously.

Test <body> with pseudo-classes like :hover and :active.

Test <body> with pseudo-elements like ::before and ::after.

Test <body> with different character encodings affecting layout.

By covering these tests, you will comprehensively ensure the correct behavior of the <html> and <body> elements in quirks mode under various conditions and configurations.

Create the first test using testharness.js

<!DOCTYPE html>
<html>
<head>
    <title>Quirks Mode Test: HTML Width Auto in Vertical Writing Mode</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <style>
        html {
            width: auto;
            writing-mode: vertical-rl;
            margin: 0;
            padding: 0;
            border: 0;
        }
        body {
            margin: 0;
            padding: 0;
            border: 0;
        }
    </style>
</head>
<body>
    <script>
        // Define the test
        test(function() {
            // Retrieve the HTML element
            var htmlElement = document.documentElement;

            // Get the computed style of the HTML element
            var computedStyle = getComputedStyle(htmlElement);

            // Calculate the expected border box size
            var viewportWidth = window.innerWidth;
            var marginLeft = parseInt(computedStyle.marginLeft);
            var marginRight = parseInt(computedStyle.marginRight);
            var expectedBorderBoxWidth = Math.max(viewportWidth - marginLeft - marginRight, htmlElement.clientWidth);

            // Assert that the border box width matches the expected value
            assert_equals(htmlElement.offsetWidth, expectedBorderBoxWidth, "HTML element's border box width should be correctly calculated.");
        }, "Test HTML element with width: auto in vertical writing mode");

        // Report the results
        setup({explicit_done: true});
        add_completion_callback(function() {
            done();
        });
    </script>
</body>
</html>

Please make these changes:
- remove the doctype
- remove marginLeft and marginRight calculations (can assume those are 0)
- remove "// Report the results" and the rest of the script after that

<html>
<head>
    <title>Quirks Mode Test: HTML Width Auto in Vertical Writing Mode</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <style>
        html {
            width: auto;
            writing-mode: vertical-rl;
            margin: 0;
            padding: 0;
            border: 0;
        }
        body {
            margin: 0;
            padding: 0;
            border: 0;
        }
    </style>
</head>
<body>
    <script>
        // Define the test
        test(function() {
            // Retrieve the HTML element
            var htmlElement = document.documentElement;

            // Get the computed style of the HTML element
            var computedStyle = getComputedStyle(htmlElement);

            // Calculate the expected border box size
            var viewportWidth = window.innerWidth;
            var expectedBorderBoxWidth = Math.max(viewportWidth, htmlElement.clientWidth);

            // Assert that the border box width matches the expected value
            assert_equals(htmlElement.offsetWidth, expectedBorderBoxWidth, "HTML element's border box width should be correctly calculated.");
        }, "Test HTML element with width: auto in vertical writing mode");
    </script>
</body>
</html>
