Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hard-coded checked declarations from 3C regression tests. #576

Merged
merged 11 commits into from
May 21, 2021

Conversation

mattmccutchen-cci
Copy link
Member

Fixes #560. As usual, I've broken the change into a sequence of commits to ease review.

@kyleheadley
Copy link
Member

This PR changes most of our regression tests in a significant way. To facilitate review of this issue, could we please extend the description with:

  • Motivation statement: "Commonly included system headers now conflict with our test's code" is sufficient if correct.
  • Solution statement: The title is good but is possibly insufficient. What other main issues came up and how were they solved? If the explanation is in the commits, point us to the relevant ones.
  • Completion statement: point us to the actions run that succeeded on all the tests.
  • A list of the most common deleted functions now available in the system headers. So we can evaluate them once and skip them when we see them.
  • Some way to separate generated tests from individual ones. I expect this is already in the commits, but please make it explicit: which commit changed generated tests, and which files have the relevant changes? Is there a magic GitHub link that shows us everything but the generated tests? Are all the commits either independent of each other (and worth reading through individually) or deal with generated tests?
  • What work is expected to be done in review?, ie, "All hard-coded functions matched system prototypes, just check for typos" or "Please read the purpose statements of all the tests to see if the system function was meant as a system function" or "please skim, but pay special attention to these to commits: #A #B"

Thanks for your attention and submission, I know there have been some recent PRs that we wish had been reviewed more thoroughly.

@mattmccutchen-cci
Copy link
Member Author

mattmccutchen-cci commented May 13, 2021

This PR changes most of our regression tests in a significant way. To facilitate review of this issue, could we please extend the description with:

Maybe it will be easier to follow if I reply to your questions individually rather than have you look for the answers in a revised description?

  • Motivation statement: "Commonly included system headers now conflict with our test's code" is sufficient if correct.

"There is a risk that in the future, the hard-coded declarations of system elements (functions, types, etc.) in our tests' code conflict with the Checked C system headers or cause other problems because they are incorrect in a particular environment."

  • Solution statement: The title is good but is possibly insufficient. What other main issues came up and how were they solved? If the explanation is in the commits, point us to the relevant ones.

Indeed, the main issues are described in the commit messages, but perhaps not as clearly as you are looking for. I'll describe them again here.

  • Most of the changes consist of deleting our declarations of system elements and adding the #include of the system header that declares the elements (if the #include was not already present). Now that implicit inclusion of checked headers is turned on, this #include will generally give us the checked declaration if there is one. I initially did this once by hand, but I was concerned that I may have made mistakes, so I re-did it with scripts, mainly in this commit; I can post more information about the scripts if you're curious. The results of the two approaches were essentially the same, so I'm pretty confident that the declarations have been replaced with the correct #includes: any mistake that affected only one of the two approaches would have been caught.

    The more interesting problem is that some of the declarations in our tests had minor differences from the standard declarations, so in theory, replacing them with #includes of the standard declarations could change the behavior of the tests. I think this is unlikely in our tests, except as noted in the next item. But in case we want to scrutinize this further, my scripts generated an intermediate commit that shows just the differences between the old declarations and the standard declarations. Two cases were addressed in this commit instead due to limitations of my scripts.

  • In some cases, it was clear that defining our own element was important for the test. In those cases, I renamed the element to prevent any future conflict.

  • Some of our tests use %clang_cc1 instead of %clang for reasons that are still unclear to us (see Clean up test RUN commands #346). %clang_cc1 turns off the use of some of the system headers, so I ran into problems when I tried to replace the hard-coded declarations in those tests with #includes; I don't remember the details now. I overcame this problem by replacing %clang_cc1 with %clang in the two affected tests. In theory, there is a risk that other behavior differences between %clang_cc1 and %clang affect the behavior of those two tests, but I think it is small.

  • Completion statement: point us to the actions run that succeeded on all the tests.

The tests pass, and if I re-run my script that searches for declarations of system elements in our tests, I see that all of the remaining matches are false positives (because the script uses crude text matching heuristics). But the more important thing is our level of confidence that the original matches were replaced correctly by the script.

Update 2021-05-18: I originally interpreted your comment as "the actions that you took", i.e., "the checks that you performed", but based on our discussion in #590 (comment), I assume you mean an actual run of the GitHub workflow. I've started a workflow run.

  • A list of the most common deleted functions now available in the system headers. So we can evaluate them once and skip them when we see them.

I did generate this as part of my process, so I'll post it, although it's unclear to me exactly how you plan to use it in review.

name header
calloc stdlib.h
free stdlib.h
gmtime time.h
malloc stdlib.h
memcmp string.h
memcpy string.h
memset string.h
printf stdio.h
realloc stdlib.h
size_t stddef.h
strcmp string.h
strcpy string.h
strlen string.h
strstr string.h
wchar_t stddef.h
  • Some way to separate generated tests from individual ones. I expect this is already in the commits, but please make it explicit: which commit changed generated tests, and which files have the relevant changes? Is there a magic GitHub link that shows us everything but the generated tests? Are all the commits either independent of each other (and worth reading through individually) or deal with generated tests?

In the revised PR as of 2021-05-18, the generated tests are now handled separately in the first commit. (I've deleted the explanation of the previous approach; it's available in the edit history of this comment.)

  • What work is expected to be done in review?, ie, "All hard-coded functions matched system prototypes, just check for typos" or "Please read the purpose statements of all the tests to see if the system function was meant as a system function" or "please skim, but pay special attention to these to commits: #A #B"

The high-level goal is to establish whatever level of confidence we want that (1) none of the commits make unintended and harmful behavior changes to the tests and (2) this PR has completely solved the original problem, i.e., no declarations of system elements remain in the tests. I'm proposing that we can trust my scripts with respect to (2) and the two commits marked "Automated and should not affect the behavior of the tests"; part of the work would be to form your own opinion of the validity of this methodology. Beyond that, the appropriate review would depend on the nature of each commit. For example, you could verify that the renaming of intentionally nonstandard declarations was done correctly and the deviations of the remaining declarations from the standard look unimportant.

Of course, without thorough knowledge of how every test interacts with 3C, it's hard to be completely certain of (1), so we have to make a judgment call of what level of confidence is appropriate before accepting this PR and moving on. I suspect that I've already gone well beyond the level that Mike would want in the context of our other priorities, but this is another issue on which you would form your own opinion.

Thanks for your attention and submission, I know there have been some recent PRs that we wish had been reviewed more thoroughly.

You're right that I'm asking for review to be consistently taken seriously, even if it is after the fact because of a deadline. A separate issue I see is that for several of my PRs, you have asked me to write out a lot of explanation where other reviewers might have studied the PR and figured out the intent by themselves, even if the level of review ultimately achieved is the same in both cases. But I'm willing to write the explanation as long as we believe it's a reasonable use of my work time.

declarations, and run it to update the generated tests.

In contrast to my approach in the previous revision of the PR, we have
to manually verify that there's no behavior change in the change to
testgenerator, but that shouldn't be too hard, and Kyle and Mike
suggested that they would find this approach easier to understand.
Hope this doesn't affect the validity of the tests.
Automated and should not affect the behavior of the tests.
This change was automated, but we need to review it to reassure
ourselves that it doesn't affect the validity of the tests.
Automated and should not affect the behavior of the tests.
- Remove `#include <stddef.h>` from files that no longer directly
  reference `size_t`.

- Order of `#include`s, etc.
headers.

%clang_cc1 expands to a command line containing -nostdsysteminc, which
turns off the use of the system headers.

Hope this doesn't affect the validity of the tests. We'll probably end
up doing this to all the 3C regression tests as part of #346, but for
now, minimize the number of tests we have to think about.
@mattmccutchen-cci
Copy link
Member Author

I've updated this PR after #590 and updated my previous comment about the PR. Kyle, I think it is ready for you to review again, modulo your desire to see the results of the workflow run (as opposed to just running the regression tests locally).

@kyleheadley
Copy link
Member

The main update of extern functions in tests looks great.

There were some high-level decisions made with exceptional tests that are probably fine but do potentially change test behavior. I'm unqualified to understand the subtleties myself and would like either a brief explanation of why it should be OK, or for someone else to look at it.

from commit 5ee5a81 "Rename references that weren't intended to be to standard declarations."
gmtime and free and ulong: why are these versions unsuitable for std headers? Why not adjust them as done below?

from commit 5a862a9 "Standardize checked declarations."
printf becomes _Unchecked, strlen loses _Unchecked (and restrict): Why would this not effect tests? Why not redefine a local version as done above?
@john-h-kastner or @mwhicks1, could you weigh in on this specifically? The commit is long but it's just printf/strlen repeated, prepping them for their sysheader version. Would this change the tests too much?

also from 5ee5a81 above
(k_and_r.c):FILE changed to int: What's going on here? Where was this defined? why change it?

from aa138e8 " Replace %clang_cc1 with %clang in two tests that need the system headers."
A good explanation is in the full commit message. Maybe we should just leave these tests alone? They shouldn't be affected by system changes.

@mattmccutchen-cci
Copy link
Member Author

mattmccutchen-cci commented May 19, 2021

The main update of extern functions in tests looks great.

Good!

There were some high-level decisions made with exceptional tests that are probably fine but do potentially change test behavior. I'm unqualified to understand the subtleties myself and would like either a brief explanation of why it should be OK, or for someone else to look at it.

I went ahead and tried to answer your specific questions below. But indeed, at some point, it may be more efficient for someone else to take over the review of the PR if you aren't confident in approving it.

from commit 5ee5a81 "Rename references that weren't intended to be to standard declarations."
gmtime and free and ulong: why are these versions unsuitable for std headers? Why not adjust them as done below?

  • gmtime: It looks like the test is specifically testing what happens when gmtime is declared twice, once without an itype and once with it. If we used the system header, we wouldn't have as much control of the sequence of declarations to ensure that it is exercising the 3C behavior that we want to test. This was my judgment call from looking at the test, but if someone else knows more about this particular test than I do, then we can take a different approach.
  • free: The declaration in the test is missing the byte_count(0) compared to the standard declaration. I assumed this might be important to the test, but now I realize that it probably isn't. If someone else can confirm this, I can change that test to use the standard free. Update 2021-05-21: Now I see that the test has another nonstandard declaration of free at the bottom that is clearly intentional and that I should keep. I imagine that I noticed this when I initially prepared the PR but overlooked it when previously writing this comment.
  • ulong: definedType.c is specifically testing 3C's behavior when we use a macro that expands to a type name. The ulong in sys/types.h is a typedef (at least on my system).

from commit 5a862a9 "Standardize checked declarations."
printf becomes _Unchecked, strlen loses _Unchecked (and restrict): Why would this not effect tests? Why not redefine a local version as done above?

John or Mike may answer these specific cases anyway, but I think it's worth me trying again to state the general idea behind this decision (which I failed to articulate completely earlier). There are two potential problems with the declarations of system elements used in a test:

  1. If the purpose of a test was to test 3C's handling of typical usage of the system element, but the test uses a hard-coded declaration that differs from the standard one, then the test might not be a valid test of the behavior that real users of 3C will see.
  2. If the purpose of a test was to test a specific 3C behavior that we found was triggered by a specific declaration of a system element, then changing to the standard declaration might no longer exercise the target code path in 3C in order to catch problems with it.

For any given test, we may not know for sure whether (1) or (2) is a bigger risk. (Both may be low, but we should still make the best decision we can.) Using local versions as you suggest would avoid (2) but expose us to (1). For the _Unchecked and restrict cases you mention, I think it's likely that the inconsistencies arose from copying and pasting of declarations from different revisions of the Checked C headers and have no significance to the tests, so risk (2) is minimal and we should use the standard declarations to avoid (1). In fact, I'm not aware of any way in which 3C's behavior is affected by _Unchecked or restrict annotations in these contexts at all.

also from 5ee5a81 above
(k_and_r.c):FILE changed to int: What's going on here? Where was this defined? why change it?

Since YY_USE_PROTOS is not defined in this test, the code I changed is never reached. I think this code was originally copied from one of our benchmarks and both branches originally used FILE, but the #else branch was manually edited to use int to avoid the need to include stdio.h because the actual type of the parameter was not important to the test. I changed the #if branch for consistency and to make clear to future readers that we didn't intend to use the real FILE. This change is tangential to the goal of removing hard-coded checked declarations, but I felt it was appropriate to make while I was here.

from aa138e8 " Replace %clang_cc1 with %clang in two tests that need the system headers."
A good explanation is in the full commit message. Maybe we should just leave these tests alone? They shouldn't be affected by system changes.

As I mentioned in the commit message, we eventually want to make a decision about this %clang_cc1 thing as part of #346 anyway. I was hoping we could make at least an interim decision about %clang_cc1 for this PR rather than have to leave these two files containing hard-coded declarations of system elements, unlike all the others. You claim that system changes are a low risk for these tests; I could equally well claim that removing %clang_cc1 is a low risk. Without further evidence supporting either claim, that isn't really a basis on which to make the decision.

@kyleheadley
Copy link
Member

Great. Since you took the time to understand the purpose of the tests with gmtime and ulong, please leave a brief comment about them in the code (unless I missed an existing one). For %clang_cc1 my preference is to be conservative here and await #346 rising in our priority queue for discussion. But if you think you figured it out then it's fine, but please put a comment there with your choice.

The only question left I can see is regarding _Unchecked as described above, about which I need to defer to others.

@mattmccutchen-cci
Copy link
Member Author

Great. Since you took the time to understand the purpose of the tests with gmtime and ulong, please leave a brief comment about them in the code (unless I missed an existing one).

Let me wait for John's review of the main standardization, and maybe he will have an opinion on these cases too that will lead me to implement something different.

For %clang_cc1 my preference is to be conservative here and await #346 rising in our priority queue for discussion. But if you think you figured it out then it's fine, but please put a comment there with your choice.

Since we're not in a great hurry, I propose that we just have the full discussion of the %clang_cc1 part of #346 now. I really think the outcome is just going to be that no one knows of any specific problem with switching from %clang_cc1 to %clang, so if we want to do it eventually, it may as well be now. But I'll submit a separate PR and we can discuss the topic there.

mattmccutchen-cci added a commit that referenced this pull request May 20, 2021
This requires replacing `%clang_cc1 -verify` with `%clang -Xclang
-verify` in the tests that use it.

This is one part of the RUN command cleanup that we've wanted to do for
a while in #346, and now it is blocking part of #576. This is because
the single behavior difference we know of between %clang_cc1 and %clang
(other than the need for `-Xclang`) is that %clang_cc1 turns off the
system headers, and we want two of our tests that currently use
%clang_cc1 to be able to get checked declarations from the system
headers.

As recently as dac3f97 (August 2020),
all of the tests that used %clang_cc1 used -verify. Since then, a few
tests have been added that use %clang_cc1 and not -verify. This lends
support to my theory that the only reason we used %clang_cc1 was that we
were unaware that -verify could be used with %clang via -Xclang, and
some subsequently added tests copied the use of %clang_cc1 only because
the authors didn't know any better. Thus, I believe that the risk of the
migration affecting the validity of the tests (e.g., causing them to
falsely pass) is low.
@mattmccutchen-cci
Copy link
Member Author

Since we're not in a great hurry, I propose that we just have the full discussion of the %clang_cc1 part of #346 now. I really think the outcome is just going to be that no one knows of any specific problem with switching from %clang_cc1 to %clang, so if we want to do it eventually, it may as well be now. But I'll submit a separate PR and we can discuss the topic there.

Submitted #596. We can proceed with this PR once #596 is resolved and John reviews the standardization issues in this PR.

Copy link
Collaborator

@john-h-kastner john-h-kastner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from commit 5a862a9 "Standardize checked declarations."
printf becomes _Unchecked, strlen loses _Unchecked (and restrict): Why would this not effect tests? Why not redefine a local version as done above?
@john-h-kastner or @mwhicks1, could you weigh in on this specifically? The commit is long but it's just printf/strlen repeated, prepping them for their sysheader version. Would this change the tests too much?

Adding and removing _Unchecked should be entirely inconsequential. As far as I'm aware the only place we look at _Checked and _Unchecked annotations is in CheckedRegions.cpp. Examining that file, we look for blocks that are written with an explicit _Checked, but explicit _Unchecked annotations are treated almost the same as a blocks without checked specifiers. For example, here, _Unchecked and un-annotated blocks are equivalent.

bool CheckedRegionAdder::isWrittenChecked(const clang::CompoundStmt *S) {
CheckedScopeSpecifier WCSS = S->getWrittenCheckedSpecifier();
switch (WCSS) {
case CSS_None:
return false;
case CSS_Unchecked:
return false;
case CSS_Bounds:
return true;
case CSS_Memory:
return true;
}
llvm_unreachable("Invalid Checked Scope Specifier.");
}

In markChecked they're treated a little differently so that we won't accidentally add a _Checked block around an existing _Unchecked block (this strikes me as something we could improve if we get around to working on -addcr again). This isn't relevant the referenced commit because it only changes external function prototypes where we'd never want to add a _Checked region.

void CheckedRegionFinder::markChecked(CompoundStmt *S, int Localwild) {
auto Cur = S->getWrittenCheckedSpecifier();
llvm::FoldingSetNodeID Id;
S->Profile(Id, *Context, true);
bool IsChecked = !hasUncheckedParameters(S) &&
Cur == CheckedScopeSpecifier::CSS_None && Localwild == 0;
Map[Id] = IsChecked ? IS_CHECKED : IS_UNCHECKED;
}

The final possible issue is statistics collection. Making the change means these tests will report one fewer unchecked regions. That part of the statistics collection code didn't exist when these tests were written, and none of them are evaluated based on the statistics output.


The restrict are slightly more likely to matter. Any type qualifier on a variable that might be rewritten is relevant, but these functions are never defined, so they shouldn't be rewritten, and preserving qualifiers should never come into play. Also, rewriting with restrict qualifiers is explicitly tested in qualifiers.c


In all, I think the changes are acceptable.

Copy link
Member

@kyleheadley kyleheadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John's review addresses the only issue I had left.

mattmccutchen-cci added a commit that referenced this pull request May 21, 2021
This requires replacing `%clang_cc1 -verify` with `%clang -Xclang
-verify` in the tests that use it.

This is one part of the RUN command cleanup that we've wanted to do for
a while in #346, and now it is blocking part of #576. This is because
the single behavior difference we know of between %clang_cc1 and %clang
(other than the need for `-Xclang`) is that %clang_cc1 turns off the
system headers, and we want two of our tests that currently use
%clang_cc1 to be able to get checked declarations from the system
headers.

As recently as dac3f97 (August 2020),
all of the tests that used %clang_cc1 used -verify. Since then, a few
tests have been added that use %clang_cc1 and not -verify. This lends
support to my theory that the only reason we used %clang_cc1 was that we
were unaware that -verify could be used with %clang via -Xclang, and
some subsequently added tests copied the use of %clang_cc1 only because
the authors didn't know any better. Thus, I believe that the risk of the
migration affecting the validity of the tests (e.g., causing them to
falsely pass) is low.
…nto regtests-checked-decls

Merging #596 into this PR subsumes the migration of two tests from
%clang_cc1 to %clang that was previously done in
aa138e8.
@mattmccutchen-cci
Copy link
Member Author

mattmccutchen-cci commented May 21, 2021

Thanks John and Kyle. I've added comments about the elements I renamed as Kyle previously requested, so I think this PR is ready to go now.

Note: When I squash-merge the PR, GitHub defaults to setting the squash commit message to the concatenation of the individual commit messages. I think that's a bit hard to read and more distracting than useful on the whole, so I'm clearing out that part of the message. I believe that anyone who wants the details will be better served by reading this PR thread.

@mattmccutchen-cci mattmccutchen-cci merged commit 8401770 into main May 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hard-coded checked declarations in regression tests may conflict with updated checked headers
3 participants