Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Symmetric" liberal itypes: allow data flow inside or outside function to use either side of itype? #743

Open
mattmccutchen-cci opened this issue Nov 17, 2021 · 0 comments

Comments

@mattmccutchen-cci
Copy link
Member

mattmccutchen-cci commented Nov 17, 2021

Background

A design goal of 3C is to convert as much of the input program as possible to use checked types even if some parts of the program perform unsafe operations. To a first approximation, the Checked C compiler disallows implicit conversions either from wild to checked or from checked to wild. (The details are somewhat more complicated and can be found in the Checked C specification. One basic point worth noting is that "wild -> checked" is considered a "bounds checking" error while "checked -> wild" is considered a "type checking" error.) So when there is data flow between two values x and y in the program, to avoid a compile error, 3C normally constrains them to have the same checkedness via a bidirectional constraint x == y, realized as a pair of constraints x >= y and y >= x. If x currently must remain wild due to an unsafe operation but we want to promote the design goal by making y checked anyway, then 3C needs to insert some language construct into the data flow path that allows the change in checkedness. Currently, we're aware of (and 3C uses) two such constructs: itypes and casts (either C-style casts to wild or _Assume_bounds_casts to checked; see CastPlacementVisitor). In places in the program where 3C knows it can insert a "checkedness change" construct, 3C relaxes the normal bidirectional checkedness constraint in a corresponding way, and this relaxation limits the propagation of wildness through the constraint solution for the whole program. Then, during rewriting, 3C inserts the "checkedness change" constructs where they are needed based on the constraint solution.

That's the general way to think about checkedness changes. But AFAIK, the only place 3C currently performs them is on function parameters and return values. As we know, each function parameter and return value (call it x) is represented by an FVComponentVariable. According to 3C's "liberal itypes" feature, the FVComponentVariable contains two PVConstraints: an "internal" PVConstraint used for accesses to the parameter or return statements in the function body and an "external" PVConstraint used at call sites. I'll denote these PVConstraints x.internal and x.external. In the output program, x is given an itype x.internal : itype(x.external). For this itype to be valid, x.internal must actually be at least as wild as x.external, so 3C generates a checkedness constraint x.internal >= x.external. (Remember "wild > checked" in our lattice. I'm using the term "at least as wild" rather than the possibly more natural "at least as checked" so we don't have to reverse the sides when writing constraints.) As a special case, if x.internal == x.external in the solution, 3C generates a single type rather than an itype. When a value z in the function body flows to or from x, 3C generates the usual bidirectional checkedness constraint, but using x.internal: z == x.internal. However, when a value y at a call site flows to or from x, 3C usually generates a unidirectional checkedness constraint y >= x.external under the assumption that if the solution has y > x.external (i.e., y == wild and x.external == checked), CastPlacementVisitor can insert a cast in the appropriate direction between the types: an _Assume_bounds_cast from y to x.external on a parameter, or a C-style cast from x.external to y on a return value. It looks to me like CastPlacementVisitor is actually more powerful in two ways: it can insert a cast in either direction regardless of whether it is called on a parameter or a return value, and it knows that a cast is not needed if the call site matches either x.internal or x.external. However, constraint generation does not take advantage of this extra power.

The design space

The design of the current "checkedness change" functionality may have been motivated at one point by considerations about the order in which it makes sense to port a program. But from the point of view of the general framework I articulated above, several of the design choices now seem arbitrary and potentially limiting:

  1. During constraint generation, why does the inside of the function use only the left side of the itype, while the outside uses only the right side? Instead of calling the PVConstraints in the FVComponentVariable "internal" and "external", we could just call them "left" and "right" with x.left >= x.right. Then, when data flows between an FVComponentVariable x and another value y, we could generate checkedness constraints that reflect the fact that y can use either side of the itype without a cast. To a first approximation, these constraints would look like x.left >= y >= x.right, though there may be cases (e.g., multi-level pointers) where 3C has to generate more complex constraints to accurately model the rules of the compiler. There is some precedent for generating constraints of that form in the handling of typedefs in function parameter and return types, although the reason for generating such constraints in that setting may be different:

    unifyIfTypedef(ParamTy, *AstContext, PVExternal, Wild_to_Safe);
    unifyIfTypedef(ParamTy, *AstContext, PVInternal, Safe_to_Wild);

  2. Currently, 3C allows cast insertion (and performs the corresponding constraint relaxation) only on the outside of the function (i.e., at call sites) and not the inside (parameter accesses and return statements inside the function), and only in the direction that allows the function declaration to be less wild than the call site and not the reverse. In principle, 3C could also allow cast insertion in the reverse direction; for example, 3C could allow an argument to be made checked even if the parameter remains wild by automatically inserting a cast. Or 3C could also perform cast insertion on the inside of a function; for example, a local variable initialized from a parameter could be made checked even if the parameter remains wild by inserting a cast. We may decide that none of the other options are sensible for porting, but I think it's helpful to understand the whole design space for cast insertion.

This issue tracks any changes we decide to make along the lines of (1) or (2). (1) could in principle be applied either inside or outside the function, or both, as explored in more detail in the next section.

Some specific proposals

Mike made a previous proposal that, if I understand it correctly under an appropriate substitution of terminology, amounts to applying (1) on the outside while retaining the current cast insertion functionality. Specifically: for an outside value y for which cast insertion is blocked, we generate constraints x.left >= y >= x.right (instead of the current y == x.right), while if cast insertion is possible, we generate only y >= x.right (as now) and insert a cast if the solution has x.right <= x.left < y. For an inside value z, we generate z == x.left, as now. Compared to the current design, this would reduce the negative effect of a call site at which cast insertion is blocked.

Another design, which I'll call "symmetric liberal itypes", is to apply (1) on both the inside and the outside and never perform cast insertion. Since there is no cast insertion, an outside value y will always have the constraints x.left >= y >= x.right, so if y is wild, it will always force x.left to be wild. Under the current design where an inside value z gets the constraint z == x.left, this would force z to be wild, which we consider unacceptable. But when we apply (1) on the inside, the constraint is x.left >= z >= x.right, so z is no longer forced to be wild.

Effects and benefits of symmetric liberal itypes

Compared to the current design, symmetric liberal itypes would have 3C generate an itype when a function has either an unsafe call site or an unsafe implementation. Those are very different use cases for itypes, but I believe our current consensus is that both are legitimate:

  • The original motivation for Checked C to support itypes was to make the checked system headers backward compatible with unsafe call sites.
  • 3C's existing liberal itypes feature uses itypes for unsafe implementations. We successfully convinced the Checked C team to support this scenario on the grounds that the system library is implemented in plain C, and in theory, Checked C should allow that plain-C implementation to be compiled along with the checked headers. Normally, users would never do that, but for inline functions in the system library, it might actually happen depending on the Checked C compiler's declaration merging behavior.

I think some of us argued in the past that we like that under the current liberal itypes design, the declaration of a function tells us whether the implementation is safe, and it seems inappropriate to change the declaration based on call sites. However, that has to be weighed against the significant benefits of not having to insert casts at many call sites, which add clutter that the user will want to remove later. (3C currently has no support for removing casts it previously inserted that are no longer needed; that could be filed as another issue if it doesn't become moot first.) If we want, we may be able to find another way to give the user information about whether a function implementation is safe now that that can't be determined unambiguously from the declaration, or we may conclude that the existing root-cause analysis is good enough.

Aside from the most obvious benefit of reducing churn and clutter during porting, I can think of a few other benefits of symmetric liberal itypes:

In view of these benefits, I propose that we try to use symmetric liberal itypes in most cases. We may find there are special cases in which it isn't feasible (at least without additional work), and we may decide to make other changes in those cases, possibly including changes to cast insertion like those contemplated in (2) above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant