Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a mechanism to specify "open world" files #738

Open
john-h-kastner opened this issue Oct 22, 2021 · 1 comment
Open

Add a mechanism to specify "open world" files #738

john-h-kastner opened this issue Oct 22, 2021 · 1 comment

Comments

@john-h-kastner
Copy link
Collaborator

Allow the user of 3C to specify files and directories as "open world" instead of default "closed world". For closed world files, 3C would continue using its current assumptions. It has a complete program with all dependencies and clients available. Open world files drop this assumption. Dependencies might not be available (manifesting as undefined functions), and there might be more clients than are in the available source code (inferred types must accommodate unchecked callers without casting).

Much of the unique behavior enabled for open world files would be the same as can already be controlled with the -itypes-for-extern and infer-types-for-undefs flags. It might be possible to redefine them in terms of applying open or closed world behavior to the entire project in order to avoid duplicating similar logic. Alternatively, they could be deprecated in favor of whatever new mechanism is added to specify open world files.

Closed world

We assume 3C has access to all callers of all functions, and all uses of all structures, global variables, and typedefs. This matches 3C's current assumptions, so behavior should not change in this mode.

  • A function can be rewritten to a checked type if it is internally checked, regardless of its callers, because we have access to all callers and can automatically insert casts. Functions with an unsafe definition are rewritten using itypes. Undefined functions are treated as unchecked because we can assume that no definitions exists. Functions must be defined (for successful linking), so something is clearly wrong, and the functions should be given unchecked types instead of fully checked or itypes. This will manifest as a warning in the root cause analysis if the function is called.
  • Structure fields and global variables can be checked if all uses of the field are safe. Any unsafe use forces a pointer to use an unchecked type (or an itype if we implement that).
  • Typedefs follow similar rules to structures and globals. They can be checked if all uses are checked. Checked C does not support any notion of itypes on typedefs, but 3C does have limited support for isolating unsafe uses.

Open World

We cannot assume that 3C can see all function caller, function definitions, etc., so the analysis must be adjusted to permit arbitrary types in the missing code. This will act like some combination of the current -itypes-for-extern and -infer-types-for-undefs flags.

  • A function may never be rewritten to a fully checked type because there may be unsafe calls that we are not aware of and on which we cannot insert casts. An internally safe function must be rewritten using an itype instead (as is the case for internally unsafe functions). This matches the behavior with -itypes-for-extern. Undefined functions are handled as they are with the current -infer-types-for-undefs flag. Since the function definition is not visible, we must conservatively treat the definition as unchecked. Undefined functions will then be internally unsafe, and will be rewritten to use itypes. An open question here is how 3C should go about forcing a function parameter to be an itype. Currently, -itypes-for-extern, does this by moving checked types into itypes only during rewriting, but this is known to cause invalid rewriting in some cases. Instead, the internal constraint variable for the parameter might be constrained to WILD. This avoids potential Checked C types errors, but limits conversion of local variables inside the function. Similar questions exist for structure fields, global variables, and typedefs.
  • Structure fields and global variables cannot be fully checked for the same reason. The current version of 3C cannot infer itypes here, so if they were constrained to WILD, they would solve to fully unchecked types. Improvements to 3C should add support for itypes on structure fields and global variables with unsafe uses. They would then rewrite to itypes, matching the current behavior of -itypes-for-extern.
  • Typedefs again follow similar rules, but with adjustments due to the lack of itypes. All typedefs must be unchecked to accommodate any potential unchecked declaration using the typedef. Function parameter declarations using the unchecked typedef use the current itype workaround where the typedef is used as the unchecked portion of a typedef with the typedef expanded in the checked portion. Again, this matches the current behavior of -itypes-for-extern. The workaround seems unsatisfying. If every typedef is unchecked, then every local variable using the typedef would have to remain unchecked as well. Other ideas have been discussed for duplicating typedefs into checked and unchecked versions.

Example Use: converting a library header

In the libjpeg tutorial, the jpeglib.h header file was copied into a local include directory. 3C was then re-run with -infer-types-for-undefs to enable solving for and inserting checked types into the local copy of the header file even though the functions in the header were not defined.

After the changes proposed here, the flag passed to 3C would then be -open-world=./include to specify that files in ./include use the open world assumptions. All other files use the (default) closed world assumptions. When 3C is re-run, the open world assumption allows the undefined functions in jpeglib.h to solve to itypes as before. The changes are:

  1. Any undefined functions outside of ./include are still unchecked.
  2. Structure fields and global variables rewrite to itypes instead of fully checked types. This should allow the converted version of the header to more easily be used to compile the unconverted libjpeg source code. Previously these itypes needed to be added manually for this to be possible.
  3. Typedefs are completely unchecked. This is a problem and will substantially limit conversion in to_ppm.c due to local variables using the typedef remaining unchecked.

Example Use: Converting a single file in a project

The current approach is to enable -itypes-for-extern, convert with 3C, and then keep only the converted header files and the single source file you want to convert. Instead, the whole project could be specified as open world with -open-world=.. This would make all functions solve to itypes, all typedefs be unchecked, and all structure fields and global variables use itypes instead of checked types. The changes from current behavior are:

  1. Any undefined functions will also solve to itypes. Since -itypes-for-extern is only expected to be enabled in phase two of porting, it might be reasonable to assume that there are no undefined functions outside of any previously specified open world files, since these could be warnings or errors as discussed earlier.
@mattmccutchen-cci
Copy link
Member

Good write-up. I have only a few things to add at this time:

A function can be rewritten to a checked type if it is internally checked, regardless of its callers, because we have access to all callers and can automatically insert casts.

Not always: insertion of a cast at a call site might be blocked by a macro (or, in theory, unwritable code, although that would be a very unusual project setup). I believe 3C handles this correctly now.

Soundness: As I think we discussed briefly at Friday's meeting, in an open world, any guarantee of spatial memory safety depends on an assumption that external code interacts with shared declarations in a way consistent with the declarations' Checked C annotations (as well as an assumption that the external code is internally spatially safe, which there's nothing we can do about). This applies to all kinds of program elements: not just undefined functions, but also structs and global variables that could be generated in project code and passed to external code or vice versa.

The common case is that the project uses a plain-C external library and needs Checked C declarations for the library that are consistent with its documented behavior, as currently described in the 3c -help for -infer-types-for-undefs. There can still be data flow in both directions (function parameters and returns, callbacks, etc.), but the point of view is that the library has already defined the API in some form and the project needs to describe that API in Checked C. However, theoretically, the reverse scenario could also happen: the project could be a library implemented in Checked C that supports plain-C clients (which would require some way of providing plain-C headers, such as erasable syntax). Then the project needs to ensure that the Checked C declarations used to check the safety of its implementation are consistent with the project's own API documentation that it provides to plain-C clients of the library. (I suppose this might be true "by construction" if the API documentation shows the declarations with Checked C annotations as a way of communicating assumptions to the reader, even if the reader is not using Checked C.)

In all of these cases, to give the user a practical way to ensure the project is spatially safe, we need a workflow to ensure that all shared declarations get reviewed. This generalizes what is currently described in #698 for undefined functions. It might be appropriate to broaden #698 to cover review of all kinds of shared declarations and keep this issue for discussion of the remaining topics.

Porting phase 2: If we run 3C on the whole project (as is our current practice), we don't actually have unknown callers and definitions; we just want to generate output that is compatible with all callers and definitions as they currently appear in the input, so that the user can copy part of the output into their main branch without introducing errors in the rest of the code. In principle, if the user knows in advance which parts of the code they are going to update, a more direct way to achieve the same end goal would be to mark the rest of the code as unwritable, assuming 3C has an option to do that (I will file an issue for that soon). The use of -itypes-for-extern in this scenario is really just a trick (or a hack, depending on your point of view) to do a single 3C run and then let the user look at the output and decide which part they want to copy (based on how much work it will be to fix the errors in that part), rather than having to do a separate 3C run for each file with all the other files marked unwritable (which would take quadratic time) before making the decision.

Assuming the user wants to copy file A (and maybe some header files?), conceivably there might be edge cases in which running 3C with all other files marked unwritable gives slightly better output for A than just running 3C with -itypes-for-extern. If that happens, it might be useful to for the user run 3C once with -itypes-for-extern to get a rough assessment of which file to work on next and then run 3C again with the other files marked unwritable to get the real output to copy.

Since running 3C on the whole project as part of phase 2 is a fundamentally different use scenario than a truly open world, we can expect that it would have slightly different needs that would ideally be accommodated by a separate option, although we can decide whether they are important enough to justify the work of actually maintaining a separate option. A few things I can think of so far:

  1. As you mentioned, undefined functions or global variables in writable code should ideally still generate an error or warning.
  2. If 3C does see that the definition and all callers of a program element X are sufficiently checked, it can go ahead and change the type of X to fully checked. Currently, we leave all of those changes to the very end of porting, when we run 3C with -itypes-for-extern turned off, but it could be nice to let them happen sooner.

On the other hand, if we run 3C on only one file at a time (to reduce the running time?), then that file is truly open-world from the point of view of that 3C run, except that we don't have to worry about soundness of declarations shared only with the rest of the project because we ultimately will be checking the entire project against those declarations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants