Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snprintf bounds-safe interface is unhelpful for variable-length null-terminated buffers #450

Closed
mattmccutchen-cci opened this issue May 26, 2021 · 11 comments

Comments

@mattmccutchen-cci
Copy link
Contributor

The bounds-safe interface of snprintf (from #309) has a conditional:

// Since snprintf automatically adds the null terminator
// and counts that number in n, s only needs count(n-1) per the
// definition of _Nt types. Additional declaration for arrays
// available in checkedc_extensions.h
_Unchecked
int snprintf(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n == 0 ? 0 : n-1),
size_t n,
const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);

The compiler seems to evaluate the count expression just fine when n is a constant, which perhaps is the most common use scenario:

char local_buf _Nt_checked[200];
snprintf(local_buf, sizeof local_buf, "Hello world");  // OK

But if the count expression is given by a variable (which did occur in one of our ports), the compiler isn't smart enough to verify the bounds and raises a warning:

void snprintf_test(_Nt_array_ptr<char> buf : count(len), size_t len) {
  snprintf(buf, len + 1, "Hello world");
}
snprintf_test.c:13:12: warning: cannot prove argument meets declared bounds for 1st parameter [-Wcheck-bounds-decls-unchecked-scope]
  snprintf(buf, len + 1, "Hello world");
           ^~~
snprintf_test.c:13:12: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + len + 1 == 0 ? 0 : len + 1 - 1)'
snprintf_test.c:13:12: note: (expanded) inferred bounds are 'bounds(buf, buf + len)'
  snprintf(buf, len + 1, "Hello world");
           ^~~

Since the diagnostic is only a warning, the user could choose to manually review the safety of the call and then ignore the warning. But I assume we're trying to work toward a future in which false positive diagnostics are rare enough that they can be made errors.

One workaround is to use snprintf_array_ptr and sacrifice one element of the buffer:

  snprintf_array_ptr(buf, len, "Hello world");  // OK

Currently snprintf_array_ptr is undefined (#449), but that could be fixed. Losing one element probably isn't a big deal in practice, but it still feels a bit ugly.

I wondered if removing the conditional might solve the problem:

_Unchecked
int snprintf1(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n-1),
             size_t n,
             const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);

Then if n = 0, we get count(-1), which is a little weird but might not cause any problems. Unfortunately, the compiler doesn't even seem to be able to reason that len + 1 - 1 = len:

snprintf_test.c:16:13: warning: cannot prove argument meets declared bounds for 1st parameter [-Wcheck-bounds-decls-unchecked-scope]
  snprintf1(buf, len + 1, "Hello world");
            ^~~
snprintf_test.c:16:13: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + len + 1 - 1)'
snprintf_test.c:16:13: note: (expanded) inferred bounds are 'bounds(buf, buf + len)'
  snprintf1(buf, len + 1, "Hello world");
            ^~~

I thought the compiler was supposed to be able to handle addition and subtraction of constants, but I don't know the precise limitations. In any case, I imagine it would be easier to get the compiler to handle the constants than to get it to handle the conditional. One potential workaround is for the user to change their function interface to take the size including the null terminator:

void snprintf_test2(_Nt_array_ptr<char> buf : count(len-1), size_t len) {
  snprintf1(buf, len, "Hello world");  // OK
}

though this feels a little awkward in Checked C, and it will generate an analogous warning as before if snprintf_test2 is ever called from another function that takes a bound that doesn't include the null terminator and adds 1 when calling snprintf_test2.

There are plenty of other standard C functions that write a null-terminated string to a user-supplied buffer with a specified size that includes the null terminator (inet_ntop is one example I noticed while reviewing #448), so it would be valuable to have a general solution for them. For the most important functions (such as snprintf), it might be reasonable to define custom wrapper functions that take the length not including the null terminator. I tentatively proposed the name sbprintf for that analogue of snprintf; the b stands for "bound", since we judged scprintf (c for "count") to be too hard to read. (Of course, this would require having a place to put the function definitions: see the end of #449.) However, it probably won't be feasible to provide such wrappers for all standard C functions that take size parameters that include the null terminator.

@sulekhark
Copy link
Contributor

Thank you for the report.

  1. We are looking into validating the bounds in scenarios shown below (where there is a conditional) in the coming 3 to 4 months.
snprintf_test.c:13:12: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + len + 1 == 0 ? 0 : len + 1 - 1)'
snprintf_test.c:13:12: note: (expanded) inferred bounds are 'bounds(buf, buf + len)'
  1. At present, the compiler is not reasoning that len + 1 - 1 = len. We hope to fix this soon.

  2. I am working on Linking problems with out-of-line definitions for checkedc_extensions.h #449 for the defining snprintf_array_ptr.

  3. Yes, as you have pointed out, there are many functions that write a null-terminated string to a user-supplied buffer. As we see specific cases, we can think of solutions for the more common usage patterns. For inet_ntop, the (interface) return type can be _Nt_array_ptr<const char> with the default bounds of count(0) as it is in the port of musl to Checked C. The case of snprintf is more tricky - we need to think about it.

@mattmccutchen-cci
Copy link
Contributor Author

Thanks Sulekha for the update (on this and my other recent issues)!

For inet_ntop, the (interface) return type can be _Nt_array_ptr<const char> with the default bounds of count(0) as it is in the port of musl to Checked C.

I was talking about the char * parameter (named s in the musl port), not the return value. The size of the buffer (including the null terminator) is passed as a separate parameter, named l in the musl port. So this really is the exact same scenario as snprintf. The musl port currently declares s : itype(restrict _Nt_array_ptr<char>) count(l), i.e., the size is l excluding the null terminator. Thus, a call like the following:

void test() {
  char ip4_bin[4];
  char buf _Nt_checked[50];
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
}

fails to compile:

ntop_bound_test.c:11:34: error: argument does not meet declared bounds for 3rd parameter
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
                                 ^~~
ntop_bound_test.c:11:34: note: destination bounds are wider than the source bounds
ntop_bound_test.c:11:34: note: destination upper bound is above source upper bound
ntop_bound_test.c:11:34: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + (socklen_t)sizeof buf)'
ntop_bound_test.c:11:34: note: (expanded) inferred bounds are 'bounds(buf, buf + 49)'
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
                                 ^~~

To work around this, the caller would have to subtract 1 from sizeof buf.

I appreciate the work you're doing on better solutions. But what would you recommend that we do right now, e.g., for functions we're adding in #448? Copy and paste the complicated bound expression from snprintf to at least maintain consistency and document that the size includes the null terminator? (Would it be worth factoring out the bound expression into a macro in a shared Checked C header file? I don't know how introducing another header file might affect the tests.)

@sulekhark
Copy link
Contributor

Here's one proposal for something that we could do right now in order to proceed with porting. I will illustrate this approach for the snprintf function. The proposal is to define macros as follows:

_Unchecked
int snprintf(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n == 0 ? 0 : n-1),
                   size_t n,
                   const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);

#define chkc1_snprintf(s, n, format, ...) snprintf(s, n, format, __VA_ARGS__)
#define chkc2_snprintf(s, n, format, ...) _Unchecked { snprintf((char *)s, n+1, format, __VA_ARGS__); }
#define chkc3_snprintf(s, n, format) snprintf(s, n, format)
#define chkc4_snprintf(s, n, format) _Unchecked { snprintf((char *)s, n+1, format); }

NOTE: The macros chkc3_snprintf and chkc4_snprintf are required because the previous two variadic macro definitions are not accepting zero variadic arguments.

Here is a test case that shows the two common ways in which the function snprintf could be called. Note that in each of the calling scenarios, the "convenient/natural" way that one would like to call snprintf is different.

#include <stdio.h>

void snprintf_test1() {
  char buf _Nt_checked[50];
  chkc1_snprintf(buf, 50, "Hello world - 1 - %d \n", 10);
  chkc3_snprintf(buf, 50, "Hello world - 1");
}

void snprintf_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_snprintf(buf, len, "Hello world - 2 - %d \n", 10);
  chkc4_snprintf(buf, len, "Hello world - 2");
}

The macros serve two purposes:

  1. hide the inconvenience of either doing a +1 or a -1 to reconcile the bounds of the first argument of type _Nt_array_ptr with its size, which is the second argument.
  2. enclose the calls to snprintf in an unchecked block temporarily until the compiler is able to reason about conditionals in bounds expressions.

Comments / Improvements / Alternatives are welcome.

@mattmccutchen-cci
Copy link
Contributor Author

mattmccutchen-cci commented May 28, 2021

Hi Sulekha,

I really appreciate your help in finding an interim solution so we can proceed with porting right away.

I propose we start with a simpler example function with no variable arguments, since variable arguments both cause issues for the macros and require all calls to be unchecked; we'll see that my final proposal below works for variable-argument functions. Here's my example code analogous to yours. I've provided an implementation for the function so we can do simple runtime tests.

#pragma CHECKED_SCOPE on

#include <stddef.h>

void sncopy(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n == 0 ? 0 : n-1),
            size_t n,
            const char * restrict src : itype(restrict _Nt_array_ptr<const char>)) {
  if (n == 0)
    return;
  size_t i = 0;
  _Nt_array_ptr<const char> src2 : count(i) =
    _Dynamic_bounds_cast<_Nt_array_ptr<const char>>(src, count(i));
  while (i < n - 1 && src2[i] != '\0') {
    s[i] = src2[i];
    i++;
  }
  while (i < n - 1) {
    s[i] = '\0';
    i++;
  }
  s[n - 1] = '\0';
}

#define chkc1_sncopy(s, n, src) sncopy(s, n, src)
#define chkc2_sncopy(s, n, src) _Unchecked { sncopy((char *)s, n+1, src); }

void sncopy_test1(void) {
  char buf _Nt_checked[50];
  chkc1_sncopy(buf, 50, "Hello world - 1");
}

void sncopy_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_sncopy(buf, len, "Hello world - 2");
}

This works, although I'd like chkc1_sncopy and chkc2_sncopy to have better names that remind the user of what they do. chkc1_sncopy is equivalent to sncopy, so we could just use sncopy directly. For chkc2_sncopy, we could use my earlier proposal of sbcopy (b for "bound").

However, I have two more fundamental concerns about the approach:

(1) The _Unchecked in chkc2_sncopy bypasses checking of the entire call expression. It doesn't check that the original bounds of s are correct; it also bypasses checking of argument expressions to sncopy that have nothing to do with the buffer, including their subexpressions. For example, each of the following functions causes a segmentation fault that is not caught by Checked C:

void sncopy_test2_bad1(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_sncopy(buf, len, (char*)1);
}

void sncopy_test2_bad2(void) {
  char bad_buf _Checked[20];
  chkc2_sncopy(bad_buf, 100000000, "Hello world - 3");
}

Fortunately, this is easy to fix: instead of a macro, we can use a wrapper function that performs only the single needed _Assume_bounds_cast and leaves the rest of the checking in place:

void sbcopy(restrict _Nt_array_ptr<char> s : count(len),
            size_t len,
            restrict _Nt_array_ptr<const char> src) {
  size_t len2 = len + 1 == 0 ? 0 : len + 1 - 1;
  _Nt_array_ptr<char> s2 : count(len2) = 0;
  _Unchecked {
    s2 = _Assume_bounds_cast<_Nt_array_ptr<char>>(s, count(len2));
  }
  // The compiler issues a warning on the third argument that I think is wrong
  // and irrelevant to the present discussion.
  sncopy(s2, len + 1, src);
}

void sbcopy_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  sbcopy(buf, len, "Hello world - 2");
}

(2) The approach still uses a separate wrapper macro or function for every system function that writes a null-terminated string to a buffer. I still think maintaining all of these wrappers is going to be an unreasonable amount of work. This can be solved by defining a helper function and macro that do just the bounds cast and can be used in combination with any system function:

_Nt_array_ptr<char> buf_plus_nt_bounds_cast(_Nt_array_ptr<char> buf : count(len), size_t len)
  : count(len + 1 == 0 ? 0 : len + 1 - 1) _Unchecked {
  return _Assume_bounds_cast<_Nt_array_ptr<char>>(buf, count(len + 1 == 0 ? 0 : len + 1 - 1));
}

#define BUF_PLUS_NT_ARGS(_buf, _len) buf_plus_nt_bounds_cast(_buf, _len), (_len) + 1

void sncopy_test2_better(_Nt_array_ptr<char> buf : count(len), size_t len) {
  sncopy(BUF_PLUS_NT_ARGS(buf, len), "Hello world - 2");
}

(BUF_PLUS_NT_ARGS assumes that the buffer and size are successive arguments to the system function. This is true for all the system functions I've seen so far, but if the user needs to use a function that isn't defined this way, they can use buf_plus_nt_bounds_cast directly.)

Does the BUF_PLUS_NT_ARGS approach look good to you? (I'm happy to use different names if you prefer.) If so, I'll go ahead and use it in #448. I just need to know where I can put the common helper definitions; I assume I should start a new header file for such things. What filename would you suggest, and what changes (if any) should I make to the tests to accommodate the new header file? (Ultimately, we'll also need a solution to #449, but I can go ahead and start testing things without it.)

As I previously mentioned at the end of #450 (comment), this would also be a good opportunity to factor out the count(n == 0 ? 0 : n-1) code pattern into a macro to avoid having to duplicate it in many checked declarations (and the associated churn if we need to change it in the future). Does the following look good?

#define nt_count_for_size(_size) count(_size == 0 ? 0 : _size - 1)

void sncopy(char * restrict s : itype(restrict _Nt_array_ptr<char>) nt_count_for_size(n),
            size_t n,
            const char * restrict src : itype(restrict _Nt_array_ptr<const char>));

@sulekhark
Copy link
Contributor

We discussed this issue internally. We think that there is something deeper going on here that we need to sort out. We will need some time to think about this.

@sulekhark
Copy link
Contributor

  1. We have modified the declaration of snprintf to simplify the bounds on the first parameter s, and impose the precondition n > 0 on the second parameter, n.
  2. The changes are in: PR Fixes for errors in the "array_ptr" interfaces to some libc functions declared in checkedc_extensions.h, and removal of the conditional in the snprintf declaration in stdio_checked.h #456 on the checkedc repository and PR #1086 on the checkedc-clang repository (this PR fixes a bug related to the where clause usage in the snprintf declaration). I am running the final ADO tests on these PRs.
  3. Under the new declaration of the snprintf function, the compiler still warns that it cannot prove that the first argument meets declared bounds for 1st parameter in calls such as snprintf(p, len + 1, format, arg) (i.e., it is unable to prove that p + len == p + len + 1 - 1). We plan to fix this soon. So, I suggest that we don't define a macro/function to workaround this issue.
  4. If you would like to propose a macro or any other mechanism to hide the "len + 1" that we need to pass to functions like snprintf, your suggestions are welcome.
  5. At present the compiler does not yet validate that arguments to function calls satisfy the preconditions placed on the corresponding parameters via where clauses in function declarations. But we would like to proceed forward with specifying preconditions on function parameters where necessary - we will take up validating such preconditions at call sites as soon as possible.

@mattmccutchen-cci
Copy link
Contributor Author

Thanks Sulekha. I'm studying your changes now, and I'd appreciate if you'd give me a few hours to raise any concerns before you merge the PR.

  1. Under the new declaration of the snprintf function, the compiler still warns that it cannot prove that the first argument meets declared bounds for 1st parameter in calls such as snprintf(p, len + 1, format, arg) (i.e., it is unable to prove that p + len == p + len + 1 - 1). We plan to fix this soon. So, I suggest that we don't define a macro/function to workaround this issue.

  2. If you would like to propose a macro or any other mechanism to hide the "len + 1" that we need to pass to functions like snprintf, your suggestions are welcome.

My proposal would still be BUF_PLUS_NT_ARGS as I described above. You say "We plan to fix this soon": how soon, realistically? I don't want #448 to be blocked indefinitely. So I'd propose to go ahead and add the BUF_PLUS_NT_ARGS mechanism to #448 for now, and if you do get the compiler fix done before #448 is merged, I'll remove BUF_PLUS_NT_ARGS from #448 again.

  1. At present the compiler does not yet validate that arguments to function calls satisfy the preconditions placed on the corresponding parameters via where clauses in function declarations. But we would like to proceed forward with specifying preconditions on function parameters where necessary - we will take up validating such preconditions at call sites as soon as possible.

Right, I see you have microsoft/checkedc-clang#218 open for this. But if we use where clauses in the checked declarations for existing C functions, what is your plan to avoid breaking compatibility with existing C code with implicit checked header inclusion? It would be silly for C code to pass a literal 0 as the size, but it might well pass a variable that isn't statically known to be positive (because of course, plain C code won't have a where clause), something like this:

int foo(char *buf, int size) {
  return snprintf(buf, size, /* some arguments */);
}

Is your plan to issue a warning (or no diagnostic at all) instead of an error if this occurs in an unchecked scope?

@sulekhark
Copy link
Contributor

My proposal would still be BUF_PLUS_NT_ARGS as I described above. You say "We plan to fix this soon": how soon, realistically? I don't want #448 to be blocked indefinitely. So I'd propose to go ahead and add the BUF_PLUS_NT_ARGS mechanism to #448 for now, and if you do get the compiler fix done before #448 is merged, I'll remove BUF_PLUS_NT_ARGS from #448 again.

We plan to take up this fix in a day or two and we should be done with the fix a week after we start. I really do not prefer to have the buf_plus_nt_bounds_cast wrapper. The macro could be:

#define NT_ARGS(_len) (_len) + 1

With the fix in place and the macro defined as above, we will also no longer have to define different macros if len does not follow buf in the argument list.

Right, I see you have microsoft/checkedc-clang#218 open for this. But if we use where clauses in the checked declarations for existing C functions, what is your plan to avoid breaking compatibility with existing C code with implicit checked header inclusion? It would be silly for C code to pass a literal 0 as the size, but it might well pass a variable that isn't statically known to be positive (because of course, plain C code won't have a where clause), something like this:

int foo(char *buf, int size) {
  return snprintf(buf, size, /* some arguments */);
}

Is your plan to issue a warning (or no diagnostic at all) instead of an error if this occurs in an unchecked scope?

Yes, the plan is to issue a warning instead of an error if this occurs in an unchecked scope.

@mattmccutchen-cci
Copy link
Contributor Author

mattmccutchen-cci commented Jun 7, 2021

Sounds like a good plan. I have no further concerns about #456. In #448, I'll plan to copy the new code pattern from snprintf to inet_ntop and so forth. The BUF_PLUS_NT_ARGS thing is independent and can be added later if appropriate. If the compiler fix for p + len + 1 - 1 ends up being delayed and you really don't want to offer the macro as an interim solution for users (because you don't want to have to keep it for compatibility once the compiler is fixed, or whatever), I guess CCI will put it in our own fork for use by anyone who wants a solution and can accept the compatibility break when it is ultimately removed.

Is there an existing issue that covers the p + len + 1 - 1 simplification? (Maybe the third bullet of microsoft/checkedc-clang#208, but that issue is very broad and it might be good to have a more specific one that can actually be tracked and declared fixed.) Should I submit one myself?

@sulekhark
Copy link
Contributor

Ok, thanks! We will file an issue and start work on the simplification of p + len + 1 - 1.

sulekhark added a commit that referenced this issue Jun 11, 2021
… declared in checkedc_extensions.h, and removal of the conditional in the snprintf declaration in stdio_checked.h (#456)

* Modified some of the Checked-C-specific declarations of libc functions
to address the issues raised in #499 and #450

* Missed adding the modified header files.

* Added the _Where clause to more closely resemble the snprintf declaration.

* Improvements to the test case.

* Incorporated review comments.
@mattmccutchen-cci
Copy link
Contributor Author

I see that microsoft/checkedc-clang#1088 has been filed for p + len + 1 - 1. I think this issue can now be closed because the remaining work is covered by other issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants