Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Implement labelled type filtering for overflow/truncation sanitizers w/ SSCLs #107332

Merged
merged 13 commits into from
Nov 4, 2024

Conversation

JustinStitt
Copy link
Contributor

@JustinStitt JustinStitt commented Sep 4, 2024

Related RFC

Summary

Implement type-based filtering via Sanitizer Special Case Lists for the arithmetic overflow and truncation sanitizers.

Currently, using the type: prefix with these sanitizers does nothing. I've hooked up the SSCL parsing with Clang codegen so that we don't emit the overflow/truncation checks if the arithmetic contains an ignored type.

Usefulness

You can craft ignorelists that ignore specific types that are expected to overflow or wrap-around. For example, to ignore my_type from unsigned-integer-overflow instrumentation:

$ cat ignorelist.txt
[unsigned-integer-overflow]
type:my_type=no_sanitize

$ cat foo.c
typedef unsigned long my_type;
void foo() {
  my_type a = ULONG_MAX;
  ++a;
}

$ clang foo.c -fsanitize=unsigned-integer-overflow -fsanitize-ignorelist=ignorelist.txt ; ./a.out
// --> no sanitizer error

If a type is functionally intended to overflow, like refcount_t and its associated APIs in the Linux kernel, then this type filtering would prove useful for reducing sanitizer noise. Currently, the Linux kernel dealt with this by littering __attribute__((no_sanitize("signed-integer-overflow"))) annotations on all the refcount_t APIs. I think this serves as an example of how a codebase could be made cleaner. We could make custom types that are filtered out in an ignorelist, allowing for types to be more expressive -- without the need for annotations. This accomplishes a similar goal to #86618.

Yet another use case for this type filtering is whitelisting. We could ignore all types, save a few.

$ cat ignorelist.txt
[implicit-signed-integer-truncation]
type:*=no_sanitize # ignore literally all types
type:short=sanitize # except `short`

$ cat bar.c
// compile with -fsanitize=implicit-signed-integer-truncation
void bar(int toobig) {
  char a = toobig;  // not instrumented
  short b = toobig; // instrumented
}

Other ways to accomplish the goal of sanitizer allowlisting/whitelisting

  • ignore list SSCL type support (this PR that you're reading)
  • my sanitize-allowlist branch - this just implements a sibling flag -fsanitize-allowlist=, removing some of the double negative logic present with skip/ignore when trying to whitelist something.
  • Glob Negation - Implement a negation operator to the GlobPattern class so the ignorelist query can use them to simulate allowlisting

Please let me know which of the three options we like best. They are not necessarily mutually exclusive.

Here's another related PR which implements a wraps attribute. This can accomplish a similar goal to this PR but requires in-source changes to codebases and also covers a wider variety of integer definedness problems.

CCs

@kees @vitalybuka @bwendling

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Sep 4, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 4, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Justin Stitt (JustinStitt)

Changes

Related RFC

Summary

Implement type-based filtering via Sanitizer Special Case Lists for the arithmetic overflow and truncation sanitizers.

Currently, using the type: prefix with these sanitizers does nothing. I've hooked up the SSCL parsing with Clang codegen so that we don't emit the overflow/truncation checks if the arithmetic contains an ignored type.

Usefulness

You can craft ignorelists that ignore specific types that are expected to overflow or wrap-around. For example, to ignore my_type from unsigned-integer-overflow instrumentation:

$ cat ignorelist.txt
[unsigned-integer-overflow]
type:my_type

$ cat foo.c
typedef int my_type;
void foo() {
  my_type a = 2147483647;
  ++a;
}

$ clang foo.c -fsanitize=signed-integer-overflow -fsanitize-ignorelist=ignorelist.txt ; ./a.out
// --> no sanitizer error

If a type is functionally intended to overflow, like refcount_t and its associated APIs in the Linux kernel, then this type filtering would prove useful for reducing sanitizer noise. Currently, the Linux kernel dealt with this by littering __attribute__((no_sanitize("signed-integer-overflow"))) annotations on all the refcount_t APIs. I think this serves as an example of how a codebase could be made cleaner. We could make custom types that are filtered out in an ignorelist, allowing for types to be more expressive -- without the need for annotations. This accomplishes a similar goal to #86618.

Yet another use case for this type filtering is whitelisting. We could ignore all types, save a few.

$ cat ignorelist.txt
[implicit-signed-integer-truncation]
type:* # ignore literally all types
type:short=skip # except `short`

$ cat bar.c
// compile with -fsanitize=implicit-signed-integer-truncation
void bar(int toobig) {
  char a = toobig;  // not instrumented
  short b = toobig; // instrumented
}

Other ways to accomplish the goal of sanitizer allowlisting/whitelisting

  • ignore list SSCL type support (this PR that you're reading)
  • my sanitize-allowlist branch - this just implements a sibling flag -fsanitize-allowlist=, removing some of the double negative logic present with skip/ignore when trying to whitelist something.
  • Glob Negation - Implement a negation operator to the GlobPattern class so the ignorelist query can use them to simulate allowlisting

Please let me know which of the three options we like best. They are not necessarily mutually exclusive.

Here's another related PR which implements a wraps attribute. This can accomplish a similar goal to this PR but requires in-source changes to codebases and also covers a wider variety of integer definedness problems.

CCs

@kees @vitalybuka @bwendling


Full diff: https://github.com/llvm/llvm-project/pull/107332.diff

6 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+8)
  • (modified) clang/docs/SanitizerSpecialCaseList.rst (+60-2)
  • (modified) clang/include/clang/AST/ASTContext.h (+3)
  • (modified) clang/lib/AST/ASTContext.cpp (+31)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+32-4)
  • (modified) clang/test/CodeGen/ubsan-type-ignorelist.cpp (+76-8)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 0eee71d00a2c5f..0f036e90e06c5e 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -540,6 +540,14 @@ Sanitizers
   This new flag should allow those projects to enable integer sanitizers with
   less noise.
 
+- Arithmetic overflow sanitizers ``-fsanitize=signed-integer-overflow`` and
+  ``-fsanitize=unsigned-integer-overflow`` as well as the implicit integer
+  truncation sanitizers ``-fsanitize=implicit-signed-integer-truncation`` and
+  ``-fsanitize=implicit-unsigned-integer-truncation`` now properly support the
+  "type" prefix within `Sanitizer Special Case Lists (SSCL)
+  <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html>`_. See that link
+  for examples.
+
 Python Binding Changes
 ----------------------
 - Fixed an issue that led to crashes when calling ``Type.get_exception_specification_kind``.
diff --git a/clang/docs/SanitizerSpecialCaseList.rst b/clang/docs/SanitizerSpecialCaseList.rst
index c7fb0fa3f8a828..13f71b878d4cc8 100644
--- a/clang/docs/SanitizerSpecialCaseList.rst
+++ b/clang/docs/SanitizerSpecialCaseList.rst
@@ -15,8 +15,9 @@ file at compile-time.
 Goal and usage
 ==============
 
-Users of sanitizer tools, such as :doc:`AddressSanitizer`, :doc:`ThreadSanitizer`
-or :doc:`MemorySanitizer` may want to disable or alter some checks for
+Users of sanitizer tools, such as :doc:`AddressSanitizer`,
+:doc:`ThreadSanitizer`, :doc:`MemorySanitizer` or :doc:
+`UndefinedBehaviorSanitizer` may want to disable or alter some checks for
 certain source-level entities to:
 
 * speedup hot function, which is known to be correct;
@@ -48,6 +49,63 @@ Example
   $ clang -fsanitize=address -fsanitize-ignorelist=ignorelist.txt foo.c ; ./a.out
   # No error report here.
 
+Usage with UndefinedBehaviorSanitizer
+=====================================
+
+The arithmetic overflow sanitizers ``unsigned-integer-overflow`` and
+``signed-integer-overflow`` as well as the implicit integer truncation
+sanitizers ``implicit-signed-integer-truncation`` and
+``implicit-unsigned-integer-truncation`` support the ability to adjust
+instrumentation based on type.
+
+.. code-block:: bash
+
+  $ cat foo.c
+  void foo() {
+    int a = 2147483647; // INT_MAX
+    ++a;                // Normally, an overflow with -fsanitize=signed-integer-overflow
+  }
+  $ cat ignorelist.txt
+  [signed-integer-overflow]
+  type:int
+  $ clang -fsanitize=signed-integer-overflow -fsanitize-ignorelist=ignorelist.txt foo.c ; ./a.out
+  # no signed-integer-overflow error
+
+Supplying ``ignorelist.txt`` with ``-fsanitize-ignorelist=ignorelist.txt``
+disables overflow sanitizer instrumentation for arithmetic operations
+containing values of type ``int``, for example. Custom types may be used.
+
+The following SCL categories are supported: ``=allow``, ``=skip`` and
+``=forbid``. The ``allow`` category is the default for any entry and specifies
+that the query, if matched, will have its sanitizer instrumentation ignored.
+Conversely, both ``skip`` and ``forbid`` cause their queries, if matched, to be
+left out of the ignorelist -- essentially ensuring sanitizer instrumentation
+remains for those types. This is useful for whitelisting specific types.
+
+With this, one may disable instrumentation for all types and specifically allow
+instrumentation for one or many types.
+
+.. code-block:: bash
+
+  $ cat ignorelist.txt
+  [implicit-signed-integer-truncation]
+  type:*=allow
+  type:T=skip
+  $ cat foo.c
+  typedef char T;
+  typedef char U;
+  void foo(int toobig) {
+    T a = toobig;    // instrumented
+    U b = toobig;    // not instrumented
+    char c = toobig; // also not instrumented
+  }
+
+Note that ``skip`` and ``forbid`` operate exactly the same in this context and
+both options exist simply for conformity with the `-fprofile-list
+<https://clang.llvm.org/docs/UsersManual.html#instrumenting-only-selected-files-or-functions>`_
+syntax for adjusting profile instrumentation. You do not need to specify any
+`default:<type>` for ``-fsanitize-ignorelist`` SSCLs, though.
+
 Format
 ======
 
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 89bb5768dbd40d..a9373c6102c010 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -807,6 +807,9 @@ class ASTContext : public RefCountedBase<ASTContext> {
 
   const NoSanitizeList &getNoSanitizeList() const { return *NoSanitizeL; }
 
+  bool isTypeIgnoredBySanitizer(const SanitizerMask &Mask,
+                                const QualType &Ty) const;
+
   const XRayFunctionFilter &getXRayFilter() const {
     return *XRayFilter;
   }
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index c61234aa4d1af1..bcda5fad585d67 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -817,6 +817,37 @@ ASTContext::getCanonicalTemplateTemplateParmDecl(
   return CanonTTP;
 }
 
+/// Check if a type can have its sanitizer instrumentation elided.
+/// Determine this by its presence in a SCL alongside its specified categories.
+/// For example:
+/// ignorelist.txt>
+/// [{unsigned-integer-overflow,signed-integer-overflow}]
+/// type:*
+/// type:size_t=skip
+/// <ignorelist.txt
+/// Supplying the above ignorelist.txt will disable overflow sanitizer
+/// instrumentation for all types except "size_t".
+bool ASTContext::isTypeIgnoredBySanitizer(const SanitizerMask &Mask,
+                                          const QualType &Ty) const {
+  // One may specifically allow a type "type:foo=allow"
+  bool isAllowedBySCL =
+      NoSanitizeL->containsType(Mask, Ty.getAsString(), "allow");
+
+  // There could also be no category present "type:foo", which is the same as
+  // "allow"
+  isAllowedBySCL |= NoSanitizeL->containsType(Mask, Ty.getAsString());
+
+  // Explicitly specifying "skip" is also possible "type:foo=skip"
+  bool isSkippedBySCL =
+      NoSanitizeL->containsType(Mask, Ty.getAsString(), "skip");
+
+  // Or "forbid", as there is currently no distinction between "skip" and
+  // "forbid" for the purposes of the overflow/truncation sanitizer ignorelist.
+  isSkippedBySCL |= NoSanitizeL->containsType(Mask, Ty.getAsString(), "forbid");
+
+  return isAllowedBySCL && !isSkippedBySCL;
+}
+
 TargetCXXABI::Kind ASTContext::getCXXABIKind() const {
   auto Kind = getTargetInfo().getCXXABI().getKind();
   return getLangOpts().CXXABI.value_or(Kind);
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index af11bc20a3b639..21e932a3639f73 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -196,6 +196,18 @@ static bool CanElideOverflowCheck(const ASTContext &Ctx, const BinOpInfo &Op) {
   if (!Op.mayHaveIntegerOverflow())
     return true;
 
+  if (Op.Ty->isSignedIntegerType() &&
+      Ctx.isTypeIgnoredBySanitizer(SanitizerKind::SignedIntegerOverflow,
+                                   Op.Ty)) {
+    return true;
+  }
+
+  if (Op.Ty->isUnsignedIntegerType() &&
+      Ctx.isTypeIgnoredBySanitizer(SanitizerKind::UnsignedIntegerOverflow,
+                                   Op.Ty)) {
+    return true;
+  }
+
   const UnaryOperator *UO = dyn_cast<UnaryOperator>(Op.E);
 
   if (UO && UO->getOpcode() == UO_Minus &&
@@ -1120,6 +1132,10 @@ void ScalarExprEmitter::EmitIntegerTruncationCheck(Value *Src, QualType SrcType,
   if (!CGF.SanOpts.has(Check.second.second))
     return;
 
+  // Does some SSCL ignore this type?
+  if (CGF.getContext().isTypeIgnoredBySanitizer(Check.second.second, DstType))
+    return;
+
   llvm::Constant *StaticArgs[] = {
       CGF.EmitCheckSourceLocation(Loc), CGF.EmitCheckTypeDescriptor(SrcType),
       CGF.EmitCheckTypeDescriptor(DstType),
@@ -1230,6 +1246,15 @@ void ScalarExprEmitter::EmitIntegerSignChangeCheck(Value *Src, QualType SrcType,
     // Because here sign change check is interchangeable with truncation check.
     return;
   }
+  // Does an SSCL have an entry for the DstType under its respective sanitizer
+  // section?
+  if (DstSigned && CGF.getContext().isTypeIgnoredBySanitizer(
+                       SanitizerKind::ImplicitSignedIntegerTruncation, DstType))
+    return;
+  if (!DstSigned &&
+      CGF.getContext().isTypeIgnoredBySanitizer(
+          SanitizerKind::ImplicitUnsignedIntegerTruncation, DstType))
+    return;
   // That's it. We can't rule out any more cases with the data we have.
 
   CodeGenFunction::SanitizerScope SanScope(&CGF);
@@ -2770,10 +2795,11 @@ llvm::Value *ScalarExprEmitter::EmitIncDecConsiderOverflowBehavior(
       return Builder.CreateNSWAdd(InVal, Amount, Name);
     [[fallthrough]];
   case LangOptions::SOB_Trapping:
-    if (!E->canOverflow())
+    BinOpInfo Info = createBinOpInfoFromIncDec(
+        E, InVal, IsInc, E->getFPFeaturesInEffect(CGF.getLangOpts()));
+    if (!E->canOverflow() || CanElideOverflowCheck(CGF.getContext(), Info))
       return Builder.CreateNSWAdd(InVal, Amount, Name);
-    return EmitOverflowCheckedBinOp(createBinOpInfoFromIncDec(
-        E, InVal, IsInc, E->getFPFeaturesInEffect(CGF.getLangOpts())));
+    return EmitOverflowCheckedBinOp(Info);
   }
   llvm_unreachable("Unknown SignedOverflowBehaviorTy");
 }
@@ -2973,7 +2999,9 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,
       value = EmitIncDecConsiderOverflowBehavior(E, value, isInc);
     } else if (E->canOverflow() && type->isUnsignedIntegerType() &&
                CGF.SanOpts.has(SanitizerKind::UnsignedIntegerOverflow) &&
-               !excludeOverflowPattern) {
+               !excludeOverflowPattern &&
+               !CGF.getContext().isTypeIgnoredBySanitizer(
+                   SanitizerKind::UnsignedIntegerOverflow, E->getType())) {
       value = EmitOverflowCheckedBinOp(createBinOpInfoFromIncDec(
           E, value, isInc, E->getFPFeaturesInEffect(CGF.getLangOpts())));
     } else {
diff --git a/clang/test/CodeGen/ubsan-type-ignorelist.cpp b/clang/test/CodeGen/ubsan-type-ignorelist.cpp
index 7bea84f5fabb93..e8b14d5c415222 100644
--- a/clang/test/CodeGen/ubsan-type-ignorelist.cpp
+++ b/clang/test/CodeGen/ubsan-type-ignorelist.cpp
@@ -1,7 +1,28 @@
+// Verify ubsan doesn't emit checks for ignorelisted types
+// RUN: echo "[{unsigned-integer-overflow,signed-integer-overflow}]" > %t-int.ignorelist
+// RUN: echo "type:int" >> %t-int.ignorelist
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow,unsigned-integer-overflow -fsanitize-ignorelist=%t-int.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=INT
+
+// RUN: echo "type:int" > %t-nosection.ignorelist
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow,unsigned-integer-overflow -fsanitize-ignorelist=%t-nosection.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=INT
+
+// RUN: echo "type:int=allow" > %t-allow-same-as-no-category.ignorelist
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow,unsigned-integer-overflow -fsanitize-ignorelist=%t-allow-same-as-no-category.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=INT
+
+// RUN: echo "[{unsigned-integer-overflow,signed-integer-overflow}]" > %t-myty.ignorelist
+// RUN: echo "type:*" >> %t-myty.ignorelist
+// RUN: echo "type:myty=skip" >> %t-myty.ignorelist
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=signed-integer-overflow,unsigned-integer-overflow -fsanitize-ignorelist=%t-myty.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=MYTY
+
+// RUN: echo "[{implicit-signed-integer-truncation,implicit-unsigned-integer-truncation}]" > %t-trunc.ignorelist
+// RUN: echo "type:char" >> %t-trunc.ignorelist
+// RUN: echo "type:unsigned char" >> %t-trunc.ignorelist
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=implicit-signed-integer-truncation,implicit-unsigned-integer-truncation -fsanitize-ignorelist=%t-trunc.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=TRUNC
+
 // Verify ubsan vptr does not check down-casts on ignorelisted types.
 // RUN: echo "type:_ZTI3Foo" > %t-type.ignorelist
-// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=vptr -fsanitize-recover=vptr -emit-llvm %s -o - | FileCheck %s --check-prefix=DEFAULT
-// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=vptr -fsanitize-recover=vptr -fsanitize-ignorelist=%t-type.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=TYPE
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=vptr -fsanitize-recover=vptr -emit-llvm %s -o - | FileCheck %s --check-prefix=VPTR
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -fsanitize=vptr -fsanitize-recover=vptr -fsanitize-ignorelist=%t-type.ignorelist -emit-llvm %s -o - | FileCheck %s --check-prefix=VPTR-TYPE
 
 class Bar {
 public:
@@ -11,13 +32,60 @@ class Foo : public Bar {};
 
 Bar bar;
 
-// DEFAULT: @_Z7checkmev
-// TYPE: @_Z7checkmev
+// VPTR: @_Z7checkmev
+// VPTR-TYPE: @_Z7checkmev
 void checkme() {
-// DEFAULT: call void @__ubsan_handle_dynamic_type_cache_miss({{.*}} (ptr @bar to
-// TYPE-NOT: @__ubsan_handle_dynamic_type_cache_miss
+// VPTR: call void @__ubsan_handle_dynamic_type_cache_miss({{.*}} (ptr @bar to
+// VPTR-TYPE-NOT: @__ubsan_handle_dynamic_type_cache_miss
   Foo* foo = static_cast<Foo*>(&bar); // down-casting
-// DEFAULT: ret void
-// TYPE: ret void
+// VPTR: ret void
+// VPTR-TYPE: ret void
   return;
 }
+
+// INT-LABEL: ignore_int
+void ignore_int(int A, int B, unsigned C, unsigned D, long E) {
+// INT: llvm.uadd.with.overflow.i32
+  (void)(C+D);
+// INT-NOT: llvm.sadd.with.overflow.i32
+  (void)(A+B);
+// INT: llvm.sadd.with.overflow.i64
+  (void)(++E);
+}
+
+
+typedef unsigned long myty;
+typedef myty derivative;
+// INT-LABEL: ignore_all_except_myty
+// MYTY-LABEL: ignore_all_except_myty
+void ignore_all_except_myty(myty A, myty B, int C, unsigned D, derivative E) {
+// MYTY-NOT: llvm.sadd.with.overflow.i32
+  (void)(++C);
+
+// MYTY-NOT: llvm.uadd.with.overflow.i32
+  (void)(D+D);
+
+// MYTY-NOT: llvm.umul.with.overflow.i64
+  (void)(E*2);
+
+// MYTY: llvm.uadd.with.overflow.i64
+  (void)(A+B);
+}
+
+// INT-LABEL: truncation
+// MYTY-LABEL: truncation
+// TRUNC-LABEL: truncation
+void truncation(char A, int B, unsigned char C, short D) {
+// TRUNC-NOT: %handler.implicit_conversion
+  A = B;
+// TRUNC-NOT: %handler.implicit_conversion
+  A = C;
+// TRUNC-NOT: %handler.implicit_conversion
+  C = B;
+
+// TRUNC: %handler.implicit_conversion
+  D = B;
+
+  (void)A;
+  (void)D;
+}

@JustinStitt JustinStitt changed the title [Clang] Implement type filtering for overflow/truncation sanitizers w/ SSCLs [Clang] Implement labelled type filtering for overflow/truncation sanitizers w/ SSCLs Sep 16, 2024
clang/docs/SanitizerSpecialCaseList.rst Outdated Show resolved Hide resolved
clang/lib/AST/ASTContext.cpp Outdated Show resolved Hide resolved
clang/test/CodeGen/ubsan-type-ignorelist.cpp Outdated Show resolved Hide resolved
@JustinStitt

This comment was marked as outdated.

@kees kees requested a review from melver October 30, 2024 17:43
@melver
Copy link
Contributor

melver commented Oct 30, 2024

We could make custom types that are filtered out in an ignorelist, allowing for types to be more expressive -- without the need for annotations

Having this sort of semantic information detached from the code may cause some maintenance headaches. For example, if the type is renamed, developers have to remember to adjust the filter list as well.

I wished that we could just attach attributes to type, e.g. typedef int __attribute__((no_sanitize("signed-integer-overflow")) wrapping_int or something. One thing here is that this should not be a type modifier (like volatile and such), so it does not change the type, but that means nothing enforces it's propagated through conversions. But I suppose that the filter list has the same problem as you depend on specific type names.

Just something to consider - I suspect it's much more complex to implement in Clang though. So that's a +1 for the filterlist in terms of complexity.

clang/docs/SanitizerSpecialCaseList.rst Outdated Show resolved Hide resolved
@JustinStitt
Copy link
Contributor Author

I wished that we could just attach attributes to type, e.g. typedef int __attribute__((no_sanitize("signed-integer-overflow")) wrapping_int or something. One thing here is that this should not be a type modifier (like volatile and such), so it does not change the type, but that means nothing enforces it's propagated through conversions. But I suppose that the filter list has the same problem as you depend on specific type names.

I have the implementation done for this and I should get the PR up soon.

Then we'll have something like this:

typedef int __attribute__((wraps)) wrapping_int

I'll be sure to CC you 😄

@melver
Copy link
Contributor

melver commented Oct 30, 2024

I wished that we could just attach attributes to type, e.g. typedef int __attribute__((no_sanitize("signed-integer-overflow")) wrapping_int or something. One thing here is that this should not be a type modifier (like volatile and such), so it does not change the type, but that means nothing enforces it's propagated through conversions. But I suppose that the filter list has the same problem as you depend on specific type names.

I have the implementation done for this and I should get the PR up soon.

Then we'll have something like this:

typedef int __attribute__((wraps)) wrapping_int

I'll be sure to CC you 😄

Oh nice. If you have that almost done, why do you still want the filterlist? For the Linux kernel usecase, wouldn't the attribute be preferrable? My intuition tells me that maintaining a filterlist detached from the code will cause headaches, esp. during refactors or code moves (and you don't want to be the one chasing after the breakages).

@vitalybuka
Copy link
Collaborator

vitalybuka commented Oct 30, 2024

@vitalybuka Can we move forward with this? The most recent commit uses the =sanitize =no_sanitize wording that you prefer.

yes, we can
you should press "re-request review" after update, or your PR does not show up in "review-requested:@me" searches.

@vitalybuka
Copy link
Collaborator

Is "Test documentation build" error related?

Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't forget to "re-request" after update, fell free to ping me in chat as well.

Copy link
Collaborator

@vitalybuka vitalybuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually LGTM,
Request for doc and test is quite straightforward. No need to block on me.

Signed-off-by: Justin Stitt <[email protected]>
@JustinStitt
Copy link
Contributor Author

@melver

Oh nice. If you have that almost done, why do you still want the filterlist? For the Linux kernel usecase, wouldn't the attribute be preferrable? My intuition tells me that maintaining a filterlist detached from the code will cause headaches, esp. during refactors or code moves (and you don't want to be the one chasing after the breakages).

We need both the filterlist and attribute support because of the agreed upon ideology about how integer overflow is reasoned about in the Linux kernel.

Basically, the compromise that was reached involves marking all types as wrapping so that the sanitizers won't warn for just any arithmetic. Following this, specific types (like size_t) will be marked as no_wraps which means the sanitizers will do their thing. The best way to "mark all types" is with filterlists; and the best way to mark one (or a few) types is with in-source attributes.

I know @kees feels strongly that this should be the other way around: Instrument everything by default (which would eliminate the need for filterlist integration entirely) and exclude things from instrumentation with wraps. Either way, my patch set implements both wraps and no_wraps and folks can use whichever they please. Using wraps/no_wraps will not force users to interact with or use ignorelists at all -- those attributes will simply integrate nicely with ignorelists if supplied.

@JustinStitt

This comment was marked as resolved.

@JustinStitt

This comment was marked as resolved.

@vitalybuka
Copy link
Collaborator

I know @kees feels strongly that this should be the other way around: Instrument everything by default (which would eliminate the need for filterlist integration entirely) and exclude things from instrumentation with wraps.

My intuition is that @kees approach could be more efficient to get things done :)
Maintaining a list of types can be annoying.

@JustinStitt
Copy link
Contributor Author

My intuition is that @kees approach could be more efficient to get things done :) Maintaining a list of types can be annoying.

We discussed this at Linux Plumbers '24 and also on lkml where an eventual lwn.net article was published summarizing the discussion.

I recommend reading that article for the most succinct overview of Linus' thoughts on the matter. From our talk at Plumbers, other maintainers agree with Linus: we should limit sanitizer instrumentation across all types and simply whitelist a few.

Thankfully, this PR and my other PR (WIP) leave the door open. You can ignore all types and then mark a few as no_wraps or you can sanitize all types (the default) and mark a few as wraps.

@kees
Copy link
Contributor

kees commented Nov 1, 2024

Thankfully, this PR and my other PR (WIP) leave the door open. You can ignore all types and then mark a few as no_wraps or you can sanitize all types (the default) and mark a few as wraps.

Right, while I want to go full instrumentation, it's just not going to happen in the Linux kernel. We're going to need both options so that we can slowly overlap coverage until we've squeezed out all the unexpected wrap-around.

@vitalybuka
Copy link
Collaborator

vitalybuka commented Nov 1, 2024

Right, while I want to go full instrumentation, it's just not going to happen in the Linux kernel. We're going to need both options so that we can slowly overlap coverage until we've squeezed out all the unexpected wrap-around.

That was just my 2c. Linux decision about is need needed to proceed here.
The patch LGTM.

I believe it could be better without no_sanitize. Please share if some one else have opinion on that?
Also let's ensure consistently between no_wrap/no_sanitize.

BTW, what is the plan for wrap? Is this going to be this list or another one?
Does Kernel instrument in recovery mode or always crash?
In recoverable mode sanitizer/no_sanitize is independent from wrap/no_wrap.
In non-recovery mode wrap is not needed for sanitized types

Signed-off-by: Justin Stitt <[email protected]>
@vitalybuka
Copy link
Collaborator

Thanks!

@vitalybuka vitalybuka merged commit 3dd1d88 into llvm:main Nov 4, 2024
9 checks passed
@vitalybuka
Copy link
Collaborator

@JustinStitt Can't assign you as reviewer, but please take a look #114754

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 4, 2024

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-ubuntu running on as-builder-9 while building clang at step 16 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/556

Here is the relevant piece of the build log for the reference
Step 16 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
...
PASS: lldb-api :: types/TestIntegerType.py (1199 of 1208)
PASS: lldb-api :: types/TestIntegerTypeExpr.py (1200 of 1208)
PASS: lldb-api :: types/TestRecursiveTypes.py (1201 of 1208)
PASS: lldb-api :: types/TestShortType.py (1202 of 1208)
PASS: lldb-api :: types/TestShortTypeExpr.py (1203 of 1208)
PASS: lldb-api :: types/TestLongTypes.py (1204 of 1208)
PASS: lldb-api :: types/TestLongTypesExpr.py (1205 of 1208)
PASS: lldb-api :: tools/lldb-server/TestNonStop.py (1206 of 1208)
PASS: lldb-api :: tools/lldb-server/TestLldbGdbServer.py (1207 of 1208)
UNRESOLVED: lldb-api :: tools/lldb-server/TestGdbRemoteProcessInfo.py (1208 of 1208)
******************** TEST 'lldb-api :: tools/lldb-server/TestGdbRemoteProcessInfo.py' FAILED ********************
Script:
--
/usr/bin/python3.12 /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --libcxx-include-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/c++/v1 --libcxx-include-target-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/aarch64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib/aarch64-unknown-linux-gnu --arch aarch64 --build-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/lldb --compiler /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang --dsymutil /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/dsymutil --make /usr/bin/make --llvm-tools-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --lldb-obj-root /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/tools/lldb --lldb-libs-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --platform-url connect://jetson-agx-2198.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot /mnt/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/tools/lldb-server -p TestGdbRemoteProcessInfo.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://github.com/llvm/llvm-project.git revision 3dd1d888fb18b40859504e207e57abdc6c43d838)
  clang revision 3dd1d888fb18b40859504e207e57abdc6c43d838
  llvm revision 3dd1d888fb18b40859504e207e57abdc6c43d838
Setting up remote platform 'remote-linux'
Connecting to remote platform 'remote-linux' at 'connect://jetson-agx-2198.lab.llvm.org:1234'...
Connected.
Setting remote platform working directory to '/home/ubuntu/lldb-tests'...
Skipping the following test categories: ['dsym', 'gmodules', 'debugserver', 'objc', 'lldb-dap']
connect to debug monitor on port 12867 failed, attempt #1 of 20
connect to debug monitor on port 12657 failed, attempt #2 of 20
connect to debug monitor on port 15682 failed, attempt #3 of 20
connect to debug monitor on port 12047 failed, attempt #4 of 20
connect to debug monitor on port 13012 failed, attempt #5 of 20
connect to debug monitor on port 12726 failed, attempt #6 of 20
connect to debug monitor on port 13327 failed, attempt #7 of 20
connect to debug monitor on port 12066 failed, attempt #8 of 20
connect to debug monitor on port 12055 failed, attempt #9 of 20
connect to debug monitor on port 14880 failed, attempt #10 of 20
connect to debug monitor on port 14710 failed, attempt #11 of 20
connect to debug monitor on port 14229 failed, attempt #12 of 20
connect to debug monitor on port 12882 failed, attempt #13 of 20
connect to debug monitor on port 14594 failed, attempt #14 of 20
connect to debug monitor on port 13140 failed, attempt #15 of 20
connect to debug monitor on port 13353 failed, attempt #16 of 20
connect to debug monitor on port 14585 failed, attempt #17 of 20
connect to debug monitor on port 12057 failed, attempt #18 of 20
connect to debug monitor on port 13687 failed, attempt #19 of 20
connect to debug monitor on port 12328 failed, attempt #20 of 20

--

@JustinStitt
Copy link
Contributor Author

@vitalybuka any idea how lldb-server is failing with UNRESOLVED? How in the world are any of the changes here affecting lldb-server?

vitalybuka added a commit that referenced this pull request Nov 5, 2024
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
…itizers w/ SSCLs (llvm#107332)

[Related
RFC](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683/5?u=justinstitt)

### Summary

Implement type-based filtering via [Sanitizer Special Case
Lists](https://clang.llvm.org/docs/SanitizerSpecialCaseList.html) for
the arithmetic overflow and truncation sanitizers.

Currently, using the `type:` prefix with these sanitizers does nothing.
I've hooked up the SSCL parsing with Clang codegen so that we don't emit
the overflow/truncation checks if the arithmetic contains an ignored
type.

### Usefulness

You can craft ignorelists that ignore specific types that are expected
to overflow or wrap-around. For example, to ignore `my_type` from
`unsigned-integer-overflow` instrumentation:
```bash
$ cat ignorelist.txt
[unsigned-integer-overflow]
type:my_type=no_sanitize

$ cat foo.c
typedef unsigned long my_type;
void foo() {
  my_type a = ULONG_MAX;
  ++a;
}

$ clang foo.c -fsanitize=unsigned-integer-overflow -fsanitize-ignorelist=ignorelist.txt ; ./a.out
// --> no sanitizer error
```

If a type is functionally intended to overflow, like
[refcount_t](https://kernsec.org/wiki/index.php/Kernel_Protections/refcount_t)
and its associated APIs in the Linux kernel, then this type filtering
would prove useful for reducing sanitizer noise. Currently, the Linux
kernel dealt with this by
[littering](https://elixir.bootlin.com/linux/v6.10.8/source/include/linux/refcount.h#L139
) `__attribute__((no_sanitize("signed-integer-overflow")))` annotations
on all the `refcount_t` APIs. I think this serves as an example of how a
codebase could be made cleaner. We could make custom types that are
filtered out in an ignorelist, allowing for types to be more expressive
-- without the need for annotations. This accomplishes a similar goal to
llvm#86618.


Yet another use case for this type filtering is whitelisting. We could
ignore _all_ types, save a few.

```bash
$ cat ignorelist.txt
[implicit-signed-integer-truncation]
type:*=no_sanitize # ignore literally all types
type:short=sanitize # except `short`

$ cat bar.c
// compile with -fsanitize=implicit-signed-integer-truncation
void bar(int toobig) {
  char a = toobig;  // not instrumented
  short b = toobig; // instrumented
}
```

### Other ways to accomplish the goal of sanitizer
allowlisting/whitelisting
* ignore list SSCL type support (this PR that you're reading)
* [my sanitize-allowlist
branch](llvm/llvm-project@main...JustinStitt:llvm-project:sanitize-allowlist)
- this just implements a sibling flag `-fsanitize-allowlist=`, removing
some of the double negative logic present with `skip`/`ignore` when
trying to whitelist something.
* [Glob
Negation](https://discourse.llvm.org/t/rfc-support-globpattern-add-operator-to-invert-matches/80683)
- Implement a negation operator to the GlobPattern class so the
ignorelist query can use them to simulate allowlisting


Please let me know which of the three options we like best. They are not
necessarily mutually exclusive.

Here's [another related
PR](llvm#86618) which implements a
`wraps` attribute. This can accomplish a similar goal to this PR but
requires in-source changes to codebases and also covers a wider variety
of integer definedness problems.

### CCs
@kees @vitalybuka @bwendling

---------

Signed-off-by: Justin Stitt <[email protected]>
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants