
Type checking error in clang AST

David Tarditi edited this page Sep 2, 2024 · 8 revisions

This page describes the AST that clang produces, and how clang behaves, when a type-checking error occurs. We created this page after fixing the following issue: Type-checking error leads to compiler crash

In short, when clang encounters a type-checking error, it still tries to build the full AST as best it can. In addition, there is no easy way to tell whether a type-checking error occurred just by looking at the AST dumped with -ast-dump.

Type-Checking Error Example

The C code below produces an error at s->arr[i], because s is a struct, not a pointer to a struct.

struct S {
  int arr [10];
  int data;
};
void passing_test_2(int i) {
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  s->arr[i] = 1;
}

Below is a part of the resulting AST generated by clang.

...
|-ArraySubscriptExpr 0x2839286d7d8 <col:3, col:11> 'int' lvalue
| |-ImplicitCastExpr 0x2839286d798 <col:3, col:6> 'int *' <ArrayToPointerDecay>
| | `-MemberExpr 0x2839286d700 <col:3, col:6> 'int [10]' .arr 0x28390fcba68
| |   `-ImplicitCastExpr 0x2839286d340 <col:3> 'struct S':'struct S' <LValueToRValue>
| |     `-DeclRefExpr 0x2839286d318 <col:3> 'struct S':'struct S' lvalue Var 0x2839286cf20 's' 'struct S':'struct S'
...

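Notice that the MemberExpr is not marked as an lvalue, although it should be. The base s is wrapped in an ImplicitCastExpr <LValueToRValue>, so the member access is applied to an rvalue, and in C a member of a struct rvalue is not itself an lvalue.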
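As a rough plain-C analogy (not a literal lowering of this AST), the sketch below writes through an explicit copy of s, which is what a converted, value-only base amounts to: the write lands in the copy and s is left unchanged. The function name write_through_copy and its output parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>

struct S {
  int arr[10];
  int data;
};

/* Hypothetical illustration: writes arr[i] through a copy of s and reports
   the element from both the original and the copy. */
void write_through_copy(int i, int *original, int *copied) {
  assert(i >= 0 && i < 10); /* arr has 10 elements */
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  struct S copy = s;        /* the value of s, not s itself */
  copy.arr[i] = 1;          /* modifies only the copy */
  *original = s.arr[i];     /* unchanged element of s */
  *copied = copy.arr[i];    /* the element that was written */
}
```

For example, with i = 3 the original element stays 6 while the copy holds 1.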

Correct Implementation 1

Here is a corrected version of the C snippet above.

struct S {
  int arr [10];
  int data;
};

void passing_test_2(int i) {
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  s.arr[i] = 1;
}

Here is part of the generated AST; this fragment corresponds to the access s.arr:

|-ArraySubscriptExpr 0x246f1163a88 <col:3, col:10> 'int' lvalue
| |-ImplicitCastExpr 0x246f1163a48 <col:3, col:5> 'int *' <ArrayToPointerDecay>
| | `-MemberExpr 0x246f11639b0 <col:3, col:5> 'int [10]' lvalue .arr 0x246ef8c9c48
| |   `-DeclRefExpr 0x246f1163988 <col:3> 'struct S':'struct S' lvalue Var 0x246f1163590 's' 'struct S':'struct S'
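To make the effect of the corrected assignment observable, here is a minimal sketch that returns s after the write. The function name passing_test_2_result is hypothetical (it is not part of the original test), and the struct S definition is repeated so the block compiles on its own. Because s.arr is an lvalue designating the array inside s, only element i of s changes.

```c
#include <assert.h>

struct S {
  int arr[10];
  int data;
};

/* Hypothetical variant of passing_test_2 that returns s after the
   assignment so the effect of s.arr[i] = 1 can be checked. */
struct S passing_test_2_result(int i) {
  assert(i >= 0 && i < 10); /* arr has 10 elements */
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  s.arr[i] = 1; /* s.arr is an lvalue, so this writes into s itself */
  return s;
}
```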

Correct Implementation 2

Alternatively, we can write the program correctly this way:

struct S {
  int arr [10];
  int data;
};

void passing_test_2(int i) {
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  (&s)->arr[i] = 1;
}

It generates this AST:

|-ArraySubscriptExpr 0x209dea7c158 <col:3, col:14> 'int' lvalue
| |-ImplicitCastExpr 0x209dea7c118 <col:3, col:9> 'int *' <ArrayToPointerDecay>
| | `-MemberExpr 0x209dea7c080 <col:3, col:9> 'int [10]' lvalue ->arr 0x209dea0f748
| |   `-ParenExpr 0x209dea7c060 <col:3, col:6> 'struct S *'
| |     `-UnaryOperator 0x209dea7c038 <col:4, col:5> 'struct S *' prefix '&'
| |       `-DeclRefExpr 0x209dea7bfb8 <col:5> 'struct S':'struct S' lvalue Var 0x209dea7bbc0 's' 'struct S':'struct S'

Both of these ASTs mark the MemberExpr as an lvalue.
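A small C-level check of why the two ASTs agree: C defines E1->E2 as (*E1).E2, so s.arr[i], (&s)->arr[i] and (*&s).arr[i] all designate the same int object. The sketch below (the helper name same_element is hypothetical) takes the address of the element through each spelling. Applying unary & to a non-lvalue int expression would be rejected by the compiler, so the fact that the block compiles already shows all three are lvalues; the pointer comparison shows they name the same object.

```c
#include <assert.h>
#include <stdbool.h>

struct S {
  int arr[10];
  int data;
};

/* Hypothetical helper: returns true if the three spellings of the member
   access refer to the same element of s. */
bool same_element(int i) {
  assert(i >= 0 && i < 10); /* arr has 10 elements */
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  int *dot = &s.arr[i];           /* s.arr[i] */
  int *arrow = &(&s)->arr[i];     /* (&s)->arr[i] */
  int *deref_dot = &(*&s).arr[i]; /* (*&s).arr[i] */
  return dot == arrow && arrow == deref_dot;
}
```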

Experimental Incorrect Code

As an experiment, I compiled this code:

struct S {
  int arr [10];
  int data;
};

void passing_test_2(int i) {
  struct S s = { { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, 10 };
  (&s).arr[i] = 1;
}

The resulting AST looks like this:

|-ArraySubscriptExpr 0x1fbf2933ec8 <col:3, col:13> 'int' lvalue
| |-ImplicitCastExpr 0x1fbf2933e88 <col:3, col:8> 'int *' <ArrayToPointerDecay>
| | `-MemberExpr 0x1fbf2933df0 <col:3, col:8> 'int [10]' lvalue ->arr 0x1fbf0fe7a28
| |   `-ParenExpr 0x1fbf2933a30 <col:3, col:6> 'struct S *'
| |     `-UnaryOperator 0x1fbf2933a08 <col:4, col:5> 'struct S *' prefix '&'
| |       `-DeclRefExpr 0x1fbf2933988 <col:5> 'struct S':'struct S' lvalue Var 0x1fbf2933590 's' 'struct S':'struct S'

Note that even though the compiler exited with an error, the constructed AST marks the MemberExpr as an lvalue. It is also worth noting that clang tried to fix the error by turning the member access into ->arr.

Investigation

I went through clang/lib/Sema/SemaExprMember.cpp. It looks like clang keeps going even when it finds a type-checking error. A developer comment in that file (line 1335) explains that because the returned member reference will be empty, the error can be identified and dealt with later:

Returning valid-but-null is how we indicate to the caller that the lookup result was filled in. If typo correction was attempted and failed, the lookup result will have been cleared--that combined with the valid-but-null ExprResult will trigger the appropriate diagnostics.
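The convention in that comment amounts to a three-state result. The sketch below models it in C under loud assumptions: ExprResultModel, ResultKind, classify and caller_should_diagnose are hypothetical names, loosely mirroring clang's C++ ExprResult rather than reproducing its API. A result is either invalid, valid with an expression, or valid-but-null; in the last case the caller consults the lookup result, and a cleared lookup (failed typo correction) is what triggers the diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an expression result; not clang's actual API. */
typedef struct {
  bool invalid; /* an error was already reported */
  void *expr;   /* built expression, or NULL */
} ExprResultModel;

typedef enum {
  RESULT_INVALID,      /* error already reported */
  RESULT_BUILT,        /* usable expression */
  RESULT_LOOKUP_FILLED /* valid-but-null: caller inspects the lookup result */
} ResultKind;

ResultKind classify(ExprResultModel r) {
  /* Model invariant (assumed): an invalid result carries no expression. */
  assert(!(r.invalid && r.expr != NULL));
  if (r.invalid)
    return RESULT_INVALID;
  if (r.expr != NULL)
    return RESULT_BUILT;
  return RESULT_LOOKUP_FILLED;
}

/* Hypothetical caller: lookup_empty models "typo correction was attempted
   and failed, so the lookup result was cleared". Returns true when the
   caller should emit a diagnostic. */
bool caller_should_diagnose(ExprResultModel r, bool lookup_empty) {
  switch (classify(r)) {
  case RESULT_INVALID:
    return false; /* already diagnosed by the callee */
  case RESULT_BUILT:
    return false; /* nothing to report */
  case RESULT_LOOKUP_FILLED:
    return lookup_empty; /* valid-but-null plus cleared lookup => diagnose */
  }
  return false;
}
```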