- The details of a language make the difference between a reliable and an error-prone one.
- In summer 1961, an orbital trajectory program at NASA computed with incorrect precision because of a Fortran feature: blank characters are not significant and can even occur in the middle of an identifier (to help cardpunch walloppers and the readability of programs). A period typed in place of the comma turned a `DO` loop header into an assignment to a variable named `DO10I`:

      DO 10 I=1.10  ->  DO10I = 1.10

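- A switch's `default` can appear anywhere in the list of cases, and any kind of statement is permitted in the body, including labels, so a misspelled `default` compiles silently as an ordinary label:

      switch (i) {
      case 5 + 3:
      do_again:
      case 2:
          printf("I loop unremittingly\n");
          goto do_again;
      defau1t:        // typo: the digit 1 instead of the letter l
          i++;
      }
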
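A minimal sketch of the misspelled-`default` trap, kept free of the infinite loop so it can be run. The function names `ran_default_typo`, `ran_default_fixed`, and `demo_default_typo` are made up for this illustration.

```c
#include <assert.h>

/* Returns 1 if a default branch ran for value i, 0 otherwise. */
int ran_default_typo(int i)
{
    int ran = 0;
    switch (i) {
    case 1:
        break;
    defau1t:        /* digit '1', not letter 'l': an ordinary goto label */
        ran = 1;
        break;
    }
    return ran;
}

/* Same switch with the label spelled correctly. */
int ran_default_fixed(int i)
{
    int ran = 0;
    switch (i) {
    case 1:
        break;
    default:
        ran = 1;
        break;
    }
    return ran;
}

/* Usage sketch: call from main(). */
void demo_default_typo(void)
{
    assert(ran_default_typo(7) == 0);   /* no case matches, so the body is skipped */
    assert(ran_default_fixed(7) == 1);
}
```

Compiling with warnings enabled (for example `-Wall` in GCC) typically reports the unused label, which is one cheap way to catch the typo.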
- default fall through on switches is a design defect in C
- Adjacent string literals are concatenated into one, so a missing comma in an initializer silently merges two entries:

      char *names[] = {
          "luffy",
          "zoro"      // no comma! becomes "zorosanji"
          "sanji",
          "nami",
      };

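The effect of the missing comma can be checked by counting the elements. This is a small sketch; the names `crew` and `crew_count` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The comma after "zoro" is missing on purpose. */
const char *crew[] = {
    "luffy",
    "zoro"      /* no comma: merges with the next literal */
    "sanji",
    "nami",
};

/* Number of elements the compiler actually created. */
size_t crew_count(void)
{
    return sizeof crew / sizeof crew[0];
}

/* Usage sketch: call from main(). */
void demo_crew(void)
{
    assert(crew_count() == 3);                  /* three entries, not four */
    assert(strcmp(crew[1], "zorosanji") == 0);  /* the merged entry */
}
```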
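- Too much default visibility: functions are globally visible (`extern`) by default, and you have to add `static` explicitly to restrict one to its file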
- Many symbols are "overloaded", that is, given different meanings when used in different contexts. For example:
  - `void` as a return type means no value is returned, in a parameter list it means no parameters are taken, and in `void *` it means a generic pointer
  - `sizeof` is an operator, not a function call:

        sizeof(int)     // a type name has to be enclosed in parentheses
        sizeof *p       // p is an int *; an expression does not require them

- Some operators have the wrong precedence, for example `==` and `!=` bind tighter than the bitwise operators. Long story short, early C had no separate `&` and `&&` operators: `&` was interpreted as a logical AND wherever a boolean was expected. When `&&` was introduced later, Dennis Ritchie was afraid to change the precedence of `&` for backward-compatibility reasons. In retrospect, he said it would have been better to just change it.
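A short sketch of how this precedence bites in practice, assuming the intent is to test whether the low bit is clear. The function names are hypothetical.

```c
#include <assert.h>

/* Intended: "is the low bit of flags clear?" */
int low_bit_clear_wrong(unsigned flags)
{
    return flags & 1u == 0;     /* parsed as flags & (1u == 0), i.e. flags & 0 */
}

int low_bit_clear_right(unsigned flags)
{
    return (flags & 1u) == 0;   /* parentheses force the bitwise AND first */
}

/* Usage sketch: call from main(). */
void demo_precedence(void)
{
    assert(low_bit_clear_wrong(2u) == 0);   /* wrong answer: bit 0 of 2 is clear */
    assert(low_bit_clear_right(2u) == 1);
}
```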
- if there is more than one possibility for the next token, the compiler will prefer the longest sequence of characters
z = y+++x; -> z = y++ + x;
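The longest-token rule can be observed directly. In this sketch (function and parameter names are made up) `y+++x` is tokenized as `y ++ + x`, so `y` is incremented and `x` is left alone.

```c
#include <assert.h>

/* Returns z and stores the final value of the local y in *y_after. */
int munch(int y, int x, int *y_after)
{
    int z = y+++x;      /* tokenized as y ++ + x */
    *y_after = y;
    return z;
}

/* Usage sketch: call from main(). */
void demo_munch(void)
{
    int y_after;
    assert(munch(3, 4, &y_after) == 7);     /* 3 + 4, using y before the increment */
    assert(y_after == 4);                   /* y was post-incremented */
}
```

By the same rule, `y+++++x` is tokenized as `y ++ ++ + x` and is rejected by the compiler, even though `y++ + ++x` would have been a valid reading.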
- C's philosophy is that the declaration of an object should look like its use:

      int *p[3];      // declaration
      *p[i]           // usage

      char (*j)[20];
      j = (char (*)[20])malloc(20);   // the cast has to keep the seemingly redundant parentheses around the asterisk

- function arguments might not be pushed into the stack, they can be in registers for speed when possible
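- Structs are first-class values: they can be assigned, passed to functions, and returned from them. Wrapping an array in a struct therefore lets the array be copied by plain assignment:

      struct s { int a[100]; };
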
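A minimal sketch of an array travelling by value inside a struct. The names `int_block` and `fill` are invented for the example.

```c
#include <assert.h>

struct int_block {
    int a[100];
};

/* Passed and returned by value: the whole array is copied each way. */
struct int_block fill(struct int_block b, int value)
{
    for (int i = 0; i < 100; i++)
        b.a[i] = value;
    return b;
}

/* Usage sketch: call from main(). */
void demo_struct_copy(void)
{
    struct int_block x = { {0} };
    struct int_block y = fill(x, 7);
    struct int_block z;

    z = y;              /* plain assignment copies all 100 ints */
    z.a[0] = 1;

    assert(x.a[99] == 0);   /* the caller's x was not modified */
    assert(y.a[0] == 7);    /* y is independent of z */
    assert(z.a[99] == 7);
}
```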
- Differences between `typedef` and `#define`:
  - You can extend a macro type name with other type specifiers, but not a typedef name:

        #define peach int
        unsigned peach i;       // works

        typedef int banana;
        unsigned banana i;      // does not compile

  - A typedef name provides the type for every declarator in a declaration, whereas a macro is only substituted textually:

        #define int_ptr int *
        int_ptr chalk, cheese;  // -> int * chalk, cheese; only chalk is a pointer to int

        typedef char * char_ptr;
        char_ptr x, y;          // both x and y are pointers to char

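The declarator difference can be checked at compile time. The sketch below assumes a C11 compiler, since it uses `_Generic`; the `IS_POINTER` macro and the function names are hypothetical helpers, not part of any library.

```c
#include <assert.h>

#define int_ptr int *           /* textual substitution */
typedef char *char_ptr;         /* a real type name */

/* 1 if the expression has type int * or char *, 0 otherwise (C11 _Generic). */
#define IS_POINTER(x) _Generic((x), int *: 1, char *: 1, default: 0)

int macro_second_is_pointer(void)
{
    int_ptr chalk, cheese;      /* expands to: int * chalk, cheese; */
    chalk = 0;
    cheese = 0;
    (void)chalk;
    return IS_POINTER(cheese);  /* cheese is a plain int */
}

int typedef_second_is_pointer(void)
{
    char_ptr x = 0, y = 0;
    (void)x;
    return IS_POINTER(y);       /* y is a char * */
}

/* Usage sketch: call from main(). */
void demo_typedef_vs_define(void)
{
    assert(macro_second_is_pointer() == 0);
    assert(typedef_second_is_pointer() == 1);
}
```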
- There are multiple namespaces in C (every name within a namespace must be unique):
  - label names
  - tags (one namespace for all structs, enums, and unions)
  - member names (each struct or union has its own namespace)
  - everything else
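Because the four namespaces are separate, one identifier can be reused in each of them. The following sketch is legal (if unwise) C; the name `foo` and the function `namespaces` are chosen only for illustration.

```c
#include <assert.h>

struct foo { int foo; };        /* tag "foo" and member "foo" */

int namespaces(void)
{
    struct foo foo = { 41 };    /* ordinary identifier "foo" */
    goto foo;                   /* label "foo" */
foo:
    return foo.foo + 1;         /* variable foo, member foo */
}

/* Usage sketch: call from main(). */
void demo_namespaces(void)
{
    assert(namespaces() == 42);
}
```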
- A definition occurs in only one place, while a declaration can occur multiple times
- The main difference between a pointer and an array is address versus contents of an address (it is much clearer when looking at the generated assembly):

      char a[] = "hello";
      a[i];
      // 1. there is no separate variable a holding an address: the compiler substitutes the address of the first element, say 0x1000
      // 2. get the contents of address (0x1000 + i)

      char *a = "hello";
      a[i];
      // 1. a is a variable holding an address (4 bytes on 32-bit x86), stored at, say, 0x1000
      // 2. get the contents of address 0x1000, say 0x5000
      // 3. get the contents of address (0x5000 + i)

  So a definition and an `extern` declaration that disagree go wrong like this:

      // file1.c
      char a[] = "hello";

      // file2.c
      extern char *a;
      a[0];
      // 1. get the contents of address a; on 32-bit x86 these are the bytes "hell" (0x6C6C6568)
      // 2. get the contents of address 0x6C6C6568 -> might crash or corrupt your program

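- A pointer definition does not allocate space for what is pointed at, only for the pointer itself (a string literal initializer is the exception, because the compiler allocates the literal):

      char *p = "hello";      // works: the literal is allocated for you
      int *i = 10;            // 10 is taken as a memory address, not as a pointed-to value; compilers warn about or reject it without a cast
      float *f = 3.14;        // does not compile: 3.14 is a value, not an address
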
- Benefits of dynamic linking:
  - a dynamically linked executable is smaller than its statically linked counterpart (it avoids copying library code into the executable)
  - all executables linked to a particular library share a single copy of it at runtime
  - it permits easy versioning of libraries: new libraries can be shipped, and old programs get the benefit without being relinked
  - it allows users to select at runtime which library to execute against (for example one built for speed, one for memory efficiency, or one containing debugging info)
- Five special secrets of linking with libraries:
  - dynamic libraries are called `libsomething.so`, static libraries are called `libsomething.a`
  - you tell the compiler to link with, for example, `libthread.so` by giving the option `-lthread`
  - the compiler expects to find the libraries in certain directories (others can be named with options such as `-Lpathname` and `-Rpathname`)
  - identify your libraries by looking at the header files you have used (sometimes you have to use a tool like `nm` to search manually for a needed symbol)
  - symbols from static libraries are extracted by the linker only when needed (to resolve symbols that are still undefined at that point), while all symbols of a dynamic library go into the virtual address space

    ✍️ With static linking, a library named before the files that use it resolves nothing: no symbol is undefined yet, so nothing is extracted. Put the library after the files that reference it:

        gcc main.c -lm

- interposing is the practice of supplanting a library function with user-written function of the same name, usually for debugging or performance reasons
- A parameter of type "array of T" is converted by the compiler to "pointer to T"; in all other cases arrays stay as they are defined (while pointers are always pointers). The reason C treats array parameters as pointers is efficiency: you don't want to copy the whole array when passing it to a function. All other data arguments are passed by value; only arrays and functions are not. These three are the same:

      my_function(int *a) {}
      my_function(int a[]) {}
      my_function(int a[100]) {}

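A small sketch of the consequences, assuming a C11 compiler for `_Generic`. The function names are made up; the point is that the declared size `100` is ignored and that the callee writes into the caller's array.

```c
#include <assert.h>

/* Declared with an array size, but the parameter really is an int *. */
int param_is_pointer(int a[100])
{
    /* &a has type int ** here; for a real array it would be int (*)[100]. */
    return _Generic(&a, int **: 1, default: 0);
}

void set_first(int a[], int value)
{
    a[0] = value;   /* writes through to the caller's array: nothing was copied */
}

/* Usage sketch: call from main(). */
void demo_array_param(void)
{
    int arr[100] = {0};

    assert(param_is_pointer(arr) == 1);
    assert(_Generic(&arr, int (*)[100]: 1, default: 0) == 1);  /* a true array */

    set_first(arr, 5);
    assert(arr[0] == 5);
}
```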
- An array reference `a[i]` is always rewritten to `*(a + i)` by the compiler:

      a[6] == 6[a]    // true, because *(a + 6) == *(6 + a)

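The rewrite can be verified directly. A minimal sketch, with an invented function name:

```c
#include <assert.h>

/* Returns 1 if all spellings of "element 6 of a" agree. */
int subscript_forms_agree(void)
{
    int a[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

    return a[6] == *(a + 6)
        && a[6] == 6[a]         /* *(6 + a): addition is commutative */
        && &a[6] == a + 6
        && a[6] == 60;
}

/* Usage sketch: call from main(). */
void demo_subscript(void)
{
    assert(subscript_forms_agree() == 1);
}
```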
- Array names are not modifiable l-values:

      int p[] = {1, 2};
      p = 0;              // does not compile
      int *c;
      c = 0;              // works

      void demo(int a[]) {
          a = 0;          // works <- the compiler converts `int a[]` to `int *a`
      }

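- In languages that distinguish the two, a multidimensional array is a single block of memory indexed by several subscripts, while an array of arrays is an array whose elements are themselves arrays. C only supports arrays of arrays. A declared `int carrot[10][20]` is still laid out contiguously; rows of different lengths that occupy their own memory blocks need an array of pointers instead.

      int carrot[10][20];     // carrot is a 10-element array; each element is a 20-int array
      carrot[i][j] == *(*(carrot + i) + j)    // true
      // carrot decays to int (*)[20], so carrot + i is the address
      // (char *)carrot + i * 20 * sizeof(int)
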
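The address arithmetic above can be checked with a short sketch. `row_offset` and `same_element` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

int carrot[10][20];

/* Byte offset of row i from the start of the array. */
ptrdiff_t row_offset(int i)
{
    return (char *)(carrot + i) - (char *)carrot;
}

/* 1 if subscripting and explicit pointer arithmetic reach the same element. */
int same_element(int i, int j)
{
    return &carrot[i][j] == *(carrot + i) + j;
}

/* Usage sketch: call from main(). */
void demo_carrot(void)
{
    assert(row_offset(3) == (ptrdiff_t)(3 * 20 * sizeof(int)));
    assert(same_element(2, 5));
    assert(sizeof carrot == 10 * 20 * sizeof(int));    /* one contiguous block */
}
```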
- An Iliffe vector is a data structure that implements an n-dimensional array as a one-dimensional array of pointers to arrays of dimension n - 1:

      int *box[10];

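A minimal sketch of a dynamically allocated Iliffe vector whose rows have different lengths (a triangular layout). The functions `make_triangle` and `free_triangle` are invented for this example and are only one possible way to build such a structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates `rows` rows where row i holds i + 1 zeroed ints.
 * Returns NULL on allocation failure. */
int **make_triangle(int rows)
{
    int **v = malloc((size_t)rows * sizeof *v);
    if (v == NULL)
        return NULL;
    for (int i = 0; i < rows; i++) {
        v[i] = calloc((size_t)i + 1, sizeof **v);
        if (v[i] == NULL) {
            while (i-- > 0)     /* free the rows allocated so far */
                free(v[i]);
            free(v);
            return NULL;
        }
    }
    return v;
}

void free_triangle(int **v, int rows)
{
    for (int i = 0; i < rows; i++)
        free(v[i]);
    free(v);
}

/* Usage sketch: call from main(). */
void demo_triangle(void)
{
    int **t = make_triangle(4);
    assert(t != NULL);
    t[3][3] = 9;                /* same squash[i][j] syntax as a 2D array */
    assert(t[3][3] == 9);
    assert(t[2][1] == 0);
    free_triangle(t, 4);
}
```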
- When looking at `squash[i][j]`, you cannot tell whether `squash` was declared as

      int squash[10][20];
      int (*squash)[20];
      int **squash;
      int *squash[20];

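- The rule that an array name is rewritten as a pointer argument is not recursive; only the outermost array dimension is converted (argument on the left, matching parameter on the right):

      char c[8][10]   ->  char (*c)[10]
      char *c[15]     ->  char **c
      char (*c)[10]   ->  char (*c)[10]
      char **c        ->  char **c
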
- There is no way to pass a general multidimensional array to a function, because every dimension except the first must be fixed in the parameter declaration. You can either use a one-dimensional array (converting two indices into one with `arr[row_size * i + j]`) or rewrite the matrix as an Iliffe vector:

      void do_something(int a[][3][5]) {}

      int a[100][3][5]; do_something(a);  // works
      int b[2][3][5];   do_something(b);  // works

      int c[5][3][3];   do_something(c);  // rejected: incompatible pointer type
      int d[2][4][5];   do_something(d);  // rejected: incompatible pointer type

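The one-dimensional workaround might look like the sketch below, which assumes the caller stores the matrix row by row in a flat array. `sum_matrix` is a hypothetical function; `cols` plays the role of `row_size` in the formula above.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x cols matrix stored row by row in one flat array. */
long sum_matrix(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += m[cols * i + j];   /* element (i, j) */
    return total;
}

/* Usage sketch: call from main(). */
void demo_flat_matrix(void)
{
    int flat[6] = {1, 2, 3, 4, 5, 6};

    assert(sum_matrix(flat, 2, 3) == 21);   /* viewed as 2 x 3 */
    assert(sum_matrix(flat, 3, 2) == 21);   /* viewed as 3 x 2 */
    assert(flat[3 * 1 + 2] == 6);           /* element (1, 2) of the 2 x 3 view */
}
```

The same function accepts any shape, which is exactly what a parameter such as `int a[][3][5]` cannot do.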
- library calls are part of the language or application, and system calls are part of the operating system
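- `long double` is the 80-bit extended-precision format on x86 processors, but under the System V ABIs it occupies 96 bits (12 bytes) on 32-bit x86 and 128 bits (16 bytes) on x86-64; the extra bytes are padding:

      long double a = 3.14, b = a;    // sizeof(a) == 16UL on x86-64
      a == b;                         // true
      memcmp(&a, &b, sizeof(a));      // may be nonzero, because the padding bytes are uninitialized
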
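- Why does `calloc` exist? `calloc` checks for overflow and fails if the product of its two arguments does not fit into `size_t` (32 or 64 bits, depending on the platform), whereas a product computed by the caller wraps around silently:

      malloc((size_t)INTPTR_MAX * INTPTR_MAX);    // "works": the product wraps around to 1, so only 1 byte is allocated
      calloc(INTPTR_MAX, INTPTR_MAX);             // fails and returns NULL
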
- `calloc` can also be cheaper than `malloc` + `memset` (the details depend on how the OS/kernel and the C library are implemented). When an operating system hands out memory to a process, it always zeros it out first (for security reasons):
  - a large buffer probably comes straight from the OS, so `calloc` can cheat by skipping the zeroing
  - for a small buffer, `calloc` == `malloc` + `memset`
- When handing out 1GB of memory, the kernel probably plays a trick: it maps and zeros only the first 4KB block and marks the rest as copy-on-write, then does the remaining work later, when that memory is first written. With `malloc` + `memset` all the mapping and zeroing happens up front, while with `calloc` it can be deferred.
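The overflow check that `calloc` performs can be sketched by hand. `zeroed_alloc` below is a hypothetical helper, not how any particular C library implements `calloc`; it is the plain `malloc` + `memset` path with the multiplication guarded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* malloc + memset, refusing requests whose total size overflows size_t. */
void *zeroed_alloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                    /* count * size would not fit in size_t */

    size_t total = count * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);            /* always zeroes eagerly, unlike calloc */
    return p;
}

/* Usage sketch: call from main(). */
void demo_zeroed_alloc(void)
{
    assert(zeroed_alloc(SIZE_MAX, 2) == NULL);  /* overflow is detected */

    int *p = zeroed_alloc(4, sizeof *p);
    assert(p != NULL);
    assert(p[0] == 0 && p[3] == 0);
    free(p);
}
```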
- Some compilers permit multiple characters in a character constant; its type is `int` and the actual value is implementation-defined:

      int c = 'yes';      // valid, but not portable

- The Clockwise/Spiral Rule is a technique for parsing C declarations
- Only the four operators `&&`, `||`, `?:` and `,` specify an order of evaluation; all other operators evaluate their operands in an unspecified order
- It is better to declare a variable as `unsigned` when we expect it to be non-negative than to depend on implementation-defined behavior for negative values, as with right shift or (in C89) division
- A definition is a special kind of declaration:
  - every declaration of an `enum` or `typedef` is a definition
  - for functions, a declaration that includes the body is a definition
  - for objects, a declaration that allocates storage (not `extern`) is a definition

        extern int n;   // declaration
        int n;          // tentative definition at file scope: becomes a definition if the file contains no other
        int n = 10;     // definition

  - for structs and unions, a declaration that specifies the list of members is a definition
- C rules and recommendations
  - adjacent string literals are concatenated piece by piece, after the escape sequences in each piece have been converted:

        char x[] = "hello" " world";    // x == "hello world"
        char y[] = "\x12" "3";          // y == "\0223", not "\x123"
        // "\x12" "3" is two characters, while "\x123" is a single (out-of-range) hex escape

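The hex-escape case can be confirmed by inspecting the resulting bytes. A minimal sketch, with the names `glued` and `glued_size` invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Two pieces: the escape \x12 ends at the closing quote of the first piece. */
const char glued[] = "\x12" "3";    /* bytes: 0x12, '3', '\0' */

size_t glued_size(void)
{
    return sizeof glued;
}

/* Usage sketch: call from main(). */
void demo_glued(void)
{
    assert(glued_size() == 3);      /* two characters plus the terminator */
    assert(glued[0] == 0x12);
    assert(glued[1] == '3');
}
```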
- Depending on the environment, a C program may or may not come with a runtime system:
  - freestanding environment: no runtime system
  - hosted environment: the standard library (GNU `libc`, Windows `msvcrt.dll`) and the startup code `crt0.o`