JCC is designed to be a C11/C18/C23 compiler written in pure C11, with no dependencies.
| OS | AArch64 (Arm64) | x64 | RISC-V (32) |
|---|---|---|---|
| Ubuntu | | | |
| macOS | | | |
If tests are failing, ignore them! Development is very active (and pushes sometimes break things).
Aims:
- To be a complete C11/C18/C23 compiler with full functionality (WIP)
- To use zero third-party dependencies or helper tools (no parser generators, assemblers, lexers, etc.) other than the system linker
- To follow best practices and have a sensible compiler architecture
  - Building the "smallest" C compiler is an explicit non-goal
- To be useful for learning about compilers
  - Uses proper IRs rather than going straight from AST to ASM
  - Generates machine code, not assembly
  - Builds SSA form and puts values in registers rather than spilling everything
  - Builds object files and invokes the system linker directly (rather than via a compiler or an assembler)
  - Doesn't use hacks (mostly...)
No, it is text based
I just wanted to write a C compiler. It happens to be an easily buildable, easily runnable compiler that is still a grokkable size. It is probably too large to be considered a toy compiler, but the core architecture is much more accessible than the shoggoth of Clang/GCC.
AArch64, x64, and RISC-V 32 are supported, although parts of the x64 ABI are not yet fully implemented and 64-bit integer support on RISC-V 32 is WIP. Working with RISC-V requires installing a RISC-V linker.
Not yet supported:
- `va_list` and implementing variadic functions (calling them works fine)
- Compound literals
- Linking on musl-based distros. This is relatively simple and should work soon
To install JCC, you need:
- A C11-compliant C compiler
- A POSIX shell
- `git`, `curl`, or `wget` for downloading sources
- Nothing else!
To develop JCC, you need:
- A C11-compliant C compiler
- Bash, version >= 3
- CMake
- A few other tools are used by `jcc.sh` commands to make for a more pleasant experience, but are not required. These include `bat` (for syntax highlighting), `fd`, and `rg`
To directly install `jcc` for playing around with (tested on macOS and various Linux distros):

    curl -sSL https://jcc.johnk.dev/install.sh | sh

The above URL simply serves `./scripts/install.sh`, which you can verify by visiting it. It is NOT a redirect; it forwards the content itself. If you prefer, you can fetch the script directly from raw.githubusercontent.com/john-h-k/jcc/refs/heads/main/scripts/install.sh
`wget` can also be used. Or, if you somehow have `git` but not `curl` or `wget` (???), you can clone the repository and run `./scripts/install.sh`.
To install for development (which is realistically what you should do!):
- Ensure you have `bash` and `cmake` installed
- Fork & clone the repo (exercise left to reader)
- Run `./jcc.sh` for help
The `jcc.sh` script can be used for common workflows. A key subset of its commands is shown here (run `./jcc.sh` for all commands):

    jcc.sh COMMAND

    COMMANDS:
        help        Show help
        run         Build, then run JCC with provided arguments
        debug       Build, then run JCC under LLDB/GDB with provided arguments
        test        Run tests
        test-all    Run tests with all optimisation levels
        format      Format codebase

For test script options, run `./jcc.sh test help`.
JCC is structured as the following pipeline:
- Arg parsing
- Preprocessor
- Frontend: lexer + parser
- Semantic analysis: typechecking
- Intermediate representations and passes
  - All code is located in the `ir` folder
  - IR representation structs and helper methods are in `ir/ir.h` and `ir/ir.c`
  - Pretty-printing functionality is in `ir/prettyprint.h` and `ir/prettyprint.c`
    - This also includes graph-building functionality with Graphviz
  - IR building
    - This stage converts the AST into SSA IR form
    - It assumes the AST is entirely valid and well-typed
    - Code is in `ir/build.h` and `ir/build.c`
  - Lowering
    - First, global lowering is performed. This lowers operations that need lowering on every platform
      - E.g. `br.switch`es are converted into a series of if-elses, and `load.glb`/`store.glb` operations are transformed to `addr GLB` + `load.addr`/`store.addr`
    - This converts the IR into the platform-native form
    - Then, per-target lowering occurs
      - For example, AArch64 has no `%` instruction, so `x = a % b` is converted to `c = a / b; x = a - (c * b)`
    - The code for lowering is within the appropriate backend folders
  - Register allocation
    - Simple LSRA, done separately for floating-point and general-purpose registers
  - Eliminate phi
    - Splits critical edges and inserts moves to preserve the semantics of phi ops
- Code generation
  - Converts the IR into a list of 1:1 machine code instructions
  - These are all target specific
  - Currently codegen does too much; in the future I would like to move many of its responsibilities (e.g. prologue/epilogue) into IR passes
- Emitting
  - Actually emits the instructions from code generation into memory
- Object file building
- Linking